I created the following table:
CREATE TABLE ta
(
id BIGINT NOT NULL auto_increment,
company_id BIGINT NOT NULL,
language VARCHAR(10) NOT NULL,
created_at DATETIME,
modified_at DATETIME,
version BIGINT,
PRIMARY KEY (id),
UNIQUE KEY unique_ta (company_id, language)
) engine = InnoDB;
CREATE INDEX ta_company_id on `ta` (company_id);
My question is whether I need this line:
CREATE INDEX ta_company_id on `ta` (company_id);
Does the UNIQUE KEY automatically create an index on (company_id, language)?
You probably don't need the extra index on company_id.
The UNIQUE KEY creates an index on the pair of columns (company_id, language) in that order. So any query you would run searching for a specific value of company_id would be able to use that index, even though it only references the first column of the unique key index.
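To try the leftmost-prefix rule without a MySQL server, here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: its planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the extra `version` column is there only so that `SELECT *` is not fully covered by the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ta (
        id INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL,
        language TEXT NOT NULL,
        version INTEGER              -- not in the index, so SELECT * needs the row
    );
    CREATE UNIQUE INDEX unique_ta ON ta (company_id, language);
""")


def plan(sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Only the leading column of unique_ta is constrained.
prefix_plan = plan("SELECT * FROM ta WHERE company_id = ?", (1234,))
# Both columns are constrained, so the whole unique key is used.
full_plan = plan("SELECT * FROM ta WHERE company_id = ? AND language = ?", (1234, "EN"))
# Only the second column is constrained, so there is no usable prefix.
suffix_plan = plan("SELECT * FROM ta WHERE language = ?", ("EN",))

for p in (prefix_plan, full_plan, suffix_plan):
    print(p)
```

The first two plans search unique_ta; the last one has to scan, because the leading column company_id is not constrained.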
You can see this in EXPLAIN:
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | ta | NULL | ref | unique_ta | unique_ta | 8 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
You can see key_len: 8 meaning it is using 8 bytes of the index, and the first BIGINT for company_id is 8 bytes.
Whereas searching for both columns will use the full 50-byte size of the index (8 bytes for the BIGINT + 10 characters for the VARCHAR, 4 bytes per character using utf8mb4, plus a couple of bytes for the VARCHAR length):
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234 AND language = 'EN';
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | ta | NULL | const | unique_ta | unique_ta | 50 | const,const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
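The key_len arithmetic can be written out as a small hypothetical helper. It assumes NOT NULL columns by default, utf8mb4 (4 bytes per character, as stated above), and the 2-byte length prefix MySQL counts for a VARCHAR in key_len; it is a sketch covering only the column types in this thread, not a full model of every type.

```python
# Hypothetical helpers reproducing MySQL's key_len for the types used here.
BYTES_PER_CHAR = {"utf8mb4": 4, "utf8mb3": 3, "latin1": 1}


def bigint_len(nullable=False):
    # BIGINT is 8 bytes; a nullable column adds 1 byte for the NULL flag.
    return 8 + (1 if nullable else 0)


def varchar_len(chars, charset="utf8mb4", nullable=False):
    # Max bytes per character times declared length, plus 2 length bytes.
    return chars * BYTES_PER_CHAR[charset] + 2 + (1 if nullable else 0)


# unique_ta is (company_id BIGINT NOT NULL, language VARCHAR(10) NOT NULL)
prefix = bigint_len()                    # only company_id is used
full = bigint_len() + varchar_len(10)    # both columns are used
print(prefix, full)  # 8 50
```

The same arithmetic gives 2 × 258 = 516 and 516 + 8 = 524 for the (username, name, id) primary key with two VARCHAR(64) columns discussed later in this thread.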
I said at the top "probably" because there is an exception case, for a specific form of query:
SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;
This type of query needs id to be the second column of the index, so that it can read the matching rows already in primary key order. In InnoDB, every secondary index implicitly has the primary key column appended, even if you don't declare it. So your unique key index really stores the columns (company_id, language, id), and the single-column index really stores (company_id, id). The latter index optimizes the query above: rows for one company_id are already sorted by primary key, so no extra sort is needed.
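SQLite indexes also carry the row's key (the rowid) as an implicit last column, so the ORDER BY id case can be reproduced there as a rough analogue. This sketch assumes `id INTEGER PRIMARY KEY` (a rowid alias) plays the role of InnoDB's primary key, and compares the plan with and without ta_company_id:

```python
import sqlite3


def order_by_plan(extra_index):
    """EXPLAIN QUERY PLAN details for the ORDER BY id query, optionally
    after adding the single-column index on company_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ta (
            id INTEGER PRIMARY KEY,      -- alias for SQLite's rowid
            company_id INTEGER NOT NULL,
            language TEXT NOT NULL,
            version INTEGER
        );
        CREATE UNIQUE INDEX unique_ta ON ta (company_id, language);
    """)
    if extra_index:
        conn.execute("CREATE INDEX ta_company_id ON ta (company_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM ta WHERE company_id = ? ORDER BY id",
        (1234,),
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)


without_extra = order_by_plan(extra_index=False)
with_extra = order_by_plan(extra_index=True)
print(without_extra)
print(with_extra)
```

With only unique_ta, the plan should include a temporary B-tree for the ORDER BY, because rows for one company_id come out ordered by language first; with ta_company_id they already come out in id order.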
Related
I need to create a composite clustered index like (username, name, id). Is it possible to implement such a thing? I need to boost the performance of queries like WHERE username = ? AND name = ? by using the clustered index in InnoDB. But I think it won't work, because id is in third place and it won't be used.
It's fine to define a clustered index with multiple columns.
CREATE TABLE mytable (
username VARCHAR(64) NOT NULL,
name VARCHAR(64) NOT NULL,
id BIGINT NOT NULL,
PRIMARY KEY (username, name, id)
);
If you query against the first two columns, it will use the clustered index, so it will avoid the overhead of lookups via secondary indexes.
But if you use EXPLAIN to report the optimizer's plan for the query, you'll see that the access is type: ref which means an index lookup, but not a unique index lookup. That is, it will potentially match multiple rows.
mysql> explain select * from mytable where username = 'user' and name = 'name';
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | PRIMARY | PRIMARY | 516 | const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+---------+---------+-------------+------+----------+-------+
When doing lookups against a PRIMARY KEY, we'd like to see type: eq_ref or type: const which means it is doing a unique lookup, and the query is guaranteed to match either 0 or 1 row.
mysql> explain select * from mytable where username = 'user' and name = 'name' and id = 1;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | const | PRIMARY | PRIMARY | 524 | const,const,const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
Both queries are using the clustered index.
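SQLite's closest analogue to an InnoDB clustered index is a WITHOUT ROWID table, whose rows are stored in primary key order. As a sketch under that assumption (SQLite, not MySQL), both queries search the primary key, and the two-column query uses it as a leftmost prefix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows live in the primary key B-tree, like InnoDB.
conn.execute("""
    CREATE TABLE mytable (
        username TEXT NOT NULL,
        name TEXT NOT NULL,
        id INTEGER NOT NULL,
        PRIMARY KEY (username, name, id)
    ) WITHOUT ROWID
""")


def plan(sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


two_cols = plan("SELECT * FROM mytable WHERE username = ? AND name = ?",
                ("user", "name"))
all_cols = plan("SELECT * FROM mytable WHERE username = ? AND name = ? AND id = ?",
                ("user", "name", 1))
print(two_cols)
print(all_cols)
```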
Re your comment:
InnoDB requires the auto-increment column be the first column of a key in the table. It doesn't have to be the primary key. So you can do this for example:
CREATE TABLE `mytable` (
`username` varchar(64) NOT NULL,
`name` varchar(64) NOT NULL,
`id` bigint NOT NULL AUTO_INCREMENT,
`x` int DEFAULT NULL,
PRIMARY KEY (`username`,`name`,`id`),
KEY (`id`)
) ENGINE=InnoDB;
Notice I added an extra KEY (id) to satisfy InnoDB's requirement. But in the primary key, id is still at the end.
I've read a lot of questions about query optimization but none have helped me with this.
As setup, I have 3 tables that represent an "entry" that can have zero or more "categories".
> show create table entries;
CREATE TABLE `entries` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
...
`name` varchar(255),
`updated_at` timestamp NOT NULL,
...
PRIMARY KEY (`id`),
KEY `name` (`name`)
) ENGINE=InnoDB
> show create table entry_categories;
CREATE TABLE `entry_categories` (
`ent_name` varchar(255),
`cat_id` int(11),
PRIMARY KEY (`ent_name`,`cat_id`),
KEY `names` (`ent_name`)
) ENGINE=InnoDB
(The actual "category" table doesn't come into the question.)
Editing an "entry" in the application creates a new row in the entries table (think of the history of a wiki page) with the same name and a newer timestamp. I want to see how many uniquely named entries don't have a category, which seems really straightforward:
SELECT COUNT(id)
FROM entries e
LEFT JOIN entry_categories c
ON e.name=c.ent_name
WHERE c.ent_name IS NULL
GROUP BY e.name;
On my small dataset (about 6000 total entries, with about 4000 names, averaging about one category per named entry) this query takes over 24 seconds (!). I've also tried
SELECT COUNT(id)
FROM entries e
WHERE NOT EXISTS(
SELECT ent_name
FROM entry_categories c
WHERE c.ent_name = e.name
)
GROUP BY e.name;
with similar results. This seems really, really slow to me, especially considering that finding entries in a single category with
SELECT COUNT(*)
FROM entries e
JOIN (
SELECT ent_name as name
FROM entry_categories
WHERE cat_id = 123
)c
USING (name)
GROUP BY name;
runs in about 120ms on the same data. Is there a better way to find records in a table that don't have at least one corresponding entry in another table?
I'll try to transcribe the EXPLAIN results for each query:
> EXPLAIN {no category query};
+----+-------------+-------+-------+---------------+-------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+----------------------------------------------+
| 1 | SIMPLE | e | index | NULL | name | 767 | NULL | 6222 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | c | index | PRIMARY,names | names | 767 | NULL | 6906 | Using where; using index; Not exists |
+----+-------------+-------+-------+---------------+-------+---------+------+------+----------------------------------------------+
> EXPLAIN {single category query}
+----+-------------+------------+-------+---------------+-------+---------+------+--------------------------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------+---------+------+--------------------------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2850 | Using temporary; Using filesort |
| 1 | PRIMARY | e | ref | name | name | 767 | c.name | 1 | Using where; Using index |
| 2 | DERIVED | c | index | NULL | names | 767 | NULL | 6906 | Using where; Using index |
+----+-------------+------------+-------+---------------+-------+---------+------+--------------------------+---------------------------------+
Try:
select name, sum(e) count_entries from
(select name, 1 e, 0 c from entries
union all
select ent_name name, 0 e, 1 c from entry_categories) s
group by name
having sum(c) = 0
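Here is a sketch that runs this UNION ALL query against SQLite with made-up rows, to show what it returns. The sample data, the added ORDER BY, and the NOT EXISTS variant that yields a single total are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        name TEXT,
        updated_at TEXT NOT NULL
    );
    CREATE TABLE entry_categories (
        ent_name TEXT,
        cat_id INTEGER,
        PRIMARY KEY (ent_name, cat_id)
    );
""")
# Hypothetical data: alpha has a category, beta and gamma have none.
conn.executemany(
    "INSERT INTO entries (name, updated_at) VALUES (?, ?)",
    [("alpha", "2020-01-01"), ("alpha", "2020-02-01"),
     ("beta", "2020-01-01"), ("beta", "2020-03-01"),
     ("gamma", "2020-01-01")],
)
conn.execute("INSERT INTO entry_categories VALUES ('alpha', 123)")

# Each entries row contributes e=1, each category row c=1;
# names whose category total is zero are kept.
rows = conn.execute("""
    SELECT name, SUM(e) AS count_entries FROM
      (SELECT name, 1 AS e, 0 AS c FROM entries
       UNION ALL
       SELECT ent_name AS name, 0 AS e, 1 AS c FROM entry_categories) s
    GROUP BY name
    HAVING SUM(c) = 0
    ORDER BY name
""").fetchall()
print(rows)  # [('beta', 2), ('gamma', 1)]

# Single number: distinct names with no category row at all.
count = conn.execute("""
    SELECT COUNT(DISTINCT e.name) FROM entries e
    WHERE NOT EXISTS (SELECT 1 FROM entry_categories c
                      WHERE c.ent_name = e.name)
""").fetchone()[0]
print(count)  # 2
```

The UNION ALL form returns one row per uncategorized name with its revision count; the length of that result (or the COUNT(DISTINCT ...) form) is the number of uniquely named entries without a category.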
First: drop the names key. It duplicates the primary key, because ent_name is the leftmost column of the primary key, so the PK can resolve the same lookups. This should change the EXPLAIN output to use the PK in the join.
The columns you are joining on are fairly large (VARCHAR(255)). It is better to join on integers if you can, even if this means introducing one more table that maps each name to an integer id.
For some reason the query uses a filesort even though there is no ORDER BY clause. (In MySQL versions before 8.0, GROUP BY also sorts the result implicitly, which may be the cause.)
Can you show the explain results next to each query, and the single category query, for further diagnosis?
I have these small tables, item and category:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
`category_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `category_id` (`category_id`)
) CHARSET=utf8
CREATE TABLE `category` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) CHARSET=utf8
I have inserted 100 categories and 1000 items.
If I run this:
EXPLAIN SELECT item.id,category.name AS category_name FROM item JOIN category ON item.category_id=category.id;
Then, if the tables' engine is InnoDB I get:
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| 1 | SIMPLE | category | index | PRIMARY | name | 452 | NULL | 103 | Using index |
| 1 | SIMPLE | item | ref | category_id | category_id | 3 | dbname.category.id | 5 | Using index |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
Whereas, if I switch to MyISAM (with alter table engine=myisam) I get:
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| 1 | SIMPLE | item | ALL | category_id | NULL | NULL | NULL | 1003 | |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 3 | dbname.item.category_id | 1 | |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
My question is, why this difference in the way indexes are handled?
In InnoDB, any secondary index internally contains the primary key column of the table. So the index name on column (name) is implicitly on columns (name, id).
This means that EXPLAIN shows your access to the category table as an "index-scan" (this is shown in the type column as "index"). By scanning the index, it also has access to the id column, which it uses to look up rows in the second table, item.
Then it also takes advantage of the item index on (category_id) which is really (category_id, id), and it is able to fetch item.id for your select-list simply by reading the index. No need to read the table at all (this is shown in the Extra column as "Using index").
MyISAM doesn't store primary keys with the secondary key in this way, so it can't get the same optimizations. The access to the category table is type "ALL" which means a table-scan.
I would expect the access to the MyISAM table item to be "ref", since it looks up rows using the index on (category_id). But the optimizer may get skewed estimates if the table has very few rows, or if you haven't run ANALYZE TABLE item since creating the index.
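SQLite rowid tables behave like InnoDB in this respect: every index entry carries the row's key, so a query that needs only the key and the indexed column never touches the table. A sketch assuming `id INTEGER PRIMARY KEY` stands in for the InnoDB primary key (index names differ from the MySQL ones because SQLite index names are schema-wide):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,          -- rowid alias, like InnoDB's PK
        name TEXT NOT NULL,
        category_id INTEGER NOT NULL
    );
    CREATE INDEX item_category_id ON item (category_id);
""")


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (7,)).fetchall()
    return " | ".join(row[3] for row in rows)


# id comes for free from the index entry (category_id, rowid).
id_only = plan("SELECT id FROM item WHERE category_id = ?")
# name is not in the index, so each match needs a table lookup.
with_name = plan("SELECT id, name FROM item WHERE category_id = ?")
print(id_only)
print(with_name)
```

The first plan reports a covering index (SQLite's equivalent of "Using index"); the second does not, because name has to be read from the table.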
Re your update:
It looks like the optimizer prefers an index-scan over a table-scan, so it takes the opportunity to do an index-scan in InnoDB, and puts the category table first. The optimizer decides to re-order the tables instead of using the tables in the order you gave them in your query.
In the MyISAM tables, there will be one table-scan whichever table it chooses to access first, but by putting the category table second, it joins to category's PRIMARY key index instead of item's secondary index. The optimizer prefers lookups to a unique or primary key (type "eq_ref").
I am fighting some performance problems on a very simple table, which is very slow when fetching data by its primary key (BIGINT).
I have this table with 124 million entries:
CREATE TABLE `nodes` (
`id` bigint(20) NOT NULL,
`lat` float(13,7) NOT NULL,
`lon` float(13,7) NOT NULL,
PRIMARY KEY (`id`),
KEY `lat_index` (`lat`),
KEY `lon_index` (`lon`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
and a simple query that takes some ids from another table via an IN clause to fetch data from the nodes table, but it takes about 1 hour to fetch just a few rows from this table.
EXPLAIN shows me it's not using the PRIMARY key as an index; it's simply scanning the whole table. Why? The id column here and the node_id column in the other table are both of type bigint(20).
mysql> EXPLAIN SELECT lat, lon FROM nodes WHERE id IN (SELECT node_id FROM ways_elements WHERE way_id = '4962890');
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
| 1 | PRIMARY | nodes | ALL | NULL | NULL | NULL | NULL | 124035228 | Using where |
| 2 | DEPENDENT SUBQUERY | ways_elements | ref | way_id | way_id | 8 | const | 2 | Using where |
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
The query SELECT node_id FROM ways_elements WHERE way_id = '4962890' simply returns two node ids, so the whole query should only return two rows, but it takes more or less 1 hour.
Using FORCE INDEX (PRIMARY) didn't help. And even if it had, why doesn't MySQL choose that index, since it's the primary key? EXPLAIN doesn't even list anything in the possible_keys column, although select_type shows PRIMARY.
Am I doing something wrong?
How does this perform?
SELECT lat, lon FROM nodes t1 join ways_elements t2 on (t1.id=t2.node_id) WHERE t2.way_id = '4962890'
I suspect that your query is checking each row in nodes against each item in the "IN" clause.
This is what is called a correlated subquery: EXPLAIN reports it as DEPENDENT SUBQUERY, so it is re-evaluated for each row of nodes. A better query to use is:
SELECT lat,
lon
FROM nodes n
JOIN ways_elements w ON n.id = w.node_id
WHERE way_id = '4962890'
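A quick way to confirm the JOIN rewrite returns the same rows as the IN version is a sketch against SQLite with hypothetical sample rows. SQLite does not reproduce MySQL's slow DEPENDENT SUBQUERY plan; this only checks equivalence and that the join looks nodes up by primary key. Note the two forms agree only when a way does not list the same node twice; the JOIN would return duplicates in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id INTEGER PRIMARY KEY,
        lat REAL NOT NULL,
        lon REAL NOT NULL
    );
    CREATE TABLE ways_elements (
        way_id INTEGER NOT NULL,
        node_id INTEGER NOT NULL
    );
    CREATE INDEX way_id ON ways_elements (way_id);
""")
# Hypothetical sample rows: way 4962890 references nodes 1 and 3.
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                 [(1, 10.0, 20.0), (2, 11.0, 21.0), (3, 12.0, 22.0)])
conn.executemany("INSERT INTO ways_elements VALUES (?, ?)",
                 [(4962890, 1), (4962890, 3), (555, 2)])

in_sql = ("SELECT lat, lon FROM nodes WHERE id IN "
          "(SELECT node_id FROM ways_elements WHERE way_id = ?)")
join_sql = ("SELECT n.lat, n.lon FROM nodes n "
            "JOIN ways_elements w ON n.id = w.node_id WHERE w.way_id = ?")

in_rows = sorted(conn.execute(in_sql, (4962890,)).fetchall())
join_rows = sorted(conn.execute(join_sql, (4962890,)).fetchall())
print(in_rows == join_rows, join_rows)  # True [(10.0, 20.0), (12.0, 22.0)]

# The join should search ways_elements by way_id, then nodes by primary key.
join_plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql, (4962890,))
)
print(join_plan)
```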
I have the following MySQL table (table size - around 10K records):
CREATE TABLE `tmp_index_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`m_id` int(11) DEFAULT NULL,
`r_id` int(11) DEFAULT NULL,
`price` float DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `m_key` (`m_id`),
KEY `r_key` (`r_id`),
KEY `price_key` (`price`)
) ENGINE=InnoDB AUTO_INCREMENT=16390 DEFAULT CHARSET=utf8;
As you can see, I have two INTEGER fields (r_id and m_id) and one FLOAT field (price).
For each of these fields I have an index.
Now, when I run a query with condition on the first integer AND on the second one, everything is fine:
mysql> explain select * from tmp_index_test where m_id=1 and r_id=2;
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
| 1 | SIMPLE | tmp_index_test | index_merge | m_key,r_key | r_key,m_key | 5,5 | NULL | 1 | Using intersect(r_key,m_key); Using where |
+----+-------------+----------------+-------------+---------------+-------------+---------+------+------+-------------------------------------------+
It seems MySQL handles this very well, since there is Using intersect(r_key,m_key) in the Extra field.
I'm not a MySQL expert, but as I understand it, MySQL first intersects the indexes, and only then fetches the rows for the intersection from the table itself.
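A simplified model of that intersection step, not MySQL's actual code: for an equality condition on an InnoDB secondary index, the matching entries come out sorted by primary key (each entry is stored as (value, pk)), so two such lists can be intersected in one merge pass before any table rows are read. A range such as price > 100 does not produce primary-key-ordered output, which is one reason the index-merge intersection is not used for it. The index contents below are hypothetical.

```python
def intersect_sorted(left, right):
    """Merge-intersect two ascending lists of primary-key values."""
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical index contents: indexed value -> ascending primary keys.
m_key = {1: [3, 8, 15, 42]}
r_key = {2: [8, 9, 42, 50]}
table = {pk: {"id": pk} for pk in range(1, 60)}

pks = intersect_sorted(m_key[1], r_key[2])   # m_id = 1 AND r_id = 2
rows = [table[pk] for pk in pks]             # only now are full rows fetched
print(pks)  # [8, 42]
```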
HOWEVER, when I run very similar query, but instead of condition on two integers, I put similar condition on an integer and a float, MySQL refuses to intersect the result on indexes:
mysql> explain select * from tmp_index_test where m_id=3 and price=100;
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | price_key | 5 | const | 1 | Using where |
+----+-------------+----------------+------+-----------------+-----------+---------+-------+------+-------------+
As you can see, MySQL decides to use the index of price only.
My first question is why, and how to fix it?
In addition, I need to run queries with a greater-than sign (>) instead of the equals sign (=) on price. Currently EXPLAIN shows that for such queries, MySQL uses the integer key only.
mysql> explain select * from tmp_index_test where m_id=3 and price > 100;
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
| 1 | SIMPLE | tmp_index_test | ref | m_key,price_key | m_key | 5 | const | 2 | Using where |
+----+-------------+----------------+------+-----------------+-------+---------+-------+------+-------------+
I need to somehow make MySQL do the index intersection first. Does anybody have an idea how?
Thanks a lot in advance!
From the MySQL manual:
ref is used if the join uses only a leftmost prefix of the key or if
the key is not a PRIMARY KEY or UNIQUE index (in other words, if the
join cannot select a single row based on the key value). If the key
that is used matches only a few rows, this is a good join type.
price is not unique or primary, so ref is chosen. I don't believe you can force an intersect.
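If an intersection cannot be forced, a common alternative (not part of the answer above) is a single composite index with the equality column first and the price column second, so one index serves both conditions, including the range. A sketch in SQLite, with the index name m_price as a hypothetical choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_index_test (
        id INTEGER PRIMARY KEY,
        m_id INTEGER,
        r_id INTEGER,
        price REAL
    );
    -- Hypothetical composite index: equality column first, range column second.
    CREATE INDEX m_price ON tmp_index_test (m_id, price);
""")


def plan(sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


eq_plan = plan("SELECT * FROM tmp_index_test WHERE m_id = ? AND price = ?", (3, 100))
range_plan = plan("SELECT * FROM tmp_index_test WHERE m_id = ? AND price > ?", (3, 100))
print(eq_plan)
print(range_plan)
```

Both plans search m_price using both columns. The column order matters: with (price, m_id) instead, the range on price would stop the index from narrowing on m_id.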