Why would MySQL use index intersection instead of combined index? - mysql

From time to time I encounter a strange MySQL behavior. Let's assume I have indexes (type, rel, created), (type), (rel). The best choice for a query like this one:
SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;
would be to use index (type, rel, created).
But MySQL decides to intersect indexes (type) and (rel), and that leads to worse perfomance. Here is an example:
mysql> EXPLAIN
-> SELECT id FROM tbl
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
key: idx_type,idx_rel
key_len: 1,2
ref: NULL
rows: 4343
Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort
And the same query, but with a hint added:
mysql> EXPLAIN
-> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: ref
possible_keys: idx_type_rel_created
key: idx_type_rel_created
key_len: 3
ref: const,const
rows: 8906
Extra: Using where
I think MySQL takes an execution plan which contains less number in the "rows" column of the EXPLAIN command. From that point of view, index intersection with 4343 rows looks really better than using my combined index with 8906 rows. So, maybe the problem is within those numbers?
mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
| 3056 |
+----------+
From this I can conclude that MySQL is mistaken at calculating approximate number of rows for combined index.
So, what can I do here to make MySQL take the right execution plan?
I can not use optimizer hints, because I have to stick to Django ORM
The only solution I found yet is to remove those one-field indexes.
MySQL version is 5.1.49.
The table structure is:
CREATE TABLE tbl (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL,
`rel` smallint(2) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_type` (`type`),
KEY `idx_rel` (`rel`),
KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;

It's hard to tell exactly why MySQL chooses index_merge_intersection over the index scan, but you should note that with the composite indexes, statistics up to the given column are stored for the composite indexes.
The value of information_schema.statistics.cardinality for the column type of the composite index will show the cardinality of (rel, type), not type itself.
If there is a correlation between rel and type, then cardinality of (rel, type) will be less than product of cardinalities of rel and type taken separately from the indexes on corresponding columns.
That's why the number of rows is calculated incorrectly (an intersection cannot be larger in size than a union).
You can forbid index_merge_intersection by setting it to off in ##optimizer_switch:
SET optimizer_switch = 'index_merge_intersection=off'

Another thing is worth mentioning: you would not have the problem if you deleted the index on type only. the index is not required since it duplicates a part of the composite index.

Some time the intersection on same table could be interesting, and you may not want to remove an index on a single colum so as some other query work well with intersection.
In such case, if the bad execution plan concerns only one single query, a solution is to exclude the unwanted index. Il will then prevent the usage of intersection only for that sepcific query...
In your example :
SELECT id FROM tbl IGNORE INDEX(idx_type)
WHERE rel = 3 AND type = 3
ORDER BY created;
enter code here

Related

Is there a better way of doing this in mysql? - update entire column with another select and group by

I have a table sample with two columns id and cnt and another table PostTags with two columns postid and tagid
I want to update all cnt values with their corresponding counts and I have written the following query:
UPDATE sample SET
cnt = (SELECT COUNT(tagid)
FROM PostTags
WHERE sample.postid = PostTags.postid
GROUP BY PostTags.postid)
I intend to update entire column at once and I seem to accomplish this. But performance-wise, is this the best way? Or is there a better way?
EDIT
I've been running this query (without GROUP BY) for over 1 hour for ~18m records. I'm looking for a query that is better in performance.
That query should not take an hour. I just did a test, running a query like yours on a table of 87520 keywords and matching rows in a many-to-many table of 2776445 movie_keyword rows. In my test, it took 32 seconds.
The crucial part that you're probably missing is that you must have an index on the lookup column, which is PostTags.postid in your example.
Here's the EXPLAIN from my test (finally we can do EXPLAIN on UPDATE statements in MySQL 5.6):
mysql> explain update kc1 set count =
(select count(*) from movie_keyword
where kc1.keyword_id = movie_keyword.keyword_id) \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: kc1
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 98867
Extra: Using temporary
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: movie_keyword
type: ref
possible_keys: k_m
key: k_m
key_len: 4
ref: imdb.kc1.keyword_id
rows: 17
Extra: Using index
Having an index on keyword_id is important. In my case, I had a compound index, but a single-column index would help too.
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `k_m` (`keyword_id`,`movie_id`)
);
The difference between COUNT(*) and COUNT(movie_id) should be immaterial, assuming movie_id is NOT NULLable. But I use COUNT(*) because it'll still count as an index-only query if my index is defined only on the keyword_id column.
Remove the unnecessary GROUP BY and the statement looks good. If however you expect many sample.set already to contain the correct value, then you would update many records that need no update. This may create some overhead (larger rollback segments, triggers executed etc.) and thus take longer.
In order to only update the records that need be updated, join:
UPDATE sample
INNER JOIN
(
SELECT postid, COUNT(tagid) as cnt
FROM PostTags
GROUP BY postid
) tags ON tags.postid = sample.postid
SET sample.cnt = tags.cnt
WHERE sample.cnt != tags.cnt OR sample.cnt IS NULL;
Here is the SQL fiddle: http://sqlfiddle.com/#!2/d5e88.

Do I need to add an Index on ORDER BY field?

I have a query of such like
$query = "SELECT * FROM tbl_comments WHERE id=222 ORDER BY comment_time";
Do I need to add an index on the comment_time field?
Also, if I want to get the data between two dates then how should I build the index?
Yes, index will help you, when using ORDER BY. Because INDEX is a sorted data structure, so the request will be executed faster.
Look at this example: table test2 with 3 rows. I used LIMIT after order by to show the difference in execution.
DROP TABLE IF EXISTS `test2`;
CREATE TABLE `test2` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` varchar(10) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_value` (`value`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;
-- ----------------------------
-- Records of test2
-- ----------------------------
INSERT INTO `test2` VALUES ('1', '10');
INSERT INTO `test2` VALUES ('2', '11');
INSERT INTO `test2` VALUES ('2', '9');
-- ----------------------------
-- Without INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value LIMIT 1\G
*************************** 1. row *************************
id: 1
select_type: SIMPLE
table: test2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
1 row in set (0.00 sec)
MySQL checked 3 rows to output the result.
After CREATE INDEX, we get this:
mysql> CREATE INDEX ix_value ON test2 (value) USING BTREE;
Query OK, 0 rows affected (0.14 sec)
-- ----------------------------
-- With INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value LIMIT 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test2
type: index
possible_keys: NULL
key: ix_value
key_len: 32
ref: NULL
rows: 1
Extra: Using index
1 row in set (0.00 sec)
Now MySQL used only 1 row.
Answering the received comments, I tried the same query without LIMIT:
-- ----------------------------
-- Without INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value\G
*************************** 1. row ******************
id: 1
select_type: SIMPLE
table: test2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
-- ----------------------------
-- With INDEX
-- ----------------------------
mysql> EXPLAIN SELECT * FROM test2 ORDER BY value\G
*************************** 1. row *****************
id: 1
select_type: SIMPLE
table: test2
type: index
possible_keys: NULL
key: ix_value
key_len: 32
ref: NULL
rows: 3
Extra: Using index
As we see, it uses index, for the 2-nd ORDER BY.
To build an index on your field, use this:
CREATE INDEX ix_comment_time ON tbl_comments (comment_time) USING BTREE;
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
An index on the comment_time field might not help at all for a query like this:
SELECT *
FROM tbl_comments
WHERE id=222
ORDER BY comment_time;
The query needs to scan the the table to find the matching id values. It can do this by scanning the index, looking up the rows, and doing the test. If there is one row that matches and it has the highext comment_time, then this requires scanning the index and reading the table.
Without the index, it would scan the table, find the row, and very quickly sort the 1 row. The sequential scan of the table would typically be faster than an index scan followed by a page lookup (and would definitely be faster on a table larger than available memory).
On the other hand, an index on id, comment_time would be very helpful.
Technically you don't need indices on every field, as it will work too, however for performance reasons you might need one or more.
EDIT
This problem is known from the beginning of software design. Typically if you increase amount of memory used by the program, you will reduce its speed (assuming the program is well-written). Assigning an index to a field increases data used by the db, but makes searching faster. If you do not want to search anything by this field (you actually do in the question), it would not be necessary.
In modern era the indices are not so big comparing to disk data size and adding one or more should not be a bad idea.
Normally it is very difficult to surely tell "do I need index or not". Some help is provided by EXPLAIN statement (refer to the manual).
Regarding your first question, you don't have to create index on comment_time. If the number of records is very large you'll need indices to speed your retrieval. But for your operation you don't need indices.
For your second question using a WHERE Clause like this will help you.
WHERE(comment_time BETWEEN 'startDate' AND 'endDate');
You don't have to put the index on comment_time if your where id is distinct.
To increase the speed of retrieval of data you would need index. This will work with out index also. For your second question you can use WHERE and BETWEEN clause.
Refer: http://www.w3schools.com/sql/sql_between.asp
The EXPLAIN statement is very useful in situations like that. For your query, you would use it as follows:
EXPLAIN SELECT * FROM tbl_comments WHERE id=222 ORDER BY comment_time
This will output which indexes are being used to execute the query and allows you to perform experiments with different indexes to find the best configuration. In order to speed up sorting, you will want a BTREE index since it stores data in a sorted manner. To speed up finding items with a certain id, a HASH index is the better option since it provides quick lookups for equality predicates. Note that MySQL might not be able to use a combination of both indexes to execute your query and will instead use just one of them.
Further information: http://dev.mysql.com/doc/refman/5.7/en/using-explain.html
For range predicates, like dates in a range of dates, a BTREE index will perform better than a HASH index.
Further information: http://dev.mysql.com/doc/refman/5.7/en/create-index.html

mysql index can not use

I have a table ‘t_table1′ include 3 fields :
`field1` tinyint(1) unsigned default NULL,
`field2` tinyint(1) unsigned default NULL,
`field3` tinyint(1) unsigned NOT NULL default ’0′,
and a Index:
KEY `t_table1_index1` (`field1`,`field2`,`field3`),
When I run this SQL1:
EXPLAIN SELECT * FROM table1 AS c WHERE c.field1 = 1 AND c.field2 = 0 AND c.field3 = 0
Then is show:
Select type: Simple
tyle: All
possible key: t_table1_index1
key: NULL
key_len: NULL
rows: 1042
extra: Using where
I think it say that my index useless in this case.
But when I run this SQL2:
EXPLAIN SELECT * FROM table1 AS c WHERE c.field1 = 1 AND c.field2 = 1 AND c.field3 = 1
it shows:
Select type: Simple
tyle: ref
possible key: t_table1_index1
key: t_table1_index1
key_len: 5
ref: const, const, const
rows: 1
extra: Using where
This case it used my index.
So please explain for me:
why SQL1 can not use index ?
with SQL1, how can i edit index or rewrite SQL to performing more quickly ?
Thanks !
The query optimizer uses many data points to decide how to execute a query. One of those is the selectivity of the index. To use an index requires potentially more disk accesses per row returned than a table scan because the engine has to read the index and then fetch the actual row (unless the entire query can be satisfied from the index alone). As the index becomes less selective (i.e. more rows match the criteria) the efficiency of using that index goes down. At some point it becomes cheaper to do a full table scan.
In your second example the optimizer was able to ascertain that the values you provided would result in only one row being fetched, so the index lookup was the correct approach.
In the first example, the values were not very selective, with an estimate of 1042 rows being returned out of 1776. Using the index would result in searching the index, building a list of selected row references and then fetching each row. With so many rows being selected, to use the index would have resulted in more work than just scanning the entire table lineraly and filtering the rows.

Why doesn't use Indexes?

I have a simple query:
SELECT id
FROM logo
WHERE (`description` LIKE "%google%" AND `active` = "y")
ORDER BY id ASC
the table logo (MyISAM) is formed by about 30 fields with 2402024 rows in it.
the field description is a varchar(255) not null.
the field active is an ENUM('y','n') not null
the cardinality of those indexes are:
`active`: BTREE Card. 2
`description`: BTREE Card. 200168
EXPLAIN on the query returns this:
select_type: SIMPLE
table: logo
type: ALL
possible_keys: active
key: NULL
key_len: NULL
ref: NULL
rows: 2402024
Extras: Using where
I am wondering why the query is not using the description index and how can I optimize the table so this query will run smoothly without a full table scan
Already optimized the table and checked for errors..
Given that your LIKE condition has a % at the beginning and end of the expression being compared, an index is useless in this context.
An index is only useful if the comparison expression can be found in an "ordered" fashion, as in A comes before B, which comes before C. In other words, LIKE 'google%' may use an index, while LIKE '%google%' will not.

Why isn't MySQL using any of these possible keys?

I have the following query:
SELECT t.id
FROM account_transaction t
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY tc.id
When I do an EXPLAIN the first row shows, among other things, this:
table: t
type: ALL
possible_keys: account_id,transaction_code_id,account_transaction_transaction_code_id,account_transaction_account_number
key: NULL
rows: 465663
Why is key NULL?
Another issue you may be encountering is a data type mis-match. For example, if your column is a string data type (CHAR, for ex), and your query is not quoting a number, then MySQL won't use the index.
SELECT * FROM tbl WHERE col = 12345; # No index
SELECT * FROM tbl WHERE col = '12345'; # Index
Source: Just fought this same issue today, and learned the hard way on MySQL 5.1. :)
Edit: Additional information to verify this:
mysql> desc das_table \G
*************************** 1. row ***************************
Field: das_column
Type: varchar(32)
Null: NO
Key: PRI
Default:
Extra:
*************************** 2. row ***************************
[SNIP!]
mysql> explain select * from das_table where das_column = 189017 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: das_column
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 874282
Extra: Using where
1 row in set (0.00 sec)
mysql> explain select * from das_table where das_column = '189017' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: das_column
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 34
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
It might be because the statistics is broken, or because it knows that you always have a 1:1 ratio between the two tables.
You can force an index to be used in the query, and see if that would speed up things. If it does, try to run ANALYZE TABLE to make sure statistics are up to date.
By specifying USE INDEX (index_list), you can tell MySQL to use only one of the named indexes to find rows in the table. The alternative syntax IGNORE INDEX (index_list) can be used to tell MySQL to not use some particular index or indexes. These hints are useful if EXPLAIN shows that MySQL is using the wrong index from the list of possible indexes.
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Each hint requires the names of indexes, not the names of columns. The name of a PRIMARY KEY is PRIMARY. To see the index names for a table, use SHOW INDEX.
From http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Index for the group by (=implicit order by)
...
GROUP BY tc.id
The group by does an implicit sort on tc.id.
tc.id is not listed a a possible key.
but t.transaction_id is.
Change the code to
SELECT t.id
FROM account_transaction t
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY t.transaction_code_id
This will put the potential index transaction_code_id into view.
Indexes for the joins
If the joins (nearly) fully join the three tables, there's no need to use the index, so MySQL doesn't.
Other reasons for not using an index
If a large % of the rows under consideration (40% IIRC) are filled with the same value. MySQL does not use an index. (because not using the index is faster)