Why is the MySQL optimizer not using the complete index?

I am trying to understand why MySQL is not using the complete index to answer the query. Let me explain.
I am using the IMDb database on MySQL 5.1.73-1 (Debian). I created an index "itid_inf25_mid_ndx" on the table movie_info_idx with the columns (info_type_id, info(25), movie_id). Columns info_type_id and movie_id are integers (NOT NULL) and info is of type TEXT, so each index entry takes 4 + 27 + 4 = 35 bytes. The EXPLAIN output for this query:
EXPLAIN
SELECT movie_id
FROM movie_info_idx
WHERE info_type_id = 101
AND info > "9";
shows these values:
select_type = SIMPLE; table = movie_info_idx; type = range; possible_keys = itid_inf25_mid_ndx; key = itid_inf25_mid_ndx; key_len = 31; ref = NULL; rows = 841; Extra = "Using where"
The key_len value and the absence of "Using index" in the Extra column tell me that only the columns (info_type_id, info(25)), which add up to 4 + 27 = 31 bytes, are used from the index. Why isn't the optimizer using the movie_id column from the index to get the movie_id in the SELECT clause? It seems the optimizer will access the base table movie_info_idx to fetch the movie_id values I want to list. Why?
Thank you in advance for your reply.

Once MySQL uses a column for a "range scan" (matching more than one value), it generally can no longer use the index columns after it; here, that is the last column.
The reason is that a multi-column index works like a tree of trees. To use the last column (movie_id), MySQL would have to search a separate index subtree for every matching value of the range column (info). This is generally inefficient, so MySQL won't do it.
To improve the situation, put the column expected to be range-scanned last, i.e. order the index as (info_type_id, movie_id, info).
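A rough way to see the "tree of trees" point is to model the composite index as a sorted list of key tuples. This is a toy Python model, not MySQL's real B-tree; the sample rows and the range_scan helper are invented for illustration. Only the (info_type_id, info) prefix can position the scan, and inside the range info > "9" the movie_id values come out in no particular order, so there is no contiguous stretch of the index to seek to on movie_id.

```python
from bisect import bisect_right

# Toy model (not MySQL's storage format): a composite index is a list of key
# tuples kept in lexicographic order (info_type_id, info, movie_id).
# The sample rows are invented for illustration.
INDEX = sorted([
    (100, "7.5", 3),
    (101, "8.1", 7),
    (101, "9.0", 42),
    (101, "9.0", 5),
    (101, "9.3", 1),
    (101, "9.7", 30),
    (102, "9.9", 2),
])


def range_scan(index, info_type_id, info_lower):
    """Emulate WHERE info_type_id = ? AND info > ? and return the movie_ids.

    Only the (info_type_id, info) prefix positions the scan: we seek past every
    key whose first two columns are <= (info_type_id, info_lower), then read
    forward until info_type_id changes.
    """
    # float("inf") sorts after any integer movie_id, so this also skips keys
    # whose info equals info_lower exactly (the condition is strict: info > ?).
    start = bisect_right(index, (info_type_id, info_lower, float("inf")))
    movie_ids = []
    for type_id, _info, movie_id in index[start:]:
        if type_id != info_type_id:
            break  # left the info_type_id = ? part of the index
        movie_ids.append(movie_id)
    return movie_ids


ids = range_scan(INDEX, 101, "9")
print(ids)                 # [5, 42, 1, 30]
print(ids == sorted(ids))  # False: movie_id is not ordered inside the range
```

With the column order (info_type_id, movie_id, info), an equality on info_type_id would leave movie_id as the next sorted column, which is why the answer suggests moving the range column to the end.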
More info:
https://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html

Related

Optimizing query with FTS + composite index

I have the following query:
SELECT *
FROM table
WHERE
structural_type=1
AND parent_id='167F2-F'
AND points_to_id=''
# AND match(search) against ('donotmatch124213123123')
The query takes about 10 ms to run, using the composite index (structural_type, parent_id, points_to_id). However, when I add the full-text condition, the query balloons to ~1 s, regardless of what the MATCH criteria contain. It seems to "skip the index" whenever a full-text search is applied.
What would be the best way to optimize this query?
Update: a few explains:
EXPLAIN SELECT... # without fts
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table NULL ref structural_type structural_type 209 const,const,const 2 100.00 NULL
With fts (also adding 'force index'):
explain SELECT ... force INDEX (structural_type) AND match...
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table NULL fulltext structural_type,search search 0 const 1 5.00 Using where; Ft_hints: sorted
The only thing I can think of, which would be incredibly hackish, is to add an extra term to the full-text column so the full-text search does the filtering "within" that. For example:
fts_term += " StructuralType1ParentID167F2FPointsToID"
The MySQL optimizer can only use one index for this WHERE clause, so it has to choose between the composite one and the FULLTEXT one.
Since it can't run both plans to benchmark which one is faster, it estimates how fast each candidate execution plan will be.
To do so, MySQL uses internal statistics it keeps about each table. Those statistics can drift far from reality if they aren't updated while the data in the table changes.
Running OPTIMIZE TABLE on the table lets MySQL refresh its table statistics, so it can make better estimates and choose the better index.
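The estimate-driven choice can be illustrated with a deliberately tiny cost model. This is a toy Python sketch, not MySQL's actual cost formula; the IndexStats class, choose_index helper and all row counts are hypothetical. It only shows that the same query can get a different index when the statistics change.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    name: str
    estimated_rows: int  # rows the optimizer *believes* this index will return


def choose_index(candidates):
    """Toy cost model: pick the index with the lowest estimated row count."""
    return min(candidates, key=lambda s: s.estimated_rows).name


# Hypothetical numbers: stale statistics make the FULLTEXT index look cheap...
stale = [IndexStats("structural_type", 5000), IndexStats("search", 1)]
# ...while refreshed statistics reflect the composite index's real selectivity.
fresh = [IndexStats("structural_type", 2), IndexStats("search", 40000)]

print(choose_index(stale))  # search
print(choose_index(fresh))  # structural_type
```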
Try expressing this without the full-text logic, using LIKE:
SELECT *
FROM table
WHERE structural_type = 1 AND
parent_id ='167F2-F' AND
points_to_id = '' AND
search LIKE '%donotmatch124213123123%';
The index should still be used for the first three columns. LIKE might be slow, but if not many rows match the first three, this might not be as bad as using the full text index.
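The idea is that the index narrows the rows to a few candidates first, and the expensive string test runs only on those. A minimal Python sketch, assuming invented sample rows and a dict standing in for the composite index; the string condition is passed in as a predicate and modelled as a plain, case-sensitive substring test (MySQL's LIKE follows the column collation, which is often case-insensitive).

```python
from collections import defaultdict

# Invented sample rows: (structural_type, parent_id, points_to_id, search).
ROWS = [
    (1, "167F2-F", "", "alpha beta"),
    (1, "167F2-F", "", "gamma donotmatch124213123123"),
    (1, "167F2-F", "X", "donotmatch124213123123"),
    (2, "167F2-F", "", "donotmatch124213123123"),
]

# Toy stand-in for the composite index: equality key -> row positions.
composite = defaultdict(list)
for pos, row in enumerate(ROWS):
    composite[row[:3]].append(pos)


def lookup_then_filter(key, predicate):
    """Use the index for the three equality columns, then apply the string
    predicate only to the candidate rows. Returns (matches, rows_examined)."""
    candidates = composite.get(key, [])
    matches = [pos for pos in candidates if predicate(ROWS[pos][3])]
    return matches, len(candidates)


matches, examined = lookup_then_filter(
    (1, "167F2-F", ""), lambda text: "donotmatch124213123123" in text
)
print(matches, examined)  # [1] 2
```

Only two of the four rows are ever inspected by the string test; that is the effect the answer relies on when few rows match the first three columns.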

DISTINCT causing full table scan

I have a table in MySQL (5.5.31) which has about 20M rows. The following query:
SELECT DISTINCT mytable.name name FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0 ORDER BY mytable.date_modified DESC LIMIT 0,21
is causing a full table scan: EXPLAIN shows type ALL and Extra "Using where; Using temporary; Using filesort". EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable ALL NULL NULL NULL NULL 19001156 Using where; Using temporary; Using filesort
1 SIMPLE mytable_c eq_ref PRIMARY PRIMARY 108 mytable.id 1 Using index
Without the join explain looks like:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable index NULL mytablemod 9 NULL 21 Using where; Using temporary
id_c is the primary key of mytable_c, and mytable_c never has more than one row for each row in mytable. date_modified is indexed, but MySQL doesn't seem to take advantage of that. If I remove the DISTINCT clause, EXPLAIN shows the index being used and only 21 rows touched, just as expected. Removing the join has the same effect. Is there any way to avoid the full table scan while keeping the join? EXPLAIN shows MySQL knows it needs only one row from mytable_c and uses the primary key for it, but it still does a full scan on mytable.
DISTINCT is there because the query is generated by an ORM. In some cases JOINs may produce multiple rows, but the values of the SELECT fields will always be unique (i.e. if the JOIN is against a multi-value link, only fields that are the same in every joined row will be in the SELECT).
These are just generic comments, not MySQL-specific.
To find all the possible name values in mytable, a full scan of either the table or an index needs to happen. Possible options:
full table scan
full index scan of an index starting with deleted (take advantage of the filter)
full index scan of an index starting with name (only column of concern for output)
If there were an index on deleted, the server could find all the deleted = 0 index entries and then look up the corresponding name values in the table. But if deleted has low cardinality, or the statistics don't say otherwise, doing two reads per row (first the index entry, then the corresponding data row) could be more expensive. In that case, it just scans the table.
If there were an index on name, an index scan could be sufficient, but the table would still need to be checked for the deleted filter: again, frequent hopping from index to table.
The join columns also need to be considered in a similar manner.
Setting the join aside, if you had a multi-part index on the columns (name, deleted), an index scan would probably be used.
Update
To me the DISTINCT and ORDER BY parts are a bit confusing: when a name appears in several rows, which row's date_modified should be used for sorting? I think something like this would be clearer:
SELECT mytable.name name -- , MIN(mytable.date_modified)
FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
GROUP BY mytable.name
ORDER BY MIN(mytable.date_modified) DESC LIMIT 0,21
Either way, once the ORDER BY comes into play, a full scan needs to be done to find the order. Without the ORDER BY, the first 21 found could suffice.
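Why the ORDER BY forces a full pass can be shown with a small Python sketch of GROUP BY name ... ORDER BY MIN(date_modified) DESC LIMIT n. The sample rows and the top_names helper are invented; integers stand in for dates. A group's MIN() is only final after every row for that name has been read, so the top n cannot be known early.

```python
import heapq

# Invented sample rows: (name, date_modified, deleted); ints stand in for dates.
ROWS = [
    ("a", 5, 0), ("b", 9, 0), ("a", 12, 0),
    ("c", 3, 0), ("b", 1, 1), ("d", 7, 0),
]


def top_names(rows, limit):
    """GROUP BY name ... ORDER BY MIN(date_modified) DESC LIMIT n, in Python."""
    min_modified = {}
    for name, modified, deleted in rows:  # one full pass over all rows
        if deleted == 0:  # WHERE mytable.deleted = 0
            min_modified[name] = min(modified, min_modified.get(name, modified))
    # Only now are all MIN() values final, so only now can we pick the top n.
    best = heapq.nlargest(limit, min_modified.items(), key=lambda kv: kv[1])
    return [name for name, _ in best]


print(top_names(ROWS, 2))  # ['b', 'd']
```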
Why don't you try moving the condition mytable.deleted = 0 from the WHERE clause into the JOIN's ON clause? You can also try FORCE INDEX (mytablemod).

No index on !=?

Consider the following two EXPLAINs:
EXPLAIN SELECT * FROM sales WHERE title != 'The'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales ALL title NULL NULL NULL 41707 Using where
And -
EXPLAIN SELECT * FROM sales WHERE title = 'The'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales ref title title 767 const 1 Using where
Why does the != query have a NULL key? Why doesn't it use title? What causes a = statement to be able to utilize an index but not a !=?
There is no point in using the index unless a large fraction of rows have title exactly 'The'.
Since almost every row needs to be selected, you don't gain anything from using an index. It can actually be costly to use an index, which is probably what your MySQL engine is determining, so it is opting not to use the index.
Compare the amount of work done in these two situations:
Using the index:
1) Read the entire index tree into memory.
2) Search the index tree for the value 'The' and filter out those entries.
3) Read every row except for the few exceptions (which probably are in the same blocks on the disk as rows that do need to be read, so really the whole table is likely to be read in) from the table into memory.
Without the index:
1) Read every row into memory and while reading them filter out any where title = 'The' from the result set
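The comparison above can be turned into a back-of-the-envelope cost model. This is a toy Python sketch, not MySQL's real cost model: it assumes one random table-page read per matching row on the index path, one sequential read per page for a full scan, and invented page counts (about 420 pages for the roughly 41,707-row table).

```python
def pick_plan(table_pages, index_pages_read, matching_rows):
    """Toy cost comparison between an index lookup and a full table scan."""
    # Index path: walk part of the index, then one table read per matching row.
    index_cost = index_pages_read + matching_rows
    # Full scan: read every table page once, filtering rows as they go by.
    scan_cost = table_pages
    return "index" if index_cost < scan_cost else "full scan"


TABLE_PAGES = 420  # assumed: ~41,707 rows at ~100 rows per page

print(pick_plan(TABLE_PAGES, index_pages_read=3, matching_rows=1))         # index
print(pick_plan(TABLE_PAGES, index_pages_read=200, matching_rows=41_706))  # full scan
```

For title = 'The' only a few index pages and one row are touched; for title != 'The' nearly every row matches, so the extra per-row lookups make the index path far more expensive than simply scanning the table.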

Shouldn't this be using an index instead of where?

EXPLAIN EXTENDED SELECT `member`.`id` , `member`.`name`
FROM `member`
WHERE `member`.`last_active` > '1289348406'
This shows the following output even though last_active has an index on it. Shouldn't Extra say "Using index" instead of "Using where"?
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE member range last_active last_active 4 NULL 2 100.00 Using where
Using index means the query does not touch the table at all:
Using index
The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.
Since not all of the selected columns are covered by your index, that isn't possible here.
The index itself is of course being used (since the access type is range), but it still needs to do the row lookup in the table to retrieve the values of name and id.
Create a covering index on (last_active, name, id) if you want to see Using index.
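The difference between a plain index and a covering one can be sketched by counting table lookups. This is a toy Python model with invented rows; select_recently_active is a hypothetical helper, and it scans the index linearly for brevity instead of seeking. Each index entry also carries the primary key, as InnoDB secondary index entries do.

```python
# Invented sample rows; TABLE maps the primary key id to the full row.
TABLE = {
    1: {"id": 1, "name": "ann", "last_active": 1289348500},
    2: {"id": 2, "name": "bob", "last_active": 1289348000},
    3: {"id": 3, "name": "cy", "last_active": 1289349000},
}


def select_recently_active(index_columns, wanted, min_active):
    """Answer SELECT <wanted> FROM member WHERE last_active > min_active
    using a toy index on index_columns (which must start with last_active).

    Returns (result_rows, table_lookups).
    """
    # Each entry holds the indexed columns plus the primary key.
    entries = sorted(
        tuple(row[c] for c in index_columns) + (row["id"],)
        for row in TABLE.values()
    )
    covering = set(wanted) <= set(index_columns)
    result, lookups = [], 0
    for entry in entries:
        if entry[0] <= min_active:
            continue  # outside the range on last_active
        if covering:
            values = dict(zip(index_columns, entry))  # the index alone suffices
        else:
            values = TABLE[entry[-1]]  # extra seek to read the actual row
            lookups += 1
        result.append(tuple(values[c] for c in wanted))
    return result, lookups


print(select_recently_active(("last_active",), ("id", "name"), 1289348406))
# ([(1, 'ann'), (3, 'cy')], 2)
print(select_recently_active(("last_active", "name", "id"), ("id", "name"), 1289348406))
# ([(1, 'ann'), (3, 'cy')], 0)
```

Both calls return the same rows; only the covering index answers the query without touching the table, which is what "Using index" reports.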

MySQL ignores my index

I'm quite new to setting indexes myself. I'm currently just experimenting with it to discover how it works and in what cases a database will make use of the index.
I've got a simple table with 3 columns: an id, a name and a status. I've set an index on name, which is a CHAR(30) column. Against my expectations, MySQL ignores this index in the following query:
SELECT * FROM people WHERE name = 'Peter'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE people ref name name 90 const 1 Using where
However, when using the following query, the index is used:
SELECT COUNT(*) FROM people WHERE name = 'Peter'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE people ref name name 90 const 1 Using where; Using index
Could anyone please explain this to me?
"Using index" means it's using the index as a "covering index". This happens when it only needs to access the index to satisfy the query.
If, on the other hand, "Using index" is absent, but in the "key" column, the index is named, then it's using that index in the way described in the "ref" column.
So in both cases it's using the index, but only the COUNT(*) query uses it as a covering index.
For each query, the "key" column shows "name", so I'd say both of your queries used the index called "name", which is presumably on the name column. That is what you want (quoting the manual):
The key column indicates the key (index) that MySQL actually decided to use. If MySQL decides to use one of the possible_keys indexes to look up rows, that index is listed as the key value.
Also, you are only going through "1" row, which is good (no full-scan or anything like that).
The "type" column says "ref", which is a good thing:
All rows with matching index values are read from this table for each combination of rows from the previous tables. ... If the key that is used matches only a few rows, this is a good join type.
ref can be used for indexed columns that are compared using the = or <=> operator.
And the "ref" column shows "const", which means the index is compared against a constant value (here 'Peter'); that is also a good sign.
What makes you think your index is not used for one of the queries?
For more information, see section 7.2.1, "Optimizing Queries with EXPLAIN", in the MySQL manual.