Using IN with a subquery doesn't use index - mysql

I have the fowlloing query
select * from mapping_channel_fqdn_virtual_host where id in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
explaining the above query gives me the following result:
explain select * from mapping_channel_fqdn_virtual_host where id in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+----+-------------+-----------------------------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mapping_channel_fqdn_virtual_host | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 10 | 100.00 | Using where |
+----+-------------+-----------------------------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)
id is the primary key of the table. It looks like this is a range type of query and it uses the Primary Key as index for this one.
When I try to explain the following query I get different results:
explain select * from mapping_channel_fqdn_virtual_host where id in (select max(id) from mapping_channel_fqdn_virtual_host group by channel_id, fqdn_virtual_host_id);
+----+-------------+-----------------------------------+------------+-------+------------------+------------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------------------+------------+-------+------------------+------------------+---------+------+------+----------+-------------+
| 1 | PRIMARY | mapping_channel_fqdn_virtual_host | NULL | ALL | NULL | NULL | NULL | NULL | 4849 | 100.00 | Using where |
| 2 | SUBQUERY | mapping_channel_fqdn_virtual_host | NULL | index | idx_channel_fqdn | idx_channel_fqdn | 8 | NULL | 4849 | 100.00 | Using index |
+----+-------------+-----------------------------------+------------+-------+------------------+------------------+---------+------+------+----------+-------------+
2 rows in set, 1 warning (0,00 sec)
idx_channel_fqdn is a composite index key for the column pair used in the groupby clause. But when using subquery the Primary query stops using the index like it did before. Can you explain why this is happening?
Tried the JOIN query Anouar suggested:
explain select * from mapping_channel_fqdn_virtual_host as x join (select max(id) as ids from mapping_channel_fqdn_virtual_host group by channel_id, fqdn_virtual_host_id) as y on x.id=y.ids;
+----+-------------+-----------------------------------+------------+--------+------------------+------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------------------+------------+--------+------------------+------------------+---------+-------+------+----------+-------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 4849 | 100.00 | Using where |
| 1 | PRIMARY | x | NULL | eq_ref | PRIMARY | PRIMARY | 4 | y.ids | 1 | 100.00 | NULL |
| 2 | DERIVED | mapping_channel_fqdn_virtual_host | NULL | index | idx_channel_fqdn | idx_channel_fqdn | 8 | NULL | 4849 | 100.00 | Using index |
+----+-------------+-----------------------------------+------------+--------+------------------+------------------+---------+-------+------+----------+-------------+
3 rows in set, 1 warning (0,00 sec)
Judging by the index and eq_ref types it looks like it is better to use the JOIN than the subquery? Can you explain a bit more the outcome of the join explain expressiong?

You can see the answers for this question
You will fine ideas.
I am quoting from some answers
Sometimes MySQL does not use an index, even if one is available. One
circumstance under which this occurs is when the optimizer estimates
that using the index would require MySQL to access a very large
percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
Briefly, Try forcing the index:
SELECT *
FROM mapping_channel_fqdn_virtual_host FORCE INDEX (name of the index you want to use)
WHERE (mapping_channel_fqdn_virtual_host.id IN (1,2,3,4,5,6,7,8,9,10));
Or use JOIN instead and see the explain
SELECT * FROM mapping_channel_fqdn_virtual_host mcf
JOIN (select max(id) as ids from mapping_channel_fqdn_virtual_host group by channel_id, fqdn_virtual_host_id)) AS mcfv
ON mcf.id = mcfv.ids;

Related

Query not using index depending on COUNT parameter

First, I create a simple database with one MyISAM table with an indexed field called feature.
CREATE DATABASE test;
USE test;
CREATE TABLE data(
id INT(11) NOT NULL AUTO_INCREMENT,
feature VARCHAR(64),
PRIMARY KEY (id)
) ENGINE=MyISAM;
INSERT INTO data VALUES (1, 'a'), (2, 'b');
CREATE INDEX data_feature ON data(feature);
Then, when I test a GROUP BY query with a count, it doesn't use the index when the COUNT() is made by id (see Extra column at the end of the EXPLAIN).
mysql> EXPLAIN SELECT feature, COUNT(1) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(*) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(id) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
I have tested it on MySQL Community 8.0.21 and MariaDB 10.3.25.
COUNT(id) obligates the Optimizer to check id for being NOT NULL. The standard pattern is to simply say COUNT(*).
InnoDB probably works differently. But this is because the PRIMARY KEY is implicitly tacked onto the end of any secondary index. That makes INDEX(feature) work like INDEX(feature, id), in which case it would be a "covering" index as indicated by "Using index".
(There is virtually no reason to stick with MyISAM in this decade.)

MySQL : Indexes for View based on an aggregated query

I have a working, nice, indexed SQL query aggregating notes (sum of ints) for all my users and others stuffs. This is "query A".
I want to use this aggregated notes in others queries, say "query B".
If I create a View based on "query A", will the indexes of the original query will be used when needed if I join it in "query B" ?
Is that true for MySQL ? For others flavors of SQL ?
Thanks.
In MySQL, you cannot create an index on a view. MySQL uses indexes of the underlying tables when you query data against the views that use the merge algorithm. For the views that use the temptable algorithm, indexes are not utilized when you query data against the views.
https://www.percona.com/blog/2007/08/12/mysql-view-as-performance-troublemaker/
Here's a demo table. It has a userid attribute column and a note column.
mysql> create table t (id serial primary key, userid int not null, note int, key(userid,note));
If you do an aggregation to get the sum of note per userid, it does an index-scan on (userid, note).
mysql> explain select userid, sum(note) from t group by userid;
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)
If we create a view for the same query, then we can see that querying the view uses the same index on the underlying table. Views in MySQL are pretty much like macros — they just query the underlying table.
mysql> create view v as select userid, sum(note) from t group by userid;
Query OK, 0 rows affected (0.03 sec)
mysql> explain select * from v;
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
2 rows in set (0.00 sec)
So far so good.
Now let's create a table to join with the view, and join to it.
mysql> create table u (userid int primary key, name text);
Query OK, 0 rows affected (0.09 sec)
mysql> explain select * from v join u using (userid);
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | test.u.userid | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
3 rows in set (0.01 sec)
I tried to use hints like straight_join to force it to read v then join to u.
mysql> explain select * from v straight_join u on (v.userid=u.userid);
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
"Using join buffer (Block Nested Loop)" is MySQL's terminology for "no index used for the join." It's just looping over the table the hard way -- by reading batches of rows from start to finish of the table.
I tried to use force index to tell MySQL that type=ALL is to be avoided.
mysql> explain select * from v straight_join u force index(PRIMARY) on (v.userid=u.userid);
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | v.userid | 1 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
Maybe this is using an index for the join? But it's weird that table u is before table t in the EXPLAIN. I'm frankly not sure how to understand what it's doing, given the order of rows in this EXPLAIN report. I would expect the joined table should come after the primary table of the query.
I only put a few rows of data into each table. One might get some different EXPLAIN results with a larger representative sample of test data. I'll leave that to you to try.

Mysql 5.6 optimizer doesn't use indexes in small tables joins

We have two tables - the first is relatively big (contact table) 250k rows and the second is small(user table, < 10 rows). On mysql 5.6 version I have next explain result:
EXPLAIN SELECT
o0_.id AS id_0,
o8_.first_name,
o8_.last_name
FROM
contact o0_
LEFT JOIN user o8_ ON o0_.user_owner_id = o8_.id
LIMIT
25 OFFSET 100
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 253030 | Using index |
| 1 | SIMPLE | o8_ | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
2 rows in set (0,00 sec)
When i use force index for join:
EXPLAIN SELECT
o0_.id AS id_0,
o8_.first_name,
o8_.last_name
FROM
contact o0_
LEFT JOIN user o8_ force index for join(`PRIMARY`) ON o0_.user_owner_id = o8_.id
LIMIT
25 OFFSET 100
or adding indexes on fields which appears in select clause (first_name, last_name) on user table:
alter table user add index(first_name, last_name);
Explain result changes to this:
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 253030 | Using index |
| 1 | SIMPLE | o8_ | eq_ref | PRIMARY | PRIMARY | 4 | o0_.user_owner_id | 1 | NULL |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
2 rows in set (0,00 sec)
On mysql 5.5 version I have same explain result without additional indexes:
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 255706 | Using index |
| 1 | SIMPLE | o8_ | eq_ref | PRIMARY | PRIMARY | 4 | o0_.user_owner_id | 1 | |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
2 rows in set (0.00 sec)
Why i need force use PRIMARY index or add extra indexes on mysql 5.6 version?
Same behavior occurs with other selects, when join small tables.
If you have a table with so few rows, it may actually be faster to do a full table scan, than going to an index, locate the records and then go back to the table. If you have other fields in the user table apart from the 3 in the query, then you may consider adding a covering index, but franly, I do not think that any of this would have significant affect on the speed of the query.

MySQL Indexing - In vs. Equals indexing issues

Following queries run quite fast and instantaneously on mysql server:
SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
SELECT table_name.id
from table_name
where table_name.id = (SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
);
But if I change the second query to as following, then it takes more than 20 seconds:
SELECT table_name.id
from table_name
where table_name.id in (SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
);
On doing explain, I get the following output. It is clear that there are some issues regarding how MySQL indexes the data, and use in keyword.
For first query:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
For second query:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
| 2 | SUBQUERY | table_name | const | PRIMARY | PRIMARY | 4 | | 1 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
For third query:
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table_name | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
| 1 | PRIMARY | table_name | index | NULL | sentTo | 5 | NULL | 6250751 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
I am using InnoDB and have tried changing the third query to forcibly use the index as indicated by the following category.
In first case you have only first record from subquery (It runs once, because equals is only for first value)
In second query you got Cartesian multiplication (each per each) because IN runs subquery for each row. Which is not good for performance
Try to use joins for these cases.

Why is MySQL not using an index when I'm including a subquery?

I have the following query, which is fine, but will get slower as the brands table grows:
mysql> explain select brand_id as id,brands.name from tags
-> INNER JOIN brands on tags.brand_id = brands.id
-> where brand_id in
-> (select brand_id from tags where outfit_id in
-> (1,6,68,265,271))
-> group by brand_id, brands.name
-> ORDER BY count(brand_id)
-> LIMIT 5;
+----+--------------------+--------+----------------+------------------------------------------------+------------------------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+----------------+------------------------------------------------+------------------------+---------+-----------------+------+----------------------------------------------+
| 1 | PRIMARY | brands | ALL | PRIMARY | NULL | NULL | NULL | 165 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | tags | ref | index_tags_on_brand_id | index_tags_on_brand_id | 5 | waywn.brands.id | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | tags | index_subquery | index_tags_on_outfit_id,index_tags_on_brand_id | index_tags_on_brand_id | 5 | func | 1 | Using where |
+----+--------------------+--------+----------------+------------------------------------------------+------------------------+---------+-----------------+------+----------------------------------------------+
3 rows in set (0.00 sec)
I don't understand why it isn't using the primary key as the index here and doing a file sort. If I replace the subquery with the values returned from that subquery, MySQL correctly uses the indices:
mysql> explain select brand_id as id,brands.name from tags
-> INNER JOIN brands on tags.brand_id = brands.id
-> where brand_id in
-> (2, 2, 9, 10, 40, 32, 9, 118)
-> group by brand_id, brands.name
-> ORDER BY count(brand_id)
-> LIMIT 5;
+----+-------------+--------+-------+------------------------+------------------------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+------------------------+---------+-----------------+------+----------------------------------------------+
| 1 | SIMPLE | brands | range | PRIMARY | PRIMARY | 4 | NULL | 6 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | tags | ref | index_tags_on_brand_id | index_tags_on_brand_id | 5 | waywn.brands.id | 1 | Using where; Using index |
+----+-------------+--------+-------+------------------------+------------------------+---------+-----------------+------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> explain select brand_id from tags where outfit_id in (1,6,68,265,271);
+----+-------------+-------+-------+-------------------------+-------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------+-------------------------+---------+------+------+-------------+
| 1 | SIMPLE | tags | range | index_tags_on_outfit_id | index_tags_on_outfit_id | 5 | NULL | 8 | Using where |
+----+-------------+-------+-------+-------------------------+-------------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
Why would this be? It doesn't really make sense to me. I mean, I can break it up into 2 calls, but that seems poor. I did notice that I can make it slightly more efficient by including a distinct in the subquery, but that didn't change the way it uses keys at all.
Why don't you juste write :
SELECT brand_id as id,brands.name
FROM tags
INNER JOIN brands ON tags.brand_id = brands.id
WHERE outfit_id in (1,6,68,265,271)
GROUP BY brand_id, brands.name
ORDER BY count(brand_id)
LIMIT 5;