MySQL multiple column index using wrong index

I have indexes on products table:
PRIMARY
products_gender_id_foreign
products_subcategory_id_foreign
idx_products_cat_subcat_gender_used (multiple column index)
QUERY:
select `id`, `name`, `price`, `images`, `used`
from `products`
where `category_id` = '1' and
`subcategory_id` = '2' and
`gender_id` = '1' and
`used` = '0'
order by `created_at` desc
limit 24 offset 0
Question:
Why does MySQL use the index
products_subcategory_id_foreign
instead of
idx_products_cat_subcat_gender_used (multiple column index)?
Here is the EXPLAIN output:
1 SIMPLE products NULL ref products_gender_id_foreign,products_subcategory_id... products_subcategory_id_foreign 5 const 2 2.50 Using index condition; Using where; Using filesort

As explained in the MySQL documentation, an index can be ignored in some circumstances. The ones that could apply in your case, as one index is already being used, are:
You are comparing indexed columns with constant values and MySQL has
calculated (based on the index tree) that the constants cover too
large a part of the table and that a table scan would be faster. See
Section 8.2.1.1, “WHERE Clause Optimization”.
You are using a key with low cardinality (many rows match the key
value) through another column. In this case, MySQL assumes that by
using the key it probably will do many key lookups and that a table
scan would be faster.
My guess is that the values of category_id are not selective enough (too many rows share each value).

As I say here, this
where `category_id` = '1' and
`subcategory_id` = '2' and
`gender_id` = '1' and
`used` = '0'
order by `created_at` desc
limit 24 offset 0
needs a 5-column composite index:
INDEX(category_id, subcategory_id, gender_id, used, -- in any order
created_at)
to get to the LIMIT, thereby not having to fetch lots of rows and sort them.
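As a minimal sketch, the composite index described above could be created and checked as follows. The index name idx_products_filter_created is hypothetical, and the statements assume the column names shown in the question.

```sql
-- Hypothetical index name. The four equality columns may appear in any order,
-- but created_at must come last so the index can serve ORDER BY ... LIMIT.
ALTER TABLE products
  ADD INDEX idx_products_filter_created
    (category_id, subcategory_id, gender_id, used, created_at);

-- Re-check the plan; the hope is that "Using filesort" disappears from Extra.
EXPLAIN
SELECT `id`, `name`, `price`, `images`, `used`
FROM `products`
WHERE `category_id` = '1'
  AND `subcategory_id` = '2'
  AND `gender_id` = '1'
  AND `used` = '0'
ORDER BY `created_at` DESC
LIMIT 24 OFFSET 0;
```

Whether the optimizer actually picks the new index still depends on the table statistics, so the EXPLAIN output is the thing to verify.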
As for your actual question about which index it picked... Probably the cardinality of one inadequate index was better than the other.

Related

Avoid filesort in simple filtered ordered query

I have a simple table:
CREATE TABLE `user_values` (
`id` bigint NOT NULL AUTO_INCREMENT,
`user_id` bigint NOT NULL,
`value` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
on which I am trying to execute the following simple query:
select * from user_values where user_id in (20020, 20030) order by id desc;
I would fully expect this query to use an index (either the (user_id, id) one or the (id, user_id) one) for both the filter and the ordering. Yet it turns out that's not the case:
explain select * from user_values where user_id in (20020, 20030); yields:
id: 1
select_type: SIMPLE
table: user_values
partitions: NULL
type: range
key: user_id
key_len: 8
ref: NULL
rows: 9
filtered: 100.00
Extra: Using index condition; Using filesort
Why is that the case? How can I avoid a filesort on this trivial query?
You can't avoid the filesort in the query you show.
When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.
Here's an example of hypothetical data in which reading the rows by an index on user_id will not be in id order.
id  user_id
1   20030
2   20020
3   20016
4   20030
5   20020
So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:
id  user_id
2   20020
5   20020
1   20030
4   20030
Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.
The same kind of effect happens for other types of predicates, for example BETWEEN, or < or != or IS NOT NULL, etc. Every predicate except for = is a range predicate.
The only ways to avoid the filesort are to change the query in one of the following ways:
Omit the ORDER BY clause and accept the results in whatever order the optimizer chooses to return them, which could be in id order, but only by coincidence.
Change the user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id. The matching rows read from the index are then already in id order, and the ORDER BY becomes a no-op. The optimizer recognizes when this is possible, and skips the filesort.
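The second option can be compared against the original query with EXPLAIN. This is only a sketch against the user_values table from the question; the exact plan depends on the MySQL version and the data.

```sql
-- Equality on user_id: within the (user_id, id) index the rows of one user
-- are already stored in id order, so no extra sort should be needed.
EXPLAIN
SELECT * FROM user_values
WHERE user_id = 20020
ORDER BY id DESC;

-- Original range form for comparison; Extra is expected to show
-- "Using filesort" here, as in the question.
EXPLAIN
SELECT * FROM user_values
WHERE user_id IN (20020, 20030)
ORDER BY id DESC;
```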
MySQL will most likely use an index for the query (unless the user_id values in the query cover most of the rows).
The "filesort" happens in memory (it's really not a filesort), and is used to sort the found rows based on the ORDER BY clause.
You cannot avoid a "sort" in this case.
There were about 9 rows to sort, so it could not have taken long.
How long did the query take? Probably only a few milliseconds, so who cares?
"Filesort" does not necessarily mean that a "file" was involved. In many queries the sort is done in RAM.
Do you use id for anything other than to have a PRIMARY KEY on the table? If not, then this will help a small amount. (The speed-up won't be indicated in EXPLAIN.)
PRIMARY KEY (`user_id`,`id`), -- to avoid secondary lookups
KEY `id` (`id`); -- to keep auto_increment happy
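Put into a full table definition, the suggestion above might look like the following sketch. It assumes the two secondary indexes from the question are dropped, since the new primary key already covers (user_id, id).

```sql
CREATE TABLE `user_values` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `user_id` bigint NOT NULL,
  `value` varchar(100) NOT NULL,
  -- Rows are clustered by user, so a lookup by user_id reads the row data
  -- directly instead of going through a secondary index first.
  PRIMARY KEY (`user_id`, `id`),
  -- InnoDB requires the AUTO_INCREMENT column to lead some index.
  KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
```

The filesort for the IN (...) query remains with this layout; the gain is only the avoided secondary lookups.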

MySQL pipelined order by seems not working

I have table
user[id, name, status] with index[status, name, id]
SELECT *
FROM user
WHERE status = 'active'
ORDER BY name, id
LIMIT 50
I have about 50000 users with status == 'active'
1.) Why does MySQL EXPLAIN show about 50000 in the rows column? Why does it follow all leaf nodes even when the index columns match the ORDER BY clause?
2.) When I change order by clause to
ORDER BY status, name, id
EXTRA column of explain clause shows:
Using index condition; Using where; Using filesort
Is there any reason why it can't use index order in this query?
edit1:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `status_name_id` (`status`,`name`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
query:
SELECT *
FROM `user`
WHERE status = 'complete'
ORDER BY status, name, id
LIMIT 50
explain:
id: 1
select_type: SIMPLE
table: f_order
type: ref
possible_keys: status_name_id
key: status_name_id
key_len: 768
ref: const
rows: 50331
Extra: "Using where; Using index; Using filesort"
The weirdest thing is that if I change SELECT statement to
SELECT *, count(id)
It uses the index again and the query is twice as fast. The Extra section then contains only
Using where; Using index
Table contains 100k rows, 5 different statuses and 12 different names.
MySQL: 5.6.27
edit2:
Another example:
This takes 400ms (avg) and does explicit sort
SELECT *
FROM `user`
WHERE status IN('complete')
ORDER BY status, name, id
LIMIT 50
This takes 2ms (avg) and doesn't explicit sort
SELECT *
FROM `user`
WHERE status IN('complete', 'something else')
ORDER BY status, name, id
LIMIT 50
Q1: EXPLAIN is a bit lame. It fails to take into account the existence of the LIMIT when providing the Rows estimation. Be assured that if it can stop short, it will.
Q2: Did it say that it was using your index? Please provide the full EXPLAIN and SHOW CREATE TABLE.
More
With INDEX(status, name, id), the WHERE, ORDER BY, and LIMIT can be handled in the index. Hence it has to read only 50 rows.
Without that index, (or with practically any change to the query), much or all of the table would need to be read, stored in a tmp table, sorted, and only then could 50 rows be peeled off.
So, I suggest that it is more complicated than "explicit sort can kill my db server".
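Because EXPLAIN does not account for the LIMIT, one way to check how many rows were actually touched is to look at the session handler counters around the query. This is a sketch, assuming the `user` table from the question and sufficient privileges to run FLUSH STATUS.

```sql
-- Reset the session status counters (needs the RELOAD privilege).
FLUSH STATUS;

-- With status fixed by equality, ORDER BY name, id matches the remaining
-- columns of INDEX(status, name, id).
SELECT *
FROM `user`
WHERE status = 'complete'
ORDER BY name, id
LIMIT 50;

-- Rows read through the index since the reset. If the scan stopped at the
-- LIMIT, these counters should be far below the rows estimate in EXPLAIN.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Running the same check with ORDER BY status, name, id would show whether that variant really reads and sorts all matching rows.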
According to the comments, it is probably a bug.

creating mysql index

I have query:
EXPLAIN SELECT * FROM _mod_news USE INDEX ( ind1 ) WHERE show_lv =1 AND active =1 AND START <= NOW( )
AND ( END >= NOW( ) OR END = "0000-00-00 00:00:00" ) AND id <> "18041" AND category_id = "3" AND leta =1 ORDER BY sort_id ASC , DATE DESC LIMIT 7
result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE _mod_news ref ind1 ind1 2 const,const 11386 Using where; Using filesort
MySQL is performing a full table scan.
ind1 =
ALTER TABLE `_mod_news` ADD INDEX ind1 ( `show_lv`, `active`, `start`, `end`, `id`, `category_id`, `leta`, `sort_id`, `date`);
I also tested the following index, but nothing changes:
ALTER TABLE `_mod_news` ADD INDEX ind1 ( `show_lv`, `active`, `start`, `end`, `id`, `category_id`, `leta`);
Question is: where can I learn how to create indexes for many WHERE conditions? Or can someone explain how to tell MySQL to use an index and not scan the whole table?
Thanks.
I would suggest not forcing the index. MySQL is great at selecting the best possible index unless you have a better understanding of the data you are querying.
You cannot use the ORDER BY optimization because you are mixing ASC and DESC in that part.
Therefore your only option is to create an index such that:
constant values before range
integers before dates, dates before strings, smaller size values before bigger size values
Creating a large index also adds overhead to storage and to insert/update time, so I would not add to the index fields that do not eliminate a lot of rows (e.g. 90% of rows have a value of 1, or id<>"18041", which most likely eliminates < 1% of rows).
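Applied to the query from the question, the "constant values before range" rule could look like the sketch below. The index name ind_eq_then_range is hypothetical, and which columns are selective enough to include is an assumption that has to be checked against the real data.

```sql
-- Hypothetical index: the equality-tested columns first, then one range column.
ALTER TABLE `_mod_news`
  ADD INDEX ind_eq_then_range
    (`show_lv`, `active`, `category_id`, `leta`, `start`);

-- Same query as in the question, without the index hint, to see which
-- index the optimizer picks on its own.
EXPLAIN
SELECT *
FROM `_mod_news`
WHERE `show_lv` = 1
  AND `active` = 1
  AND `start` <= NOW()
  AND (`end` >= NOW() OR `end` = '0000-00-00 00:00:00')
  AND `id` <> 18041
  AND `category_id` = 3
  AND `leta` = 1
ORDER BY `sort_id` ASC, `date` DESC
LIMIT 7;
```

The filesort is expected to remain because of the mixed ASC/DESC ordering; the aim is only to reduce the number of rows examined.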
If you want to learn more about optimizing: http://dev.mysql.com/doc/refman/5.0/en/select-optimization.html
Create several different indexes (on a data set of the size you expect to see in the table), see which one MySQL chooses, benchmark them by forcing each one in turn, then use your common sense to cut down on index space usage.
You can see from your EXPLAIN output that it is actually NOT performing a full table scan, because in that case it would not show the index as used even when you are forcing it.
You can try with USE INDEX or FORCE INDEX

MySQL: Why does an Order By ID runs much slower than Order By other Columns?

I am using MySQL version 5.5.14 to run the following query, QUERY 1, from a table of 5 Million rows:
SELECT P.ID, P.Type, P.Name, P.cty
, X(P.latlng) as 'lat', Y(P.latlng) as 'lng'
, P.cur, P.ak, P.tn, P.St, P.Tm, P.flA, P.ldA, P.flN
, P.lv, P.bd, P.bt, P.nb
, P.ak * E.usD as 'usP'
FROM PIG P
INNER JOIN EEL E
ON E.cur = P.cur
WHERE act='1'
AND flA >= '1615'
AND ldA >= '0'
AND yr >= (YEAR(NOW()) - 100)
AND lv >= '0'
AND bd >= '3'
AND bt >= '2'
AND nb <= '5'
AND cDate >= NOW()
AND MBRContains(LineString( Point(39.9097, -2.1973)
, Point(65.5130, 41.7480)
), latlng)
AND Type = 'g'
AND tn = 'l'
AND St + Tm - YEAR(NOW()) >= '30'
HAVING usP BETWEEN 300/2 AND 300
ORDER BY ak
LIMIT 100;
Using an Index (Type, tn, act, flA), I am able to obtain results within 800ms. In QUERY 2, I changed the ORDER BY clause to lv, I am also able to obtain results within similar timings. In QUERY 3, I changed the ORDER BY clause to ID and the query time slowed dramatically to a full 20s on an average of 10 trials.
Running the EXPLAIN SELECT statement produces exactly the same query execution plan:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: P
type: range
possible_keys: Index
key: Index
key_len: 6
ref: NULL
rows: 132478
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: E
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: BS.P.cur
rows: 1
Extra:
My question is: why does ordering by ID in QUERY 3 run so slowly compared to the rest?
The partial table definition is as such:
CREATE TABLE `PIG` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lv` smallint(3) unsigned NOT NULL DEFAULT '0',
`ak` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `id_ca` (`cty`,`ak`),
KEY `Index` (`Type`, `tn`, `act`, `flA`),
) ENGINE=MyISAM AUTO_INCREMENT=5000001 DEFAULT CHARSET=latin1
CREATE TABLE `EEL` (
`cur` char(3) NOT NULL,
`usD` decimal(11,10) NOT NULL,
PRIMARY KEY (`cur`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
UPDATE: After extensive testing of various ORDER BYs options, I have confirmed that the ID column which happens to be the Primary Key is the only one causing the slow query time.
From MySQL documentation at http://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
. . .
The key used to fetch the rows is not the same as the one used in the ORDER BY:
`SELECT * FROM t1 WHERE key2=constant ORDER BY key1;`
This probably won't help, but what happens if you add AND ID > 0 to the WHERE clause? Would this cause MySQL to use the primary key for sorting? Worth a try I suppose.
(It seems odd that ordering with ak is efficient, since ak does not even have an index, but that may be due to fewer values for ak?)
If the ORDER BY column differs from the columns used in the WHERE clause, or is not part of the composite index being used, then the sorting does not take place in the storage engine but at the MySQL server level, which is much slower. Long story short, you must rearrange your indexes so that they satisfy both the row filtering and the sorting.
You can use FORCE INDEX (PRIMARY).
Try it, and you will see in the EXPLAIN output that MySQL now uses the primary key index for the ORDER BY.
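A sketch of that hint on a simplified version of QUERY 3 follows. Only a few of the original filters are kept for brevity, so this illustrates the syntax rather than reproducing the asker's query.

```sql
-- FORCE INDEX (PRIMARY) asks MySQL to walk the primary key, so rows come
-- back in ID order and the scan can stop once 100 matching rows are found.
EXPLAIN
SELECT P.ID, P.Type, P.Name
FROM PIG P FORCE INDEX (PRIMARY)
WHERE P.Type = 'g'
  AND P.tn = 'l'
  AND P.act = '1'
  AND P.flA >= '1615'
ORDER BY P.ID
LIMIT 100;
```

This trades the sort for a scan along the primary key, which only pays off if matching rows are found early; it is worth timing both variants.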

Using filesort to sort by datetime column in MySQL

I have a table Cars with datetime (DATE) and bit (PUBLIC).
Now I would like to fetch rows ordered by DATE and with PUBLIC = 1, so I use:
select
c.*
from
Cars c
WHERE
c.PUBLIC = 1
ORDER BY
DATE DESC
But unfortunately when I use explain to see what is going on I have this:
1 SIMPLE a ALL IDX_PUBLIC,DATE NULL NULL NULL 103 Using where; Using filesort
It takes 0,3 ms to fetch this data even though I have only 100 rows. Is there any way to avoid the filesort?
Looking at the indexes, I have a non-unique index on (PUBLIC, DATE).
Table def:
CREATE TABLE IF NOT EXISTS `Cars` (
`ID` int(11) NOT NULL auto_increment,
`DATE` datetime NOT NULL,
`PUBLIC` binary(1) NOT NULL default '0',
PRIMARY KEY (`ID`),
KEY `IDX_PUBLIC` (`PUBLIC`),
KEY `DATE` (`PUBLIC`,`DATE`)
) ENGINE=MyISAM AUTO_INCREMENT=186 ;
You need to have a composite index on (public, date)
This way, MySQL will filter on public and sort on date.
From your EXPLAIN I see that you don't have a composite index on (public, date).
Instead you have two different indexes on public and on date. At least, that's what their names IDX_PUBLIC and DATE tell.
Update:
Your public column is not a BIT, it's a BINARY(1). It's a character type and uses character comparison.
When comparing integers to characters, MySQL converts the latter to the former, not vice versa.
These queries return different results:
CREATE TABLE t_binary (val BINARY(2) NOT NULL);
INSERT
INTO t_binary
VALUES
(1),
(2),
(3),
(10);
SELECT *
FROM t_binary
WHERE val <= 10;
---
1
2
3
10
SELECT *
FROM t_binary
WHERE val <= '10';
---
1
10
Either change your public column to be a bit or rewrite your query as this:
SELECT c.*
FROM Cars c
WHERE c.PUBLIC = '1'
ORDER BY
DATE DESC
, i. e. compare characters with characters, not integers.
If you are ordering by date, a sort will be required. If there isn't an index by date, then a filesort will be used. The only way to get rid of that would be to either add an index on date or not do the order by.
Also, a filesort does not always imply that the file will be sorted on disk. It could be sorting it in memory if the table is small enough or the sort buffer is large enough. It just means that the table itself has to be sorted.
Looks like you have an index on date already, and since you are using PUBLIC in your where clause, MySQL should be able to use that index. However, the optimizer may have decided that since you have so few rows it isn't worth bothering with the index. Try adding 10,000 or so rows to the table, re-analyze it, and see if that changes the plan.
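After loading more rows, the statistics refresh and plan check suggested above can be sketched as follows, using the Cars table from the question and the character comparison from the earlier answer.

```sql
-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE Cars;

-- Re-check the plan; with enough rows the optimizer may switch from a
-- full scan (type ALL) to the (PUBLIC, DATE) index.
EXPLAIN
SELECT c.*
FROM Cars c
WHERE c.PUBLIC = '1'
ORDER BY c.`DATE` DESC;
```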