I have table
user[id, name, status] with index[status, name, id]
SELECT *
FROM user
WHERE status = 'active'
ORDER BY name, id
LIMIT 50
I have about 50000 users with status == 'active'
1.) Why does MySQL explain show about 50000 in ROWS column? Why it follows all leaf nodes even when the index columns equals to order by clause?
2.) When I change order by clause to
ORDER BY status, name, id
EXTRA column of explain clause shows:
Using index condition; Using where; Using filesort
Is there any reason why it can't use index order in this query?
edit1:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `status_name_id` (`status`,`name`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
query:
SELECT *
FROM `user`
WHERE status = 'complete'
ORDER BY status, name, id
LIMIT 50
explain:
id: 1
select_type: SIMPLE
table: f_order
type: ref
possible_keys: status_name_id
key: status_name_id
key_len: 768
ref: const
rows: 50331
Extra: "Using where; Using index; Using filesort"
The weirdest thing is that if I change SELECT statement to
SELECT *, count(id)
It use index again and query is twice faster. And extra section contains only
Using where; Using index
Table contains 100k rows, 5 different statuses and 12 different names.
MySQL: 5.6.27
edit2:
Another example:
This takes 400ms (avg) and does explicit sort
SELECT *
FROM `user`
WHERE status IN('complete')
ORDER BY status, name, id
LIMIT 50
This takes 2ms (avg) and doesn't explicit sort
SELECT *
FROM `user`
WHERE status IN('complete', 'something else')
ORDER BY status, name, id
LIMIT 50
Q1: EXPLAIN is a bit lame. It fails to take into account the existence of the LIMIT when providing the Rows estimation. Be assured that if it can stop short, it will.
Q2: Did it say that it was using your index? Please provide the full EXPLAIN and SHOW CREATE TABLE.
More
With INDEX(status, name, id), the WHERE, ORDER BY, and LIMIT can be handled in the index. Hence it has to read only 50 rows.
Without that index, (or with practically any change to the query), much or all of the table would need to be read, stored in a tmp table, sorted, and only then could 50 rows be peeled off.
So, I suggest that it is more complicated than "explicit sort can kill my db server".
According to comments it is probably bug.
Related
I have a simple table:
CREATE TABLE `user_values` (
`id` bigint NOT NULL AUTO_INCREMENT,
`user_id` bigint NOT NULL,
`value` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`,`user_id`);
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
that I am trying to execute the following simple query:
select * from user_values where user_id in (20020, 20030) order by id desc;
I would fully expect this query to 100% use an index (either the (user_id, id) one or the (id, user_id) one) Yet, it turns out that's not the case:
explain select * from user_values where user_id in (20020, 20030); yields:
id
select_type
table
partitions
type
key
key_len
ref
rows
filtered
Extra
1
SIMPLE
user_values
NULL
range
user_id
8
NULL
9
100.00
Using index condition; Using filesort
Why is that the case? How can I avoid a filesort on this trivial query?
You can't avoid the filesort in the query you show.
When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.
Here's an example of hypothetical data in which reading the rows by an index on user_id will not be in id order.
id
user_id
1
20030
2
20020
3
20016
4
20030
5
20020
So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:
id
user_id
2
20020
5
20020
1
20030
4
20030
Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.
The same kind of effect happens for other type of predicates, for example BETWEEN, or < or != or IS NOT NULL, etc. Every predicate except for = is a range predicate.
The only ways to avoid the filesort are to change the query in one of the following ways:
Omit the ORDER BY clause and accepting the results in whatever order the optimizer chooses to return them, which could be in id order, but only by coincidence.
Change the user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id, and therefore reading the matching rows from the index will already be returned in the id order, and therefore the ORDER BY is a no-op. The optimizer recognizes when this is possible, and skips the filesort.
MySQL will most likely use index for the query (unless the user_id's in the query covers most of the rows).
The "filesort" happens in memory (it's really not a filesort), and is used to sort the found rows based on the ORDER BY clause.
You cannot avoid a "sort" in this case.
There were about 9 rows to sort, so it could not have taken long.
How long did the query take? Probably only a few milliseconds, so who cares?
"Filesort" does not necessarily mean that a "file" was involved. In many queries the sort is done in RAM.
Do you use id for anything other than to have a PRIMARY KEY on the table? If not, then this will help a small amount. (The speed-up won't be indicated in EXPLAIN.)
PRIMARY KEY (`user_id`,`id`), -- to avoid secondary lookups
KEY `id` (`id`); -- to keep auto_increment happy
I have indexes on products table:
PRIMARY
products_gender_id_foreign
products_subcategory_id_foreign
idx_products_cat_subcat_gender_used (multiple column index)
QUERY:
select `id`, `name`, `price`, `images`, `used`
from `products`
where `category_id` = '1' and
`subcategory_id` = '2' and
`gender_id` = '1' and
`used` = '0'
order by `created_at` desc
limit 24 offset 0
Question:
Why mysql uses index
products_subcategory_id_foreign
insted of
idx_products_cat_subcat_gender_used (multiple column index)
HERE IS EXPLAIN :
1 SIMPLE products NULL ref products_gender_id_foreign,products_subcategory_id... products_subcategory_id_foreign 5 const 2 2.50 Using index condition; Using where; Using filesort
As explained in the MySQL documentation, a index can be ignored in some circunstances. The ones that could apply in your case, as one index is already beeing used, are:
You are comparing indexed columns with constant values and MySQL has
calculated (based on the index tree) that the constants cover too
large a part of the table and that a table scan would be faster. See
Section 8.2.1.1, “WHERE Clause Optimization”.
You are using a key with low cardinality (many rows match the key
value) through another column. In this case, MySQL assumes that by
using the key it probably will do many key lookups and that a table
scan would be faster.
My guess is that the values of category_id are not sparse enough
As I say here, this
where `category_id` = '1' and
`subcategory_id` = '2' and
`gender_id` = '1' and
`used` = '0'
order by `created_at` desc
limit 24 offset 0
needs a 5-column composite index:
INDEX(category_id, subcategory_id, gender_id, used, -- in any order
created_at)
to get to the LIMIT, thereby not having to fetch lots of rows and sort them.
As for your actual question about which index it picked... Probably the cardinality of one inadequate index was better than the other.
MySQL 5.5
I am trying to find the correct index for a query.
Table:
create table trans (
trans_id int(11) not null auto_increment,
acct_id int(11) not null,
status_id int(11) not null,
trans_transaction_type varchar(5) not null,
trans_amount float(9,3) default null,
trans_datetime datetime not null default '0000-00-00 00:00:00',
primary key (trans_id)
)
Query:
select trans_id
from trans
where acct_id = _acctid
and transaction_type in ('_1','_2','_3','_4')
and trans_datetime between _start and _end
and status_id = 6
Cardinality:
select *
from information_schema.statistics
where table_name='trans'
Result:
trans_id 424339375
acct_id 12123818
trans_transaction_type 70722272
trans_datetime 84866726
status_id 22
I am trying to find what is the correct index for the query?
alter table trans add index (acct_id, trans_transaction_type, trans_datetime, status_id);
alter table trans add index (acct_id, trans_datetime, trans_transaction_type, status_id);
etc...
Which columns go first in the index?
The goal is query speed/performance. Disk space usage is of no concern.
The base of indexing a table is to make the queries light to improve performance, the first index to be added should always be the primary key of the table (trans_id in this case), and after that, the other id columns should be indexed too.
alter table trans add index (trans_id, acct_id, status_id);
The other fields are not needed as indexes, unless you query too often based on them.
Plan A
Start with any WHERE clause that is col = constant. Then move on to one more thing.
Suggest you add both of the following, because it is not easy to predict which will be better:
INDEX(acct_id, status_id, transaction_type)
INDEX(acct_id, status_id, trans_datetime)
Plan B
Do you really have only trans_id in the SELECT list? If so, then it should not be bad to turn this into a "covering" index. That's an index where the entire operation can be performed in the BTree where the index lives thereby avoid having to reach over into the data.
To build such, first build the optimal non-covering index, then add the rest of the fields mentioned anywhere in the query. Either of these should work:
INDEX(acct_id, status_id, trans_datetime, transaction_type, trans_id)
INDEX(acct_id, status_id, transaction_type, trans_datetime, trans_id)
The first two fields can be in either order (both are '='). The last two fields can be in either order (both are useless for finding the rows; they exist only for 'covering').
I recommend against having more than, say, 5 columns in an index.
More info in my Index Cookbook.
Notes
Perform EXPLAIN SELECT. You should see 'Using index' when it is a 'covering' index.
I think the EXPLAIN's Key_len will (in all cases here) show the combined lengths of only acct_id and status_id.
You are in a Stored Procedure? If the version in the SP runs significantly slower than when you experiment, you may need to re-code to CONCAT, PREPARE, and EXECUTE the query.
I have a table sample with two columns id and cnt and another table PostTags with two columns postid and tagid
I want to update all cnt values with their corresponding counts and I have written the following query:
UPDATE sample SET
cnt = (SELECT COUNT(tagid)
FROM PostTags
WHERE sample.postid = PostTags.postid
GROUP BY PostTags.postid)
I intend to update entire column at once and I seem to accomplish this. But performance-wise, is this the best way? Or is there a better way?
EDIT
I've been running this query (without GROUP BY) for over 1 hour for ~18m records. I'm looking for a query that is better in performance.
That query should not take an hour. I just did a test, running a query like yours on a table of 87520 keywords and matching rows in a many-to-many table of 2776445 movie_keyword rows. In my test, it took 32 seconds.
The crucial part that you're probably missing is that you must have an index on the lookup column, which is PostTags.postid in your example.
Here's the EXPLAIN from my test (finally we can do EXPLAIN on UPDATE statements in MySQL 5.6):
mysql> explain update kc1 set count =
(select count(*) from movie_keyword
where kc1.keyword_id = movie_keyword.keyword_id) \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: kc1
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 98867
Extra: Using temporary
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: movie_keyword
type: ref
possible_keys: k_m
key: k_m
key_len: 4
ref: imdb.kc1.keyword_id
rows: 17
Extra: Using index
Having an index on keyword_id is important. In my case, I had a compound index, but a single-column index would help too.
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `k_m` (`keyword_id`,`movie_id`)
);
The difference between COUNT(*) and COUNT(movie_id) should be immaterial, assuming movie_id is NOT NULLable. But I use COUNT(*) because it'll still count as an index-only query if my index is defined only on the keyword_id column.
Remove the unnecessary GROUP BY and the statement looks good. If however you expect many sample.set already to contain the correct value, then you would update many records that need no update. This may create some overhead (larger rollback segments, triggers executed etc.) and thus take longer.
In order to only update the records that need be updated, join:
UPDATE sample
INNER JOIN
(
SELECT postid, COUNT(tagid) as cnt
FROM PostTags
GROUP BY postid
) tags ON tags.postid = sample.postid
SET sample.cnt = tags.cnt
WHERE sample.cnt != tags.cnt OR sample.cnt IS NULL;
Here is the SQL fiddle: http://sqlfiddle.com/#!2/d5e88.
I am using MySQL version 5.5.14 to run the following query, QUERY 1, from a table of 5 Million rows:
SELECT P.ID, P.Type, P.Name, P.cty
, X(P.latlng) as 'lat', Y(P.latlng) as 'lng'
, P.cur, P.ak, P.tn, P.St, P.Tm, P.flA, P.ldA, P.flN
, P.lv, P.bd, P.bt, P.nb
, P.ak * E.usD as 'usP'
FROM PIG P
INNER JOIN EEL E
ON E.cur = P.cur
WHERE act='1'
AND flA >= '1615'
AND ldA >= '0'
AND yr >= (YEAR(NOW()) - 100)
AND lv >= '0'
AND bd >= '3'
AND bt >= '2'
AND nb <= '5'
AND cDate >= NOW()
AND MBRContains(LineString( Point(39.9097, -2.1973)
, Point(65.5130, 41.7480)
), latlng)
AND Type = 'g'
AND tn = 'l'
AND St + Tm - YEAR(NOW()) >= '30'
HAVING usP BETWEEN 300/2 AND 300
ORDER BY ak
LIMIT 100;
Using an Index (Type, tn, act, flA), I am able to obtain results within 800ms. In QUERY 2, I changed the ORDER BY clause to lv, I am also able to obtain results within similar timings. In QUERY 3, I changed the ORDER BY clause to ID and the query time slowed dramatically to a full 20s on an average of 10 trials.
Running the EXPLAIN SELECT statement produces exactly the same query execution plan:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: P
type: range
possible_keys: Index
key: Index
key_len: 6
ref: NULL
rows: 132478
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: E
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: BS.P.cur
rows: 1
Extra:
My question is: why does ordering by ID in QUERY 3 runs so slow compared to the rest?
The partial table definition is as such:
CREATE TABLE `PIG` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lv` smallint(3) unsigned NOT NULL DEFAULT '0',
`ak` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `id_ca` (`cty`,`ak`),
KEY `Index` (`Type`, `tn`, `act`, `flA`),
) ENGINE=MyISAM AUTO_INCREMENT=5000001 DEFAULT CHARSET=latin1
CREATE TABLE `EEL` (
`cur` char(3) NOT NULL,
`usD` decimal(11,10) NOT NULL,
PRIMARY KEY (`cur`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
UPDATE: After extensive testing of various ORDER BYs options, I have confirmed that the ID column which happens to be the Primary Key is the only one causing the slow query time.
From MySQL documentation at http://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
. . .
The key used to fetch the rows is not the same as the one used in the ORDER BY:
`SELECT * FROM t1 WHERE key2=constant ORDER BY key1;`
This probably won't help, but what happens if you add AND ID > 0 to the WHERE clause? Would this cause MySQL to use the primary key for sorting? Worth a try I suppose.
(It seems odd that ordering with ak is efficient, since ak does not even have an index, but that may be due to fewer values for ak?)
If the condition in the WHERE clause differs from the one in the ORDER BY or it is not part of a composite index, then the sorting does not take place in the storage engine but rather at the MySQL server level which is much slower. Long story short you must rearrange your indexes in order to satisfy both the row filtering and the sorting as well.
you can use force index(PRIMARY)
try it, and you will see in explain query that mysql now will use the primary key index when 'order by'