How to avoid filesort for this MySQL query

I need some help in avoiding filesort for this query.
SELECT id
FROM articles USE INDEX(group)
WHERE type = '4'
AND category = '161'
AND did < '10016'
AND id < '9869788'
ORDER BY id DESC
LIMIT 10
INDEX(group) is a covering index of (type, category, did, id)
Because of ORDER BY id DESC, a filesort is performed. Is there a way to avoid filesort for such a query?

Change the index column order. The index is useless for the sort because id is its 4th column, so the index order isn't ready to be used as-is.
Of course, this affects the usefulness of the index for this WHERE, because you would need to put an inequality column before an equality one.
In terms of the MySQL docs, you hit two of the cases where an index cannot resolve the ORDER BY: "You use ORDER BY on nonconsecutive parts of a key" and "The key used to fetch the rows is not the same as the one used in the ORDER BY".
Edit: as per my link above, you can't have an index that satisfies both WHERE and ORDER BY. They are mutually exclusive because of the two conditions I quoted above.
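To make the trade-off concrete, here is a minimal sketch of one possible reordering. It keeps the equality columns first and moves id ahead of did, so rows for a given (type, category) pair come back in id order, while did no longer narrows the range scan and is only checked as a filter on each index entry. The index name is hypothetical, and whether the filesort actually disappears depends on the optimizer and your data, so check the EXPLAIN output.

```sql
-- Hypothetical index name; only the column order matters here.
-- Equality columns (type, category) first, then the ORDER BY column (id),
-- then did so that the index still covers the query.
ALTER TABLE articles
  ADD INDEX idx_type_category_id_did (type, category, id, did);

-- Same query as in the question, without the index hint.
EXPLAIN
SELECT id
FROM articles
WHERE type = '4'
  AND category = '161'
  AND did < '10016'
  AND id < '9869788'
ORDER BY id DESC
LIMIT 10;
```

If Extra no longer lists Using filesort, the index is being read in id order. The price is that entries failing did < '10016' are still scanned and discarded one by one.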
Another suggestion:
a single column index on id
go back to the original index too
remove the index hint
hope that the optimiser works out that both indexes can be used ("index intersection")
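The steps in this second suggestion could look roughly like this. The index name idx_id is made up for illustration, and the extra index is pointless if id is already the primary key or otherwise indexed. There is no guarantee the optimizer will choose an intersection plan.

```sql
-- Step 1: single-column index on id (hypothetical name).
ALTER TABLE articles ADD INDEX idx_id (id);

-- Step 2: the original composite index `group` stays in place (nothing to run).

-- Step 3: run the query without USE INDEX and let the optimizer decide.
EXPLAIN
SELECT id
FROM articles
WHERE type = '4'
  AND category = '161'
  AND did < '10016'
  AND id < '9869788'
ORDER BY id DESC
LIMIT 10;

-- If an index merge is chosen, EXPLAIN reports type = index_merge and
-- Extra contains something like "Using intersect(...)".
```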

Create an index on id and type; it should work for you.
If id is unique, you can also make it the primary key.
Your query may be using filesort because the optimizer is using only the first column of the index you created.

Related

MySQL query index

I am using MySQL 5.6 and am trying to optimize the following query:
SELECT t1.field1,
...
t1.field30,
t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = ?
AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime desc;
The result can contain millions of records, and every row consists of 31 columns.
Currently EXPLAIN shows 'Using where' in Extra.
I tried adding the following index:
create index test_idx ON Table1 (int_field, enum_filed, created_datetime, fk_int);
After that, EXPLAIN shows "Using index condition; Using filesort" in Extra.
The "rows" value from EXPLAIN is lower with the index than without it, but in practice the execution time is longer.
So, my questions are:
What is the best index for this query?
Why does EXPLAIN say that 'key_len' for the query with the index is 5? Shouldn't it be 4+1+8+4=17?
Should the fields from ORDER BY be in the index?
Should the fields from JOIN be in the index?
Try refactoring your index this way:
drop the created_datetime column from the index (or move it to the right, after fk_int) and move fk_int before the enum_filed column. That way, the three columns used for filtering should be put to better use.
create index test_idx ON Table1 (int_field, fk_int, enum_filed);
Also make sure you have an index on Table2's pk_int column. If you don't, add one:
create index test_idx ON Table2 (pk_int);
What is the best index for this query?
Maybe (int_field, created_datetime) (See next Q&A for reason.)
Why does EXPLAIN say that 'key_len' for the query with the index is 5? Shouldn't it be 4+1+8+4=17?
enum_filed != defeats the optimizer. If there is only one other value for that enum (and it is NOT NULL), then use = and the other value, and try INDEX(int_field, enum_filed, created_datetime). The Optimizer is much happier with = than with any inequality.
"5" could be indicating 2 columns, or it could be indicating one INT that is Nullable. If int_field can be NULL, consider changing it to NOT NULL; then the "5" would drop to "4".
Should the fields from ORDER BY be in the index?
Only if the index can completely handle the WHERE. This usually occurs only if all the WHERE tests are =. (Hence, my previous answer.)
Another case for including those columns is "covering"; see next Q&A.
Should the fields from JOIN be in the index?
It depends. One thing that gives some performance benefit is to include all columns mentioned anywhere in the SELECT. This is called a "covering" index and is indicated in EXPLAIN by Using index (not Using index condition). There are too many columns in t1 to add a "covering" index. I think the practical limit is about 5 columns.
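As a rough illustration of the difference between Using index and Using index condition, compare a narrow query with a wide one against the same index. The sketch assumes the (int_field, created_datetime) index suggested earlier in this answer, under the hypothetical name idx_int_created, and uses 42 as a stand-in for the bound parameter.

```sql
-- Hypothetical index name.
CREATE INDEX idx_int_created ON Table1 (int_field, created_datetime);

-- Narrow query: every referenced column is in the index, so the index
-- alone can answer it. Extra should include "Using index".
EXPLAIN
SELECT int_field, created_datetime
FROM Table1
WHERE int_field = 42
ORDER BY created_datetime DESC;

-- Wide query: the other columns are not in the index, so each matching
-- row still has to be fetched from the table. Extra should not show
-- "Using index" here.
EXPLAIN
SELECT *
FROM Table1
WHERE int_field = 42
ORDER BY created_datetime DESC;
```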
My guess for your question № 1:
create index my_idx on Table1(int_field, created_datetime desc, fk_int)
or one of these (but neither will probably be worthwhile):
create index my_idx on Table1(int_field, created_datetime desc, enum_filed, fk_int)
create index my_idx on Table1(int_field, created_datetime desc, fk_int, enum_filed)
I'm supposing 3 things:
Table2.pk_int is already a primary key, judging by the name
The where condition on Table1.int_field is only satisfied by a small subset of Table1
The inequality on Table1.enum_filed (I would fix the typo, if I were you) only excludes a small subset of Table1
Question № 2: the key_len refers to the keys used. Don't forget that there is one extra byte for nullable keys. In your case, if int_field is nullable, it means that this is the only key used, otherwise both int_field and enum_filed are used.
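To make the key_len arithmetic concrete, here is a small sketch. It assumes int_field is a plain INT that contains no NULLs and that enum_filed is a NOT NULL ENUM with at most 255 values (1 byte). Neither is confirmed by the question, and 42 is just a placeholder for the bound parameter.

```sql
-- key_len adds up the bytes of the index parts actually used:
--   INT                   = 4 bytes
--   nullable column       = +1 byte
--   ENUM (<= 255 values)  = 1 byte
-- So key_len = 5 can mean "nullable int_field only" (4 + 1)
-- or "NOT NULL int_field plus NOT NULL enum_filed" (4 + 1).

-- Assumption: int_field holds no NULLs. MODIFY restates the whole column
-- definition, so repeat any DEFAULT or COMMENT the real column has.
ALTER TABLE Table1 MODIFY int_field INT NOT NULL;

EXPLAIN
SELECT t1.field1, t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = 42
  AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime DESC;
```

Under these assumptions, a key_len of 4 after the change means only int_field is used, while a key_len that stays at 5 means enum_filed is used as well.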
As for questions 3 and 4: If, as I suppose, it's more efficient to start the query plan from the where condition on Table1.int_field, the composite index, in this case also with the correct sort order (desc), enables a scan of the index to get the output rows in the correct order, without an extra sort step. Furthermore, adding also fk_int to the index makes the retrieval of any record of Table1 unnecessary unless a corresponding record is present in Table2. For a similar reason you could also add enum_filed to the index, but, if this doesn't considerably reduce the output record count, the increase in index size will make things worse instead of better. In the end, you will have to try it out (with realistic data!).
Note that if you put another column between int_field and created_datetime in the index, the index won't provide the created_datetime (for a given int_field) in the desired output order.
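Since the question is about MySQL 5.6, where a desc in an index definition is parsed but ignored, a version-neutral sketch of the first index simply omits it. An ascending index can be read backwards, so it can still serve ORDER BY created_datetime DESC. The index name is hypothetical and 42 stands in for the bound parameter.

```sql
-- Hypothetical index name; same column order as the first suggestion above,
-- without the desc keyword.
CREATE INDEX my_idx_asc ON Table1 (int_field, created_datetime, fk_int);

EXPLAIN
SELECT t1.field1, t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = 42
  AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime DESC;

-- If the index is used for ordering, Extra should no longer list
-- "Using filesort". The enum_filed test is then applied to each fetched row.
```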
The issue was fixed by adding more filters (to where clause) to the query.
As for indexes, two of the proposed ones were helpful:
From @WalterTross, this index for the initial query:
(int_field, created_datetime desc, enum_filed, fk_int)
One short comment from me: descending indexes are not supported in MySQL 5.6; the desc keyword is accepted but ignored.
From @RickJames, this index for the modified query:
(int_field, created_datetime)
Thanks everyone who tried to help. I really appreciate it.

Mysql query with order by use index but still gives “using where”

I'm encountering a problem with the following query:
select *
from testtable
where user_id = 1 and color = 1 and size = 1
order by created_at desc, id desc;
I was using two indexes:
index1 (user_id, color, size)
index2 (created_at, id)
But I got "Using where; Using filesort" in the EXPLAIN result.
Then I changed the index to use all five columns:
index1 (user_id ... id)
The "Using filesort" was gone, but I still get "Using where".
What further steps should I take in order to completely use the indexes for this query?
Thank you
The first index is used for the where. If you also want to use the index for sorting, then you need an index on (user_id, color, size, created_at, id).
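A minimal sketch of that index, with a made-up index name. The three equality columns come first and the two ORDER BY columns follow in the same order as in the query:

```sql
-- Hypothetical index name.
ALTER TABLE testtable
  ADD INDEX idx_user_color_size_created_id (user_id, color, size, created_at, id);

EXPLAIN
SELECT *
FROM testtable
WHERE user_id = 1 AND color = 1 AND size = 1
ORDER BY created_at DESC, id DESC;

-- Both ORDER BY columns sort in the same direction, so the index can be
-- read backwards and Extra should not show "Using filesort".
```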
"Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where" (from the MySQL docs).
In other words, this is absolutely normal and OK.

MySQL Index for ORDER BY with Date Range in WHERE

Suppose you have a table with the following columns:
id
date
col1
I would like to be able to query this table with a specific id and date, and also order by another column. For example,
SELECT * FROM TABLE WHERE id = ? AND date > ? ORDER BY col1 DESC
According to this range documentation, an index will stop being used after it hits the > operator. But according to this order by documentation, an index can only be used to optimize the order by clause if it is ordering by the last column in the index. Is it possible to get an indexed lookup on every part of this query, or can you only get 2 of the 3? Can I do any better than index (id, date)?
Plan A: INDEX(id, date) -- works best when it filters out a lot of rows, making the subsequent "filesort" not very costly.
Plan B: INDEX(col1), which may work best if very few rows are filtered by the WHERE clause. This avoids the filesort, but is not necessarily faster than the other choices here.
Plan C: INDEX(id, date, col1) -- This is a "covering" index if the query does not reference any other fields. The potential advantage here is to look only at the index, and not have to touch the data. If it applies, Plan C is better than Plan A.
You have not provided enough information to say which of these INDEXes will work best. Suggest you add C and B, if "covering" applies; else add A and B. Then see which index the Optimizer picks. (There is still a chance that the Optimizer will not pick 'right'.)
(These three indexes are what my Index blog recommends.)
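A rough way to try this out, assuming the table is called my_table (the question only says TABLE) and has just the three listed columns, so Plan C is covering. Index names and the literal values are placeholders.

```sql
-- Hypothetical table and index names.
ALTER TABLE my_table
  ADD INDEX idx_plan_b (col1),
  ADD INDEX idx_plan_c (id, `date`, col1);

-- Let the optimizer choose between the two.
EXPLAIN
SELECT *
FROM my_table
WHERE id = 1 AND `date` > '2020-01-01'
ORDER BY col1 DESC;

-- Force each candidate in turn and compare actual run times on real data.
EXPLAIN
SELECT *
FROM my_table FORCE INDEX (idx_plan_b)
WHERE id = 1 AND `date` > '2020-01-01'
ORDER BY col1 DESC;

EXPLAIN
SELECT *
FROM my_table FORCE INDEX (idx_plan_c)
WHERE id = 1 AND `date` > '2020-01-01'
ORDER BY col1 DESC;
```

With idx_plan_c you should still expect Using filesort, because the range on date comes before col1 in the index. With idx_plan_b the sort is avoided, but every row is visited in col1 order and filtered.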

MySQL Fulltext Search ORDER BY column

This question MAY have been asked before, but I can't for the life of me find the answer.
In order to avoid
SELECT * FROM student WHERE name LIKE '%searchphrase%' ORDER BY score
which, as I understand it, will never use an index and will always use filesort, there's the ability to use a FULLTEXT index.
The question: How can I order by score without a filesort if I perform a fulltext search?
Result rows will come out in whatever order they're in in the FULLTEXT index which certainly isn't the order required by ORDER BY score, so the fulltext matches need to be sorted for ORDER BY in a separate step, and this is what filesort does.
The only alternative execution plan would be to retrieve rows in score order and then apply the fulltext match row by row, which totally defeats any fulltext-specific optimizations.
What may make sense in your case is to have a combined index on (score, name) and stick with LIKE if your search expression covers a large part of the student rows. In this case you'd get an index scan in score order, and the LIKE expression can be evaluated on index entries. So you're getting a full index scan instead of a full table scan, and no extra sort is needed as index entries are ordered by score already.
But if the number of matching rows is rather small compared to the total number of rows in the table doing a fulltext index lookup first, followed by filesort, will be the better plan.
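The two plans could be tried side by side along these lines. Index names are made up, and note that MATCH ... AGAINST matches whole words rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%searchphrase%'. The second query selects only indexed columns to keep the example to an index-only scan.

```sql
-- Plan 1: fulltext lookup first, then sort the matches by score.
-- Hypothetical index name; InnoDB needs MySQL 5.6+ for FULLTEXT.
ALTER TABLE student ADD FULLTEXT INDEX ft_name (name);

EXPLAIN
SELECT *
FROM student
WHERE MATCH(name) AGAINST('searchphrase')
ORDER BY score;

-- Plan 2: scan a (score, name) index in score order and test LIKE on
-- each index entry. A long name column may run into index key length limits.
ALTER TABLE student ADD INDEX idx_score_name (score, name);

EXPLAIN
SELECT score, name
FROM student FORCE INDEX (idx_score_name)
WHERE name LIKE '%searchphrase%'
ORDER BY score;
```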

What fields should be indexed together? group by? order by?

I'm trying to speed up a query which I currently have as:
SELECT *
FROM `events`
WHERE (field1 = 'some string' or field1 = 'some string')
and is_current = true
GROUP BY event_id
ORDER BY pub_date
This takes roughly 30 seconds.
field1 is a varchar(150).
I'm currently indexing:
field1, is_current, event_id, pub_date
charity, pub_date, is_current
and all the fields individually...
I'm really not sure which fields should be indexed together. When I remove the ORDER BY, the query speeds up to around 8 seconds, and if I remove both the ORDER BY and the GROUP BY, it takes less than 1 second...
What exactly should be indexed in this case to speed up the query?
Edit:
I've run explain on the modified query (which no longer includes the group by):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | events | range | is_current,field1_2,field1_3,field1_4,field1 | field1_3 | 153 | NULL | 204336 | Using where; Using filesort
Which indicates it's using the key field1_3 which is: field1 & is_current
Although it's not using the key which includes those two fields and pub_date (for the ordering..?)
It's also using FILESORT which seems to be the main problem..
any ideas why it's using a filesort even though the pub_date field is also indexed (with the other fields)?
Put everything, (field1, is_current, event_id, pub_date), in one index. MySQL generally uses only one index per joined table in a query (index merge being the exception).
Use EXPLAIN to see what happens when you do.
Also, an aside - as KoolKabin says, * is rarely a good idea. Sometimes MySQL will copy the rows in a temporary table; and then there are the communication costs. The less you ask from it the faster things will work.
UPDATE: I was actually wrong. Sorry. First off, you can't get full use of indexing if your grouping is different than your ordering. Second, do you have an index where your ordering key (pub_date) is first? If not, try if that fixes the ordering thing.
any ideas why it's using a filesort even though the pub_date field is also indexed (with the other fields)?
This is because the MySQL optimizer is trying to use the index "field1" while you want the data ordered by pub_date. If you are using MySQL 5.1 or later (the following query will give an error in earlier versions), you can force MySQL to use the pub_date index for the ORDER BY, something like this:
SELECT *
FROM `events`
force index for order by (pub_date)
WHERE (field1 = 'some string' or field1 = 'some string')
and is_current = true
GROUP BY event_id
ORDER BY pub_date
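One thing to watch with this hint: the name in parentheses is an index name, not a column name. A small sketch for checking it, using the modified query without GROUP BY from the question's edit; idx_pub_date is a hypothetical name, so substitute whatever SHOW INDEX reports for the index on pub_date.

```sql
-- List the indexes on the table to find the real name (Key_name column)
-- of the index whose first column is pub_date.
SHOW INDEX FROM `events`;

-- Hypothetical index name idx_pub_date.
EXPLAIN
SELECT *
FROM `events`
FORCE INDEX FOR ORDER BY (idx_pub_date)
WHERE (field1 = 'some string' OR field1 = 'some string')
  AND is_current = true
ORDER BY pub_date;
```

Avoiding the filesort this way means reading rows in pub_date order and filtering them one by one, which is not necessarily faster when the WHERE clause is selective.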