mysql using filesort instead of index - mysql

I have this query:
SELECT * FROM table WHERE x >= 500 AND x < 5000 ORDER BY date DESC LIMIT 0,50
I have index: x, date - Btree
Why is this query using both the index and a filesort if I have an index on both columns?
x= integer
date = date
table type = MyISAM
explain:
ID: 1
select_type: SIMPLE
table: d
type: range
possible_keys: sort
key: sort
key_len: 2
ref: null
rows: 198
extra: using index condition; using filesort

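The query uses filesort because it is a range query on x. With an index on (x, date), the entries are ordered by date only within a single value of x, so the filesort would disappear if the query used an equality comparison on x.
But as you probably know, filesort is actually a misnomer: it is MySQL's name for a sort pass and does not necessarily involve files.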
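As a rough illustration of why a range on x prevents the (x, date) index from delivering rows in date order, here is a minimal Python sketch. It models the index as a sorted list of tuples; the sample values and the helper names range_scan and dates_are_sorted are made up for the example and are not how MySQL implements it.

```python
from bisect import bisect_left

# Hypothetical rows as (x, date) pairs. A composite index on (x, date)
# keeps them sorted by x first and by date only within equal x.
index = sorted([
    (500, "2020-03-01"),
    (500, "2020-07-15"),
    (900, "2020-01-10"),
    (900, "2020-09-30"),
    (4000, "2020-02-20"),
    (6000, "2020-05-05"),
])

def range_scan(index, low, high):
    """Return index entries with low <= x < high, in index order."""
    start = bisect_left(index, (low,))
    end = bisect_left(index, (high,))
    return index[start:end]

def dates_are_sorted(entries):
    """True if the date component is already in ascending order."""
    dates = [d for _, d in entries]
    return dates == sorted(dates)

range_entries = range_scan(index, 500, 5000)   # like x >= 500 AND x < 5000
equal_entries = range_scan(index, 900, 901)    # like x = 900

print(dates_are_sorted(range_entries))  # dates interleave across x values
print(dates_are_sorted(equal_entries))  # dates are ordered within one x
```

With a range, the dates of different x values interleave, so an extra sort is needed; with a single x value, the entries come out already ordered by date.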

From the reference -
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it still uses indexes to find the rows that match the WHERE
clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the
ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
Try adding an index: INDEX (date, x).

Related

Why does this query take over 5 seconds to run?

I have a MySQL table with around 2m rows in it. I'm trying to run the below query and each time it's taken over 5 seconds to get results. I have an index on created_at column. Below is the EXPLAIN output.
Is this expected?
Thanks in advance.
SELECT
DATE(created_at) AS grouped_date,
HOUR(created_at) AS grouped_hour,
count(*) AS requests
FROM
`advert_requests`
WHERE
DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12'
GROUP BY
grouped_date,
grouped_hour
The EXPLAIN shows type: index which is an index-scan. That is, it is using the index, but it's iterating over every entry in the index, like a table-scan does for rows in the table. This is supported by rows: 2861816 which tells you the optimizer's estimate of quantity of index entries it will examine (this is a rough number). This is much more expensive than examining only the rows matching the condition, which is the benefit we look for from an index.
So why is this?
When you use any function on an index column in your search like this:
WHERE
DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12'
It spoils the benefit of the index for reducing the number of rows examined.
MySQL's optimizer doesn't have any intelligence about the result of functions, so it can't infer that the order of return values will be in the same order as the index. Therefore it can't use the fact that the index is sorted to narrow down the search. You and I know that this is natural for DATE(created_at) to be in the same order as created_at, but the query optimizer doesn't know this. There are other functions like MONTH(created_at) where the results are definitely not in sorted order, and MySQL's optimizer doesn't attempt to know which function's results are reliably sorted.
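To make the point about ordering concrete, here is a small Python sketch with made-up timestamps. It only checks whether applying a function to already-sorted values keeps them sorted; the names created_at, by_date, and by_month are illustrative.

```python
from datetime import datetime

# Hypothetical created_at values, already in index (ascending) order.
created_at = [
    datetime(2021, 11, 30, 23, 0),
    datetime(2022, 1, 5, 8, 30),
    datetime(2022, 9, 9, 0, 0),
    datetime(2022, 9, 12, 23, 59),
    datetime(2023, 2, 1, 12, 0),
]

def is_sorted(values):
    """True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))

by_date = [ts.date() for ts in created_at]   # comparable to DATE(created_at)
by_month = [ts.month for ts in created_at]   # comparable to MONTH(created_at)

print(is_sorted(by_date))   # order of the underlying column is preserved
print(is_sorted(by_month))  # order is lost: months are 11, 1, 9, 9, 2
```

The optimizer does not attempt this kind of reasoning per function, so it treats every function call on the column as if the order could be lost.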
To fix your query, you can try one of two things:
Use an expression index. This is a new feature in MySQL 8.0 (available from 8.0.13):
ALTER TABLE `advert_requests` ADD INDEX ((DATE(created_at)))
Notice the extra redundant pair of parentheses. These are required when defining an expression index. The index entries are the results of that function or expression, not the original values of the column.
If you then use the same expression in your query, the optimizer recognizes that and uses the index.
mysql> explain SELECT DATE(created_at) AS grouped_date, HOUR(created_at) AS grouped_hour, count(*) AS requests FROM `advert_requests` WHERE DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12' GROUP BY grouped_date, grouped_hour\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: advert_requests
partitions: NULL
type: range <-- much better than 'index'
possible_keys: functional_index
key: functional_index
key_len: 4
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using temporary
If you use MySQL 5.7, you can't use expression indexes directly, but you can use a virtual column and define an index on the virtual column:
ALTER TABLE advert_requests
ADD COLUMN created_at_date DATE AS (DATE(created_at)),
ADD INDEX (created_at_date);
The trick of the optimizer recognizing the expression still works.
If you use a version of MySQL older than 5.7, you should upgrade regardless. MySQL 5.6 and older versions are past their end of life by now, and they are security risks.
The second thing you could do is refactor your query so the created_at column is not inside a function.
WHERE
created_at >= '2022-09-09' AND created_at < '2022-09-13'
When comparing a datetime to a date value, the date value is implicitly at 00:00:00.000 time. To include every fraction of a second up to 2022-09-12 23:59:59.999, it's simpler to just use < '2022-09-13'.
The EXPLAIN of this shows that it uses the existing index on created_at.
mysql> explain SELECT DATE(created_at) AS grouped_date, HOUR(created_at) AS grouped_hour, count(*) AS requests FROM `advert_requests` WHERE created_at >= '2022-09-09' AND created_at < '2022-09-13' GROUP BY grouped_date, grouped_hour\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: advert_requests
partitions: NULL
type: range <-- not 'index'
possible_keys: created_at
key: created_at
key_len: 6
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition; Using temporary
This solution works on older versions of MySQL as well as 5.7 and 8.0.
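As a sanity check on the half-open range, the following Python sketch compares the two predicates on a few boundary timestamps. The function names and sample values are assumptions made for the illustration; it mimics the comparison logic only, not MySQL's type conversion rules.

```python
from datetime import date, datetime, timedelta

def in_date_between(ts, first_day, last_day):
    """Mimics DATE(created_at) BETWEEN first_day AND last_day."""
    return first_day <= ts.date() <= last_day

def in_half_open_range(ts, first_day, last_day):
    """Mimics created_at >= first_day AND created_at < last_day + 1 day."""
    start = datetime.combine(first_day, datetime.min.time())
    end = datetime.combine(last_day + timedelta(days=1), datetime.min.time())
    return start <= ts < end

first_day, last_day = date(2022, 9, 9), date(2022, 9, 12)
samples = [
    datetime(2022, 9, 8, 23, 59, 59, 999000),   # just before the range
    datetime(2022, 9, 9, 0, 0, 0),              # first instant in the range
    datetime(2022, 9, 12, 23, 59, 59, 999000),  # last fraction of the last day
    datetime(2022, 9, 13, 0, 0, 0),             # first instant after the range
]

between_results = [in_date_between(ts, first_day, last_day) for ts in samples]
half_open_results = [in_half_open_range(ts, first_day, last_day) for ts in samples]
print(between_results == half_open_results)
```

Both predicates select the same rows, but only the second leaves the column bare, which is what lets the existing index on created_at be used for a range scan.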
Run EXPLAIN and check whether the plan uses an index range scan. If it does not, see:
https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html
(Note that a full table scan can sometimes be better if most of the timestamps in the table fall within the selected date range. As far as I know, optimization is not trivial in that case.)
If I understand the EXPLAIN correctly, it's able to use the index to implement the WHERE filtering. But this is returning 2.8 million rows, which then have to be grouped by date and hour, and this is a slow process.
You may be able to improve it by creating virtual columns for the date and hour, and index these as well.
ALTER TABLE advert_requests
ADD COLUMN created_date DATE AS (DATE(created_at)),
ADD COLUMN created_hour INT AS (HOUR(created_at)),
ADD INDEX (created_date, created_hour);

What is compound indexing and how do I use it properly?

I have a really slow query that repeats itself quite a bit. I've tried indexing the individual fields but it doesn't seem to help. The CPU usage is still very high and the queries still appear on the slow query log. It seems I need a compound index?
How would I index the following query properly?
select *
from `to_attachments` left join
`attachments`
on `to_attachments`.`attachment_id` = `attachments`.`id`
where `to_attachments`.`object_type` = 'communicator' and `to_attachments`.`object_id` = '64328'
order by `attachments`.`created_at` desc;
EXPLAIN Result:
1 SIMPLE to_attachments index NULL PRIMARY 775 NULL 244384 Using where; Using index; Using temporary; Using filesort
1 SIMPLE attachments eq_ref PRIMARY PRIMARY 4 quote.to_attachments.attachment_id 1 NULL
Index For to_attachments
You want indexes on to_attachments(object_type, object_id, attachment_id) and attachments(id).
The column sequence in your index is wrong; it should be (object_type, object_id, attachment_id). In a multicolumn index, the order of the columns matters.
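A minimal Python sketch of why column order matters, modeling a composite index as a sorted list of tuples. The sample entries, the integer object_id values, and the helper names are made up for the illustration.

```python
from bisect import bisect_left

# Hypothetical index entries for (object_type, object_id, attachment_id).
index = sorted([
    ("communicator", 64328, 11),
    ("communicator", 64328, 42),
    ("communicator", 70001, 7),
    ("invoice", 64328, 5),
    ("invoice", 90000, 9),
])

def prefix_lookup(index, object_type, object_id):
    """Equality on the two leading columns: one contiguous slice of the index."""
    lo = bisect_left(index, (object_type, object_id))
    hi = bisect_left(index, (object_type, object_id + 1))
    return index[lo:hi]

def positions_without_leading_column(index, object_id):
    """Filtering on object_id alone: every entry has to be inspected."""
    return [i for i, entry in enumerate(index) if entry[1] == object_id]

matches = prefix_lookup(index, "communicator", 64328)
scattered = positions_without_leading_column(index, 64328)

print(matches)    # two adjacent entries found by binary search
print(scattered)  # matching positions are not contiguous
```

When the conditions cover the leading columns, the matching entries sit next to each other and can be located directly; when the leading column is skipped, the matches are scattered through the index.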

MySQL query with alias not using an index

The following query is showing up in my log as not using an index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY YEAR(ceremonydate) DESC LIMIT 1;
Explain shows it's not using an index:
id: 1
select_type: SIMPLE
table: awardinfo
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 72
Extra: Using filesort
ordinal, ceremonydate both have an index. Are they not being used due to the yr alias? Is there a way to create an index on YEAR(ceremonydate) instead of just ceremonydate? Or is there a way to index an alias?
It is because of the expression behind the alias. ORDER BY can use an index if it is ordering by something that is indexed. While ceremonydate may be indexed, YEAR(ceremonydate) changes the value of ceremonydate to something completely different, so YEAR(ceremonydate) is not indexed.
And since you can't index an alias, this means that in order for an ORDER BY to use an index, it must be a simple column name, or list of column names.
You should be able to do this and use the index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY ceremonydate DESC LIMIT 1;
Without knowing what your data looks like, that may work for you instead.
More info:http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
You can test this by running a simple select query and using the alias:
SELECT ordinal, ceremonydate as yr FROM ... and start adding complexity to your query to see where the indexes stop being used. Most likely, because you are ordering based on YEAR(ceremonydate) mysql is creating a temporary table. Your best bet is to process ceremonydate in your code. MySQL loses a lot of efficiency with inline processing and computation like YEAR() because it has to create those temporary tables.

MySQL: Not using index for ORDER BY?

I've been trying and googling everything and still can't figure out what's going on.
I have a big table (100M+rows). Among others it has 3 columns: user_id, date, type.
It has an index idx(user_id, type, date).
When I EXPLAIN this query:
SELECT *
FROM table
WHERE user_id = 12345
AND type = 'X'
ORDER BY date DESC
LIMIT 5
EXPLAIN shows that MySQL examined 110K rows, which is roughly how many rows this user_id has.
My question is:
Why is the same index not used for ORDER BY ... LIMIT 5? It knows which rows belong to the user_id, and date is part of the same index, so why not just take the last 5 rows in that index?
P.S. I tried an index on (user_id, date, type): same results. I tried removing DESC: same results.
This is the EXPLAIN plan:
id: 1
select_type: SIMPLE
table: s
type: ref
possible_keys: dateIdx,userTypeDateIdx
key: userTypeDateIdx
key_len: 5
ref: const,const
rows: 110118
Extra: Using where
I also tried adding a FORCE INDEX FOR ORDER BY hint, but I still get rows: 110118.
Did you ANALYZE TABLE after creating the index?
MySQL may choose a poor plan until the table statistics are up to date. The best index to use is the one you created on (user_id, type, date).
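The date in the index is stored in ascending order, and you are asking for the most recent five rows in descending order by date. MySQL can normally read an index backward to satisfy that, and the absence of "Using filesort" in your EXPLAIN suggests the index is already being used for the ordering; the rows value is an estimate for the ref lookup and does not take LIMIT into account. On MySQL 8.0 you can also define the index as (user_id, type, date DESC); older versions parse DESC in an index definition but ignore it.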
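The idea raised in the question, taking the last five entries of the matching index range, can be sketched in Python as follows. The index is modeled as a sorted list of (user_id, type, date) tuples; the sample data and the helper name latest_rows are assumptions for the illustration, not MySQL internals.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical index entries for (user_id, type, date), kept in ascending order.
index = sorted(
    [(12345, "X", date(2023, 1, day)) for day in range(1, 11)]
    + [
        (12345, "Y", date(2023, 2, 1)),
        (12344, "X", date(2023, 3, 1)),
        (12346, "X", date(2023, 3, 2)),
    ]
)

def latest_rows(index, user_id, type_, limit):
    """Find the slice for (user_id, type_) and read its tail backward."""
    lo = bisect_left(index, (user_id, type_, date.min))
    hi = bisect_right(index, (user_id, type_, date.max))
    start = max(lo, hi - limit)
    return index[start:hi][::-1]   # newest first, at most `limit` entries

rows = latest_rows(index, 12345, "X", 5)
print([r[2].isoformat() for r in rows])
```

Because the entries for one (user_id, type) pair are contiguous and ordered by date, only the tail of that slice has to be touched, regardless of how many rows the user has.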

Optimising a trivial MySQL query to get rid of filesort

I've got a trivial MySQL query that the server doesn't really like:
SELECT zone_id, SUM(inc_sec) AS voucher_used
FROM cdr_bill_2010_09
WHERE cust_id = 1234 AND voucher_id = 'XXXXXX'
GROUP BY zone_id
Now this should be really quick (and usually is) since there are indexes on both cust_id and voucher_id (voucher_id is chosen). However it still uses helper tables. After explaining:
id: 1
select_type: SIMPLE
table: cdr_bill_2010_09
type: ref
possible_keys: cust_id,voucher_id
key: voucher_id
key_len: 9
ref: const
rows: 1
Extra: Using where; Using temporary; Using filesort
Can I do something specific to get rid of those? I'm running debian's 5.0.45.
Just try adding another index (according to http://forums.mysql.com/read.php?115,57443,59562):
CREATE INDEX index_zi ON cdr_bill_2010_09 (zone_id,inc_sec);
Are you indexing all the bytes in the voucher_id?
from http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html :
You index only a prefix of a column
named in the ORDER BY clause. In this
case, the index cannot be used to
fully resolve the sort order. For
example, if you have a CHAR(20)
column, but index only the first 10
bytes, the index cannot distinguish
values past the 10th byte and a
filesort will be needed.
I know it's not an order by clause, but maybe the same deal?
Have you tried creating an index with both cust_id and voucher_id in it?
create index idx_cdr_bill_2010_09_joint on cdr_bill_2010_09(cust_id, voucher_id)
The actual answer was close to the other ones. Adding an index with both a "where" key (voucher_id or cust_id) and the "group" key (zone_id) fixed the problem. Actually, an index on zone_id by itself was sufficient to get rid of the additional temporary table and filesort, but the other columns were needed to speed up the basic query.
CREATE INDEX IDX_CDRBILL_CUST_VOUCH_ZONE ON cdr_bill_2010_09 (cust_id,
voucher_id, zone_id);
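A rough Python sketch of why an index on (cust_id, voucher_id, zone_id) lets the grouping run without a temporary table or sort: rows read in index order already arrive grouped by zone_id. The sample rows (with inc_sec carried along as row data) and the helper name sum_by_zone are made up for the illustration.

```python
from bisect import bisect_left
from itertools import groupby, takewhile

# Hypothetical rows as (cust_id, voucher_id, zone_id, inc_sec), read in the
# order of an index on (cust_id, voucher_id, zone_id).
rows_in_index_order = sorted([
    (1234, "XXXXXX", 1, 30),
    (1234, "XXXXXX", 2, 10),
    (1234, "XXXXXX", 1, 45),
    (1234, "YYYYYY", 1, 99),
    (9999, "XXXXXX", 3, 5),
    (1234, "XXXXXX", 2, 20),
])

def sum_by_zone(rows, cust_id, voucher_id):
    """Sum inc_sec per zone_id in one forward pass over the matching slice."""
    lo = bisect_left(rows, (cust_id, voucher_id))
    matching = takewhile(lambda r: r[:2] == (cust_id, voucher_id), rows[lo:])
    # groupby only merges adjacent rows, which is enough because the
    # slice is already ordered by zone_id.
    return [
        (zone_id, sum(r[3] for r in group))
        for zone_id, group in groupby(matching, key=lambda r: r[2])
    ]

totals = sum_by_zone(rows_in_index_order, 1234, "XXXXXX")
print(totals)
```

Without zone_id in the index, rows for the same zone would not be adjacent, and the totals would have to be collected in a helper structure first, which corresponds to "Using temporary; Using filesort".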