MySQL query with alias not using an index

The following query is showing up in my log as not using an index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY YEAR(ceremonydate) DESC LIMIT 1;
Explain shows it's not using an index:
id: 1
select_type: SIMPLE
table: awardinfo
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 72
Extra: Using filesort
ordinal, ceremonydate both have an index. Are they not being used due to the yr alias? Is there a way to create an index on YEAR(ceremonydate) instead of just ceremonydate? Or is there a way to index an alias?

It is because of the function behind the alias. ORDER BY can use an index if it is ordering by something that is indexed. While ceremonydate may be indexed, YEAR(ceremonydate) transforms the value of ceremonydate into something completely different, so the result of YEAR(ceremonydate) is not indexed.
And since you can't index an alias, this means that for an ORDER BY to use an index, it must reference a simple column name, or a list of column names.
You should be able to do this and use the index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY ceremonydate DESC LIMIT 1;
Without knowing what your data looks like, that may work for you instead.
More info: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
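To the question about indexing YEAR(ceremonydate) itself: on newer MySQL versions this is possible. A sketch, assuming MySQL 5.7+ for the generated column (the column name ceremony_year is made up):

```sql
-- MySQL 5.7+: materialize YEAR(ceremonydate) as a generated column
-- and index it, so ordering by the year can use an index.
ALTER TABLE awardinfo
  ADD COLUMN ceremony_year SMALLINT AS (YEAR(ceremonydate)),
  ADD INDEX (ceremony_year);

-- MySQL 8.0+ can index the expression directly (a functional index):
-- ALTER TABLE awardinfo ADD INDEX ((YEAR(ceremonydate)));
```

If the query then filters or orders by the same expression, the optimizer can use the new index instead of a filesort.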

You can test this by running a simple SELECT query that uses the alias:
SELECT ordinal, ceremonydate as yr FROM ...
and then adding complexity to your query to see where the indexes stop being used. Most likely, because you are ordering based on YEAR(ceremonydate), MySQL is creating a temporary table. Your best bet is to process ceremonydate in your application code. MySQL loses a lot of efficiency with inline processing and computation like YEAR() because it has to create those temporary tables.

Related

Why does this query take over 5 seconds to run?

I have a MySQL table with around 2m rows in it. I'm trying to run the below query, and each time it takes over 5 seconds to get results. I have an index on the created_at column. Below is the EXPLAIN output.
Is this expected?
Thanks in advance.
SELECT
DATE(created_at) AS grouped_date,
HOUR(created_at) AS grouped_hour,
count(*) AS requests
FROM
`advert_requests`
WHERE
DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12'
GROUP BY
grouped_date,
grouped_hour
The EXPLAIN shows type: index, which is an index-scan. That is, it is using the index, but it's iterating over every entry in the index, like a table-scan does for rows in the table. This is supported by rows: 2861816, which tells you the optimizer's estimate of the number of index entries it will examine (this is a rough number). This is much more expensive than examining only the rows matching the condition, which is the benefit we look for from an index.
So why is this?
When you use any function on an indexed column in your search like this:
WHERE
DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12'
It spoils the benefit of the index for reducing the number of rows examined.
MySQL's optimizer doesn't have any intelligence about the result of functions, so it can't infer that the order of return values will be the same as the order of the index. Therefore it can't use the fact that the index is sorted to narrow down the search. You and I know that it is natural for DATE(created_at) to be in the same order as created_at, but the query optimizer doesn't know this. There are other functions, like MONTH(created_at), where the results are definitely not in sorted order, and MySQL's optimizer doesn't attempt to know which functions' results are reliably sorted.
To fix your query, you can try one of two things:
Use an expression index. This is a new feature in MySQL 8.0:
ALTER TABLE `advert_requests` ADD INDEX ((DATE(created_at)))
Notice the extra redundant pair of parentheses. These are required when defining an expression index. The index entries are the results of that function or expression, not the original values of the column.
If you then use the same expression in your query, the optimizer recognizes that and uses the index.
mysql> explain SELECT DATE(created_at) AS grouped_date, HOUR(created_at) AS grouped_hour, count(*) AS requests FROM `advert_requests` WHERE DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12' GROUP BY grouped_date, grouped_hour\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: advert_requests
partitions: NULL
type: range <-- much better than 'index'
possible_keys: functional_index
key: functional_index
key_len: 4
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using temporary
If you use MySQL 5.7, you can't use expression indexes directly, but you can use a virtual column and define an index on the virtual column:
ALTER TABLE advert_requests
ADD COLUMN created_at_date DATE AS (DATE(created_at)),
ADD INDEX (created_at_date);
The trick of the optimizer recognizing the expression still works.
If you use a version of MySQL older than 5.7, you should upgrade regardless. MySQL 5.6 and older versions are past their end of life by now, and they are security risks.
The second thing you could do is refactor your query so the created_at column is not inside a function.
WHERE
created_at >= '2022-09-09' AND created_at < '2022-09-13'
When comparing a datetime to a date value, the date value is implicitly at 00:00:00.000 time. To include every fraction of a second up to 2022-09-12 23:59:59.999, it's simpler to just use < '2022-09-13'.
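As a quick illustration of that boundary behaviour (a standalone sketch; the timestamp value is made up):

```sql
-- '2022-09-12' as a datetime is '2022-09-12 00:00:00', so BETWEEN stops
-- at midnight and excludes the rest of Sep 12:
SELECT CAST('2022-09-12 15:30:00' AS DATETIME)
       BETWEEN '2022-09-09' AND '2022-09-12';   -- excluded

-- The half-open range keeps all of Sep 12:
SELECT CAST('2022-09-12 15:30:00' AS DATETIME) >= '2022-09-09'
   AND CAST('2022-09-12 15:30:00' AS DATETIME) <  '2022-09-13';  -- included
```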
The EXPLAIN of this shows that it uses the existing index on created_at.
mysql> explain SELECT DATE(created_at) AS grouped_date, HOUR(created_at) AS grouped_hour, count(*) AS requests FROM `advert_requests` WHERE created_at >= '2022-09-09' AND created_at < '2022-09-13' GROUP BY grouped_date, grouped_hour\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: advert_requests
partitions: NULL
type: range <-- not 'index'
possible_keys: created_at
key: created_at
key_len: 6
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition; Using temporary
This solution works on older versions of MySQL as well as 5.7 and 8.0.
Use EXPLAIN and check whether it is an index range scan or not. If not, follow this link:
https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html
(Note that sometimes a full table scan can be better if most of the timestamps in the table belong to the selected date range. As far as I know, optimization is not trivial in such cases.)
If I understand the EXPLAIN correctly, it's able to use the index to implement the WHERE filtering. But this is returning 2.8 million rows, which then have to be grouped by date and hour, and this is a slow process.
You may be able to improve it by creating virtual columns for the date and hour, and index these as well.
ALTER TABLE advert_requests
ADD COLUMN created_date DATE AS (DATE(created_at)),
ADD COLUMN created_hour INT AS (HOUR(created_at)),
ADD INDEX (created_date, created_hour);

How do I speed up this SQL query

I have the following query:
select min(a) from tbl where b > ?;
and it takes about 4 seconds on my mysql instance with index(b, a) (15M rows). Is there a way to speed it up?
Explain:
explain select min(parsed_id) from replays where game_date > '2016-10-01';
id: 1
select_type: SIMPLE
table: replays
partitions: NULL
type: range
possible_keys: replays_game_date_index,replays_game_date_parsed_id_index
key: replays_game_date_parsed_id_index
key_len: 6
ref: NULL
rows: 6854021
filtered: 100.00
Extra: Using where; Using index
Index statement:
create index replays_game_date_parsed_id_index on replays (game_date, parsed_id);
I think the index MySQL is using is the right one. The query should be instantaneous since a SINGLE read from the index should return the result you want. I guess for this query MySQL's SQL optimizer is doing a very poor job.
Maybe you could rephrase your query to trick the SQL optimizer into using a different strategy. Maybe you can try:
select parsed_id
from replays
where game_date > '2016-10-01'
order by parsed_id
limit 1
Is this version any faster?
select @mina
from (select (@mina := least(@mina, a)) as mina
      from tbl cross join
           (select @mina := 999999) params
      where b > ?
     ) t
limit 1;
I suspect this won't make much difference, but I'm not sure what happens under the hood with such a large aggregation function running over an index.
This may or may not help: Change the query and add an index:
SELECT a FROM tbl WHERE b > ? ORDER BY a LIMIT 1;
INDEX(a, b)
Then, if a matching b occurs soon enough in the table, this will be faster than the other suggestions.
On the other hand, if the only matching b is near the end of the table, this will have to scan nearly all the index and be slower than the other options.
a needs to be first in the index. By having both columns in the index, it becomes a "covering" index, hence a bit faster.
It may be that using my SELECT, together with two indexes will give the Optimizer enough to pick the better approach:
INDEX(a,b)
INDEX(b,a)
Schema
Adding either (or both) composite indexes should help.
Shrinking the table size is likely to help...
INT takes 4 bytes. Consider whether a smaller datatype would suffice for any of those columns.
There are 3 dates (DATETIME, TIMESTAMP); do you need all of them?
Is fingerprint varchar(36) a UUID/GUID? If so, it could be packed into BINARY(16).
640MB is tight -- check the graphs to make sure there is no "swapping". (Swapping would be really bad for performance.)
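As a sketch of the BINARY(16) packing idea, assuming fingerprint really does hold dashed-hex UUIDs (the new column name is made up):

```sql
-- Pack a 36-char dashed-hex UUID into 16 bytes.
ALTER TABLE replays ADD COLUMN fingerprint_bin BINARY(16);

UPDATE replays
   SET fingerprint_bin = UNHEX(REPLACE(fingerprint, '-', ''));

-- Once verified, the old VARCHAR(36) column (and any index on it)
-- can be dropped to reclaim the space.
```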

Basic query is unexpectedly slow in MySQL

I am running a basic select on a table with 189,000 records. The table structure is:
items
id - primary key
ad_count - int, indexed
company_id - varchar, indexed
timestamps
the select query is:
select *
from `items`
where `company_id` is not null
and `ad_count` <= 100
order by `ad_count` desc, `items`.`id` asc
limit 50
On my production servers, just the MySQL portion of the execution takes 300 - 400ms
If I run an explain, I get:
select type: SIMPLE
table: items
type: range
possible_keys: items_company_id_index,items_ad_count_index
key: items_company_id_index
key_len: 403
ref: NULL
rows: 94735
Extra: Using index condition; Using where; Using filesort
When fetching this data in our application, we paginate it in groups of 50, but the above query is "the first page".
I'm not too familiar with dissecting EXPLAIN output. Is there something I'm missing here?
An ORDER BY clause with mixed sort directions (ASC and DESC on different columns) can cause the creation of temporary tables and a filesort. MySQL up to and including v5.7 doesn't handle such scenarios well at all, and there is actually no point in indexing the fields in the ORDER BY clause, as MySQL's optimizer will never use them.
Therefore, if the application's requirements allow, it's best to use the same order for all columns in the ORDER BY clause.
So in this case:
order by `ad_count` desc, `items`.`id` asc
Will become:
order by `ad_count` desc, `items`.`id` desc
P.S. As a small tip for further reading: it seems that MySQL 8.0 is going to change things with descending indexes, and these use cases might perform significantly better when it's released.
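For reference, a sketch of what that 8.0 change looks like: descending index columns, so an index can match the mixed-direction ORDER BY exactly (the index name is made up):

```sql
-- MySQL 8.0+ only: index order matches ORDER BY ad_count DESC, id ASC,
-- so the filesort can be avoided.
CREATE INDEX items_ad_count_desc_id_asc_index
    ON items (ad_count DESC, id ASC);
```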
Try replacing items_company_id_index with a multi-column index on (company_id, ad_count).
DROP INDEX items_company_id_index ON items;
CREATE INDEX items_company_id_ad_count_index ON items (company_id, ad_count);
This will allow it to use the index to test both conditions in the WHERE clause. Currently, it's using the index just to find non-null company_id, and then doing a full scan of those records to test ad_count. If most records have non-null company_id, it's scanning most of the table.
You don't need to retain the old index on just the company_id column, because a multi-column index is also an index on any prefix columns, because of the way B-trees work.
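To illustrate the prefix point, both of these queries can use the new (company_id, ad_count) index, which is why the old single-column index becomes redundant (the literal values are made up):

```sql
-- Uses the leading column of (company_id, ad_count):
SELECT * FROM items WHERE company_id = 'acme';

-- Uses both columns:
SELECT * FROM items WHERE company_id = 'acme' AND ad_count <= 100;
```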
I could be wrong here (depending on your SQL version this could be faster), but try an INNER JOIN with your companies table.
Like:
Select *
From items
INNER JOIN companies ON companies.id = items.company_id
and items.ad_count <= 100
LIMIT 50;
Because of your high index count, maintaining the B-trees will slow down the database each time a new entry is inserted. Maybe remove the index on ad_count? (This depends on how often you use that column in queries.)

MySQL Slow query for 'COUNT'

The following query takes 0.7s on a 2.5GHz dual-core Windows Server 2008 R2 Enterprise machine when run over a 4.5GB MySQL database. sIndex10 is a varchar(1024) column:
SELECT COUNT(*) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
An EXPLAIN shows the following information:
id: 1
select_type: SIMPLE
table: e_entity
type: ref
possible_keys: App_Parent,sIndex10
key: App_Parent
key_len: 4
ref: const
rows: 270066
extra: Using Where
There are 230060 rows matching the first condition and 124216 rows matching both conditions combined with the AND operator. meta_oid is indexed, and although sIndex10 is also indexed, I think MySQL is correct not to pick that index, as FORCE INDEX (sIndex10) takes longer.
We have had a look at configuration parameters such as innodb_buffer_pool_size and they look correct as well.
Given that this table already has 642532 records, have we reached the top of the performance MySQL can offer? Is investing in hardware the only way forward at this point?
WHERE meta_oid=336799 AND sIndex10 = ''
begs for a composite index
INDEX(meta_oid, sIndex10) -- in either order
That is not the same as having separate indexes on the columns.
That's all there is to it.
Index Cookbook
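One practical caveat when creating that composite index: sIndex10 is varchar(1024), which can exceed MySQL's index key length limit, so a prefix may be required. A sketch (the index name and prefix length are guesses to be tuned):

```sql
-- Composite index; the prefix on sIndex10 keeps the key under
-- MySQL's index key size limit for wide character columns.
ALTER TABLE e_entity
  ADD INDEX idx_meta_sindex10 (meta_oid, sIndex10(191));
```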
One thing I always do is count(id), since id is (nearly) always indexed; counting just the id only has to look at the index.
So try running the query below and see if it performs any better. You should also add SQL_NO_CACHE when testing to get a better idea of how the query performs.
SELECT SQL_NO_CACHE COUNT(id) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
Note: This is probably not the complete answer for your question, but it was too long for just a comment.

Optimising a trivial MySQL query to get rid of filesort

I've got a trivial MySQL query which the server really doesn't like:
SELECT zone_id, SUM(inc_sec) AS voucher_used
FROM cdr_bill_2010_09
WHERE cust_id = 1234 AND voucher_id = 'XXXXXX'
GROUP BY zone_id
Now this should be really quick (and usually is) since there are indexes on both cust_id and voucher_id (voucher_id is chosen). However, it still uses a temporary table and filesort. After running EXPLAIN:
id: 1
select_type: SIMPLE
table: cdr_bill_2010_09
type: ref
possible_keys: cust_id,voucher_id
key: voucher_id
key_len: 9
ref: const
rows: 1
Extra: Using where; Using temporary; Using filesort
Can I do something specific to get rid of those? I'm running Debian's MySQL 5.0.45.
Just try adding another index (according to http://forums.mysql.com/read.php?115,57443,59562):
CREATE INDEX index_zi ON cdr_bill_2010_09 (zone_id,inc_sec);
Are you indexing all the bytes in the voucher_id?
from http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html :
You index only a prefix of a column
named in the ORDER BY clause. In this
case, the index cannot be used to
fully resolve the sort order. For
example, if you have a CHAR(20)
column, but index only the first 10
bytes, the index cannot distinguish
values past the 10th byte and a
filesort will be needed.
I know it's not an ORDER BY clause, but maybe it's the same deal?
Have you tried creating an index with both cust_id and voucher_id in it?
create index idx_cdr_bill_2010_09_joint on cdr_bill_2010_09 (cust_id, voucher_id);
The actual answer was close to the other ones. Adding an index with both the "where" keys (voucher_id, cust_id) and the "group" key (zone_id) fixed the problem. Actually, zone_id by itself was sufficient to get rid of the extra temporary table / filesort, but including the other columns was needed to speed up the basic query.
CREATE INDEX IDX_CDRBILL_CUST_VOUCH_ZONE ON cdr_bill_2010_09 (cust_id,
voucher_id, zone_id);