Why does this query take over 5 seconds to run? - mysql

I have a MySQL table with around 2m rows in it. I'm trying to run the below query and each time it's taken over 5 seconds to get results. I have an index on created_at column. Below is the EXPLAIN output.
Is this expected?
Thanks in advance.
SELECT
DATE(created_at) AS grouped_date,
HOUR(created_at) AS grouped_hour,
count(*) AS requests
FROM
`advert_requests`
WHERE
DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12'
GROUP BY
grouped_date,
grouped_hour

The EXPLAIN shows type: index which is an index-scan. That is, it is using the index, but it's iterating over every entry in the index, like a table-scan does for rows in the table. This is supported by rows: 2861816 which tells you the optimizer's estimate of quantity of index entries it will examine (this is a rough number). This is much more expensive than examining only the rows matching the condition, which is the benefit we look for from an index.
So why is this?
When you use any function on an index column in your search like this:
WHERE
DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12'
It spoils the benefit of the index for reducing the number of rows examined.
MySQL's optimizer doesn't have any intelligence about the result of functions, so it can't infer that the order of return values will be in the same order as the index. Therefore it can't use the fact that the index is sorted to narrow down the search. You and I know that this is natural for DATE(created_at) to be in the same order as created_at, but the query optimizer doesn't know this. There are other functions like MONTH(created_at) where the results are definitely not in sorted order, and MySQL's optimizer doesn't attempt to know which function's results are reliably sorted.
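The difference between an order-preserving function and one that is not can be illustrated outside the database. The following is a minimal Python sketch, not MySQL internals; the sample timestamps and the `is_sorted` helper are made up for illustration.

```python
from datetime import datetime

# Hypothetical sample values, already sorted the way an index on created_at stores them.
created_at = [
    datetime(2021, 11, 30, 23, 15),
    datetime(2022, 1, 5, 8, 0),
    datetime(2022, 9, 9, 0, 0),
    datetime(2022, 9, 12, 23, 59),
    datetime(2023, 2, 1, 12, 30),
]

def is_sorted(values):
    """Return True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))

dates = [ts.date() for ts in created_at]   # analogous to DATE(created_at)
months = [ts.month for ts in created_at]   # analogous to MONTH(created_at)

print(is_sorted(dates))   # True: truncating to a date keeps the index order
print(is_sorted(months))  # False: months wrap around across years
```

Because the optimizer does not classify functions this way, it treats both cases as "order unknown".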
To fix your query, you can try one of two things:
Use an expression index. Expression (functional) indexes are available as of MySQL 8.0.13:
ALTER TABLE `advert_requests` ADD INDEX ((DATE(created_at)))
Notice the extra redundant pair of parentheses. These are required when defining an expression index. The index entries are the results of that function or expression, not the original values of the column.
If you then use the same expression in your query, the optimizer recognizes that and uses the index.
mysql> explain SELECT DATE(created_at) AS grouped_date, HOUR(created_at) AS grouped_hour, count(*) AS requests FROM `advert_requests` WHERE DATE(created_at) BETWEEN '2022-09-09' AND '2022-09-12' GROUP BY grouped_date, grouped_hour\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: advert_requests
partitions: NULL
type: range <-- much better than 'index'
possible_keys: functional_index
key: functional_index
key_len: 4
ref: NULL
rows: 1
filtered: 100.00
Extra: Using where; Using temporary
If you use MySQL 5.7, you can't use expression indexes directly, but you can use a virtual column and define an index on the virtual column:
ALTER TABLE advert_requests
ADD COLUMN created_at_date DATE AS (DATE(created_at)),
ADD INDEX (created_at_date);
The trick of the optimizer recognizing the expression still works.
If you use a version of MySQL older than 5.7, you should upgrade regardless. MySQL 5.6 and older versions are past their end of life by now, and they are security risks.
The second thing you could do is refactor your query so the created_at column is not inside a function.
WHERE
created_at >= '2022-09-09' AND created_at < '2022-09-13'
When comparing a datetime to a date value, the date value is implicitly at 00:00:00.000 time. To include every fraction of a second up to 2022-09-12 23:59:59.999, it's simpler to just use < '2022-09-13'.
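The boundary behavior can be checked with a small Python sketch. It only models the comparison logic, assuming a date is treated as midnight of that day; the `at_midnight` helper and the sample row are hypothetical.

```python
from datetime import date, datetime, time

def at_midnight(d):
    """Model the implicit conversion of a date to a datetime at 00:00:00."""
    return datetime.combine(d, time.min)

# Hypothetical row created very late on the last day of the range.
late_row = datetime(2022, 9, 12, 23, 59, 59, 999500)

lo = at_midnight(date(2022, 9, 9))

# created_at BETWEEN '2022-09-09' AND '2022-09-12' (without the DATE() wrapper)
in_between = lo <= late_row <= at_midnight(date(2022, 9, 12))

# created_at >= '2022-09-09' AND created_at < '2022-09-13'
in_half_open = lo <= late_row < at_midnight(date(2022, 9, 13))

print(in_between, in_half_open)
```

The closed range misses the row because its upper bound is midnight at the start of 2022-09-12, while the half-open range includes it regardless of fractional-second precision.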
The EXPLAIN of this shows that it uses the existing index on created_at.
mysql> explain SELECT DATE(created_at) AS grouped_date, HOUR(created_at) AS grouped_hour, count(*) AS requests FROM `advert_requests` WHERE created_at >= '2022-09-09' AND created_at < '2022-09-13' GROUP BY grouped_date, grouped_hour\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: advert_requests
partitions: NULL
type: range <-- not 'index'
possible_keys: created_at
key: created_at
key_len: 6
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition; Using temporary
This solution works on older versions of MySQL as well as 5.7 and 8.0.
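As a rough illustration of why type: range examines far fewer entries than type: index, here is a Python sketch that uses a sorted list as a stand-in for the index. The data (one entry per hour for 200 days) is invented, and a real B-tree is more involved than a list with binary search.

```python
from bisect import bisect_left
from datetime import date, datetime, timedelta

# Hypothetical index: one entry per hour for 200 days, kept in sorted order.
start = datetime(2022, 6, 1)
index = [start + timedelta(hours=h) for h in range(200 * 24)]

# Index scan: the function hides the ordering, so every entry must be tested.
first_day, last_day = date(2022, 9, 9), date(2022, 9, 12)
full_scan_matches = [ts for ts in index if first_day <= ts.date() <= last_day]
examined_full = len(index)

# Range scan: binary-search both ends of the half-open range, read only the slice.
lo = bisect_left(index, datetime(2022, 9, 9))
hi = bisect_left(index, datetime(2022, 9, 13))
range_matches = index[lo:hi]
examined_range = hi - lo

print(examined_full, examined_range)
```

Both approaches return the same entries, but the range scan touches only the matching slice.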

Use EXPLAIN to check whether the query does an index range scan. If it does not, see:
https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html
(Note that a full table scan can sometimes be faster if most of the timestamps in the table fall within the selected date range. As far as I know, optimization is not trivial in that case.)

If I understand the EXPLAIN correctly, it's able to use the index to implement the WHERE filtering. But this is returning 2.8 million rows, which then have to be grouped by date and hour, and this is a slow process.
You may be able to improve it by creating virtual columns for the date and hour, and index these as well.
ALTER TABLE advert_requests
ADD COLUMN created_date DATE AS (DATE(created_at)),
ADD COLUMN created_hour INT AS (HOUR(created_at)),
ADD INDEX (created_date, created_hour);

Related

Should using USE/FORCE INDEX change the EXPLAIN output in a MySQL query?

As title suggests, should the EXPLAIN output change after explicitly using FORCE INDEX (index_1, index_2) in a query?
As an example I have the following query:
select
person_id,
role_id,
scope_id,
count(distinct qualification_id) as ncomps
from dw_rolepersonqualification
where ((mandatory = 'y') and (expiry_date > now()))
group by 1, 2, 3
When I run it with EXPLAIN, I get:
id: 1
select_type: SIMPLE
table: dw_rolepersonqualification
type: ALL
possible_keys: PRIMARY, idx_person, idx_role, idx_qualification, idx_scope, idx_mandatory
key: null
key_len: null
ref: null
rows: 8267852
Extra: Using where; Using filesort
When I add in FORCE INDEX (idx_person, idx_role, idx_qualification, idx_scope) it does not change the output of EXPLAIN. Is this to be expected or am I missing something?
The optimal index for that query is
INDEX(mandatory, -- tested with "=", so first
expiry_date) -- a range
Even so, it may decide that the index is not worth the effort. If the Optimizer estimates that more than ~20% of the table matches the WHERE clause, it will decide to scan the table rather than bouncing between the index's BTree and the data BTree.
A "covering" index may (or may not) be better:
INDEX(mandatory, -- tested with "=", so first
expiry_date, -- a range
person_id, role_id, scope_id,
qualification_id) -- all other touched columns (any order)
(Caveat: Some of what I say may not be valid; please provide SHOW CREATE TABLE.)
My Mantra: "Force Index may help today, but hurt tomorrow."
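The "equality column first, range column second" rule can be sketched with sorted tuples in Python. This is only a model of index ordering; the rows and the fixed `today` value (standing in for NOW()) are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rows as (mandatory, expiry_date) pairs.
rows = [
    ("n", date(2030, 1, 1)),
    ("y", date(2020, 5, 1)),
    ("y", date(2031, 3, 1)),
    ("n", date(2019, 7, 1)),
    ("y", date(2029, 8, 1)),
    ("n", date(2032, 2, 1)),
]
today = date(2025, 1, 1)  # stands in for NOW()

# INDEX(mandatory, expiry_date): all matches form one contiguous slice.
eq_first = sorted(rows)
lo = bisect_right(eq_first, ("y", today))
hi = bisect_right(eq_first, ("y", date.max))
examined_eq_first = hi - lo
matches_eq_first = eq_first[lo:hi]

# INDEX(expiry_date, mandatory): the range also holds mandatory = 'n' entries.
range_first = sorted((expiry, mandatory) for mandatory, expiry in rows)
expiries = [expiry for expiry, _ in range_first]
start = bisect_right(expiries, today)
candidates = range_first[start:]
examined_range_first = len(candidates)
matches_range_first = [(m, e) for e, m in candidates if m == "y"]

print(examined_eq_first, examined_range_first)
```

With the equality column first, every examined entry is a match; with the range column first, the scan must filter out entries that fail the equality test.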

How do I speed up this SQL query

I have the following query:
select min(a) from tbl where b > ?;
and it takes about 4 seconds on my mysql instance with index(b, a) (15M rows). Is there a way to speed it up?
Explain:
explain select min(parsed_id) from replays where game_date > '2016-10-01';
id: 1
select_type: SIMPLE
table: replays
partitions: NULL
type: range
possible_keys: replays_game_date_index,replays_game_date_parsed_id_index
key: replays_game_date_parsed_id_index
key_len: 6
ref: NULL
rows: 6854021
filtered: 100.00
Extra: Using where; Using index
Index statement:
create index replays_game_date_parsed_id_index on replays (game_date, parsed_id);
I think the index MySQL is using is the right one. The query should be instantaneous since a SINGLE read from the index should return the result you want. I guess for this query MySQL's SQL optimizer is doing a very poor job.
Maybe you could rephrase your query to trick the SQL optimizer onto using a different strategy. Maybe you can try:
select parsed_id
from replays
where game_date > '2016-10-01'
order by parsed_id
limit 1
Is this version any faster?
select @mina
from (select (@mina := least(@mina, a)) as mina
      from tbl cross join
           (select @mina := 999999) params
      where b > ?
     ) t
limit 1;
I suspect this won't make much difference, but I'm not sure what happens under the hood with such a large aggregation function running over an index.
This may or may not help: Change the query and add an index:
SELECT a FROM tbl WHERE b > ? ORDER BY a LIMIT 1;
INDEX(a, b)
Then, if a matching b occurs soon enough in the table, this will be faster than the other suggestions.
On the other hand, if the only matching b is near the end of the table, this will have to scan nearly all the index and be slower than the other options.
a needs to be first in the index. By having both columns in the index, it becomes a "covering" index, hence a bit faster.
It may be that using my SELECT, together with two indexes will give the Optimizer enough to pick the better approach:
INDEX(a,b)
INDEX(b,a)
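The trade-off between the two index orders can be sketched in Python with sorted tuples. The rows and the threshold are hypothetical, and this models only how many entries each strategy has to look at.

```python
from bisect import bisect_right

# Hypothetical (a, b) rows.
rows = [(7, 3), (2, 9), (5, 1), (9, 8), (1, 2), (4, 7)]
threshold = 6  # the "?" in WHERE b > ?

# INDEX(b, a): range scan over every entry with b > threshold, then take MIN(a).
by_b = sorted((b, a) for a, b in rows)
start = bisect_right(by_b, (threshold, float("inf")))
scanned_ba = by_b[start:]
min_ba = min(a for _, a in scanned_ba)

# INDEX(a, b): walk in order of a and stop at the first entry with b > threshold.
by_a = sorted(rows)
examined_ab = 0
min_ab = None
for a, b in by_a:
    examined_ab += 1
    if b > threshold:
        min_ab = a
        break

print(min_ba, len(scanned_ba), min_ab, examined_ab)
```

Here the (a, b) order stops after two entries, but if the only matching b sat at the largest a, it would have to walk the entire index.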
Schema
Adding either (or both) composite indexes should help.
Shrinking the table size is likely to help...
INT takes 4 bytes. Consider whether a smaller datatype would suffice for any of those columns.
There are 3 dates (DATETIME, TIMESTAMP); do you need all of them?
Is fingerprint varchar(36) a UUID/GUID? If so, it could be packed into BINARY(16).
640MB is tight -- check the graphs to make sure there is no "swapping". (Swapping would be really bad for performance.)

MySQL query with alias not using an index

The following query is showing up in my log as not using an index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY YEAR(ceremonydate) DESC LIMIT 1;
Explain shows it's not using an index:
id: 1
select_type: SIMPLE
table: awardinfo
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 72
Extra: Using filesort
ordinal, ceremonydate both have an index. Are they not being used due to the yr alias? Is there a way to create an index on YEAR(ceremonydate) instead of just ceremonydate? Or is there a way to index an alias?
It is not the alias as such but the YEAR() expression behind it. ORDER BY can use an index only if it orders by something that is indexed. While ceremonydate may be indexed, YEAR(ceremonydate) turns each value into something different, and that derived value is not indexed.
And since you can't index an alias, this means that for an ORDER BY to use an index, it must order by a plain column name or a list of column names.
You should be able to do this and use the index:
SELECT ordinal,YEAR(ceremonydate) as yr
FROM awardinfo
ORDER BY ceremonydate DESC LIMIT 1;
Without knowing what your data looks like, that may work for you instead.
More info: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
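Whether ordering by the full date is an acceptable substitute depends on the data. A small Python sketch with hypothetical rows shows that both orderings pick a row from the same (latest) year:

```python
from datetime import date

# Hypothetical (ordinal, ceremonydate) rows.
awardinfo = [
    (70, date(1998, 3, 23)),
    (72, date(2000, 3, 26)),
    (71, date(1999, 3, 21)),
]

# ORDER BY YEAR(ceremonydate) DESC LIMIT 1
by_year = max(awardinfo, key=lambda row: row[1].year)

# ORDER BY ceremonydate DESC LIMIT 1
by_date = max(awardinfo, key=lambda row: row[1])

print(by_year[1].year == by_date[1].year)
```

The year always agrees, since the latest date necessarily falls in the latest year. If several rows share that year, ordering by the date picks the latest of them, whereas ordering by the year alone leaves the choice among them unspecified.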
You can test this by running a simple select query and using the alias:
SELECT ordinal, ceremonydate as yr FROM ... and then add complexity back to the query to see where the indexes stop being used. Most likely, because you are ordering by YEAR(ceremonydate), MySQL has to compute YEAR() for every row and sort the results separately (the Using filesort in your EXPLAIN). Your best bet is to process ceremonydate in your code. MySQL loses a lot of efficiency with inline computation like YEAR() in an ORDER BY because it can no longer read the rows in index order.

Why is this date range query so slow?

I have a database table with 5 million rows, I am running:
select
*
from
tbl
where
datetime_created
between
'2014-10-01 00:00:00' and
'2014-10-31 23:59:59'
It took 54 seconds to return 428k results
The columns on the tbl:
id (int pk auto inc)
actor (varchar)
action (enum)
target (varchar)
is_successful (tinyint)
datetime_created (datetime)
The index:
datetime_created (datetime_created, action, target, is_successful)
Any ideas on how I can improve this?
edit:
EXPLAIN results:
select_type: simple
type: range
possible keys
datetime_created
key: datetime_created
key_len: 8
ref: null
rows: 359569
extra: using index condition
428k is a lot of rows to work with in one shot. Even though you have an index on the date, the engine still has to scan through the table between the low and high values. I would suggest running multiple queries that read the data in smaller chunks, narrowing the result set where possible.
For example, adding an action enum filter to the date range should yield much faster results. Say there are 5 enum values: you would then run 5 queries, one per action. The more indexed criteria you add, the better the query will perform.
Also consider if this is going to be used in an app, that is a massive recordset to deal with. Do you really need to work with 428k results at a time?
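One way to read the data in smaller chunks is to split the date range into consecutive half-open windows and run one query per window. The following Python sketch only generates the window boundaries; the `day_chunks` helper and the 7-day window size are assumptions for illustration, and the query itself is left out.

```python
from datetime import datetime, timedelta

def day_chunks(start, end, days=1):
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    step = timedelta(days=days)
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi

chunks = list(day_chunks(datetime(2014, 10, 1), datetime(2014, 11, 1), days=7))
for lo, hi in chunks:
    # Each pair would be bound to:
    #   WHERE datetime_created >= %s AND datetime_created < %s
    print(lo, hi)
```

Using half-open windows means no row is read twice or skipped at a boundary, whatever the fractional-second precision of the column.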

MySQL: Not using index for ORDER BY?

I've been trying and googling everything and still can't figure out what's going on.
I have a big table (100M+rows). Among others it has 3 columns: user_id, date, type.
It has an index idx(user_id, type, date).
When I EXPLAIN this query:
SELECT *
FROM table
WHERE user_id = 12345
AND type = 'X'
ORDER BY date DESC
LIMIT 5
EXPLAIN shows that MySQL examined 110K rows, which is roughly how many rows this user_id has.
My question is:
Why is the same index not used for ORDER BY ... LIMIT 5? It knows which rows belong to the user_id, and date is part of the same index, so why not just take the last 5 rows in that index?
P.S. I tried an index on (user_id, date, type) with the same results; I tried removing DESC, also with the same results.
This is the EXPLAIN plan:
id: 1
select_type: SIMPLE
table: s
type: ref
possible_keys: dateIdx,userTypeDateIdx
key: userTypeDateIdx
key_len: 5
ref: const,const
rows: 110118
Extra: Using where
I also tried adding a FORCE INDEX FOR ORDER BY hint, but I still get rows: 110118.
Did you ANALYZE TABLE after creating the index?
MySQL's optimizer chooses indexes based on table statistics, and these may be missing or stale until the table is analyzed. The best index to use is the one you created with (user_id, type, date).
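The date in the index is in ascending order, and you are asking for the most recent five rows in descending order by date. MySQL can normally read an index backward for this, but if it does not, try changing the index to user_id, type, date desc so it can get the most recent five rows directly. Note that descending indexes are only honored from MySQL 8.0 onward; older versions parse DESC in an index definition but ignore it.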