Query is too slow and not using index - MySQL

Here is the EXPLAIN of my query:
explain
select eil.sell_fmt, count(sell_fmt) as itemCount
from table_items eil
where eil.cl_Id=123 and eil.si_Id='0'
and start_date <= now() and end_date is not null and end_date < NOW()
group by eil.sell_fmt
Without the date (start_date, end_date) filters:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE eil ref table_items_clid_siid_sellFmt 39 const,const 7393 Using where; Using index
With date filters:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE eil ref table_items_clid_siid_sellFmt 39 const,const 8400 Using where
possible_keys are:
table_items_clid_siid, table_items_clid_siid_itemId, table_items_clid_siid_startDate_endDate, table_items_clid_siid_sellFmt
The query without the date filters is very fast (0.4 sec), but with the date filters it takes about 30 seconds. The table has only 14K records in total.
Table field types:
`cl_Id` int(11) NOT NULL,
`si_Id` varchar(11) NOT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`sell_fmt` varchar(20) DEFAULT NULL
The index names are concatenations of the field names, so you can tell which fields each index combines.
Can somebody guide me here? What's going on, what's the best course of action, or where am I going wrong?
I need one more suggestion: in another query on the same table, a user can filter on up to 10 fields, in no definite order (a random number of fields in random order). That type of search would be too slow again. What's the best strategy there? One covering index with "all" possible searchable fields? If yes, does the order of fields in the index matter? (i.e., if that order differs from the order of the fields in the query, will the index still be used?)

First, without seeing your CREATE TABLE statement, I can offer the following: create a composite index (multiple fields) that best matches the common elements of your WHERE clause, starting with the equality conditions. You are explicitly looking for a cl_Id and si_Id, plus start and end dates. Since you have a GROUP BY, I would add that column as well, for optimization purposes, making it a completely COVERING index so the engine does not need to go back to the raw data to complete the query; it can resolve everything from the fields in the index directly.
I would have an index on
( cl_id, si_id, start_date, end_date, sell_fmt )
Finally, change your count from count(sell_fmt) to just count(*), which says "I don't care about a specific field; as long as a record is found, count it."
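Putting the suggestions together, a minimal sketch (the index name is made up; the column names come from the question):

```sql
-- Covering index: equality columns first, then the range columns,
-- then the GROUP BY column, so the whole query resolves from the index
ALTER TABLE table_items
  ADD INDEX ix_clid_siid_dates_sellfmt (cl_Id, si_Id, start_date, end_date, sell_fmt);

-- COUNT(*) instead of COUNT(sell_fmt): any found row counts
SELECT sell_fmt, COUNT(*) AS itemCount
FROM table_items
WHERE cl_Id = 123 AND si_Id = '0'
  AND start_date <= NOW()
  AND end_date IS NOT NULL AND end_date < NOW()
GROUP BY sell_fmt;
```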


Avoid filesort in simple filtered ordered query

I have a simple table:
CREATE TABLE `user_values` (
`id` bigint NOT NULL AUTO_INCREMENT,
`user_id` bigint NOT NULL,
`value` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
that I am trying to execute the following simple query:
select * from user_values where user_id in (20020, 20030) order by id desc;
I would fully expect this query to 100% use an index (either the (user_id, id) one or the (id, user_id) one). Yet, it turns out that's not the case:
explain select * from user_values where user_id in (20020, 20030) order by id desc; yields:
id select_type table partitions type key key_len ref rows filtered Extra
1 SIMPLE user_values NULL range user_id 8 NULL 9 100.00 Using index condition; Using filesort
Why is that the case? How can I avoid a filesort on this trivial query?
You can't avoid the filesort in the query you show.
When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.
Here's an example of hypothetical data in which reading the rows by an index on user_id will not be in id order.
id user_id
1 20030
2 20020
3 20016
4 20030
5 20020
So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:
id user_id
2 20020
5 20020
1 20030
4 20030
Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.
The same kind of effect happens for other types of predicates, for example BETWEEN, or < or != or IS NOT NULL, etc. Every predicate except = is a range predicate.
The only ways to avoid the filesort are to change the query in one of the following ways:
Omit the ORDER BY clause and accept the results in whatever order the optimizer chooses to return them, which could be id order, but only by coincidence.
Change the user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id, and therefore reading the matching rows from the index will already be returned in the id order, and therefore the ORDER BY is a no-op. The optimizer recognizes when this is possible, and skips the filesort.
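A sketch of the second option (equality predicate, so the index order already matches the ORDER BY):

```sql
-- One user_id: rows come out of the (user_id, id) index already in id order,
-- so the ORDER BY is a no-op and EXPLAIN should no longer show "Using filesort"
EXPLAIN SELECT * FROM user_values
WHERE user_id = 20020
ORDER BY id DESC;
```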
MySQL will most likely use the index for the query (unless the user_ids in the query cover most of the rows).
The "filesort" happens in memory (it's really not a filesort), and is used to sort the found rows based on the ORDER BY clause.
You cannot avoid a "sort" in this case.
There were about 9 rows to sort, so it could not have taken long.
How long did the query take? Probably only a few milliseconds, so who cares?
"Filesort" does not necessarily mean that a "file" was involved. In many queries the sort is done in RAM.
Do you use id for anything other than to have a PRIMARY KEY on the table? If not, then this will help a small amount. (The speed-up won't be indicated in EXPLAIN.)
PRIMARY KEY (`user_id`,`id`), -- to avoid secondary lookups
KEY `id` (`id`); -- to keep auto_increment happy

MySQL Range Partitioning DATE() Not Working

I'm working on MySQL query optimization these days. One of the issues I've encountered is that DATE() may prevent partition pruning on a table partitioned by date range.
Here is the sample table:
CREATE TABLE `testing_db` (
`date_time` date NOT NULL,
`id` varchar(10) NOT NULL,
PRIMARY KEY (`date_time`,`id`) USING BTREE,
UNIQUE KEY `unique` (`date_time`,`id`),
KEY `idx_date_time` (`date_time`),
KEY `idx_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
/*!50100 PARTITION BY RANGE (to_days(`date_time`))
(PARTITION p0 VALUES LESS THAN (TO_DAYS('2021-01-01')),
PARTITION p2021_01 VALUES LESS THAN (TO_DAYS('2021-01-31')),
PARTITION p2021_02 VALUES LESS THAN (TO_DAYS('2021-02-28')),
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Statement without DATE():
EXPLAIN
SELECT date_time, id FROM testing_db WHERE date_time = '2021-02-25';
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE testing_db p2021_02 ref PRIMARY,unique,idx_date_time,idx_id PRIMARY 3 const 1 100.00 Using index
Statement with DATE():
EXPLAIN
SELECT date_time, id FROM testing_db WHERE DATE(date_time) = '2021-02-25';
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE testing_db p0,p2021_01,p2021_02,future index NULL idx_date_time 3 NULL 1 100.00 Using where; Using index
Comparing the two EXPLAIN outputs, the statement with DATE() obviously scans all partitions while the statement without it doesn't. The impact could be significant on a large table.
I've researched similar issues, but it seems they are not relevant to this case:
Official Doc, DATE() extracts the date part of the date or datetime.
Mysql, partitioning not working on date range
https://bugs.mysql.com/bug.php?id=28928
Could you help figure it out? Thanks a lot!
The use of the DATE() function in your WHERE clause negates the use of any relevant index, which causes a full table scan, and a full table scan has to read from all partitions.
In your example you are applying the DATE() function to a column of type DATE, so it serves no purpose.
INDEX(date_time) is unnecessary because there are two other indexes starting with that column.
A PRIMARY KEY is (in MySQL) a UNIQUE key, so your UNIQUE(date_time, id) is redundant.
Usually it is unwise to start any index with the partition key (date_time).
WHERE DATE(date_time) = ... is not "sargable". That is, no indexing of date_time can be used when hiding a column in a function (DATE()). (This is the main problem that you are asking about.)
Instead of using DATE(), use a range, such as:
WHERE date_time >= '2021-02-25'
AND date_time < '2021-02-25' + INTERVAL 1 DAY
Based on the above comments, plus other things, just these two indexes would be better:
PRIMARY KEY(id, date_time),
INDEX(date_time, id)
Please don't call it date_time when it is only a DATE. My comments work for either datatype. The DATE() function is never needed around a column of datatype DATE nor a string that looks like a date.
Your partition definitions put the last day of each month in the 'wrong' partition.
Be aware that PARTITIONing rarely helps with performance. I discuss that further in Pagination
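On the last-day-of-month point, a sketch of partition boundaries that use first-of-the-next-month cutoffs, so each partition holds exactly one month (partition names kept from the question):

```sql
-- VALUES LESS THAN is exclusive, so a cutoff of Feb 1 puts all of January,
-- including Jan 31, into p2021_01
ALTER TABLE testing_db PARTITION BY RANGE (TO_DAYS(date_time)) (
  PARTITION p0       VALUES LESS THAN (TO_DAYS('2021-01-01')),
  PARTITION p2021_01 VALUES LESS THAN (TO_DAYS('2021-02-01')),
  PARTITION p2021_02 VALUES LESS THAN (TO_DAYS('2021-03-01')),
  PARTITION future   VALUES LESS THAN MAXVALUE
);
```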

MySQL Array Variable (No Stored Procedure, No Temporary Table)

As mentioned in the title, I would like to know whether there is any solution for this that does not use a stored procedure, temporary table, etc.
Comparing Query #1 and Query #3, Query #3 gets the worst performance result. Is there any workaround to put the values in a variable without hurting performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL range uk_transaction_no uk_transaction_no 22 NULL 2 100 Using where; Using index
Query #2
set @transactionNos = '10001,10004';
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL index NULL uk_transaction_no 22 NULL 4 100 Using where; Using index
Short Answer: See Sargability
Long Answer:
MySQL makes no attempt to optimize an expression when an indexed column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query #3 will always be performed as a full table scan or a full index scan.
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values; a single IN (?) placeholder will not expand into a list in most client APIs for MySQL.
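One way to construct that IN list while staying inside MySQL is dynamic SQL with PREPARE. A sketch, reusing the variable from Query #2 (safe here only because the values are digits; in general, string interpolation like this is vulnerable to SQL injection):

```sql
SET @transactionNos = '10001,10004';

-- Quote each element: 10001,10004  ->  '10001','10004'
SET @inList = CONCAT("'", REPLACE(@transactionNos, ',', "','"), "'");

-- Build and run the statement; the optimizer now sees a plain IN (...) list
SET @sql = CONCAT('SELECT * FROM `order` WHERE transaction_no IN (', @inList, ')');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```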

How can I speed up a slow SUM + ORDER BY + LIMIT query in MySQL?

I have a simple key-value table with two fields, created like so:
CREATE TABLE `mytable` (
`key` varchar(255) NOT NULL,
`value` double NOT NULL,
KEY `MYKEY` (`key`)
);
The keys are not unique. The table contains over one million records. I need a query that will sum up all the values for a given key, and return the top 10 keys. Here's my attempt:
SELECT t.key, SUM(t.value) value
FROM mytable t
GROUP BY t.key
ORDER BY value DESC
LIMIT 0, 10;
But this is very slow. The thing is, without the GROUP BY and SUM it's pretty fast, and without the ORDER BY it's very fast, but for some reason the combination of the two makes it very, very slow. Can anyone explain why this is so, and how it can be sped up?
There is no index on value. I tried creating one but it didn't help.
EXPLAIN EXTENDED produces the following in Workbench:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index NULL MYKEY 257 NULL 1340532 100.00 "Using temporary; Using filesort"
There are about 400K unique keys in the table.
The query takes over 3 minutes to run. I don't know how long because I stopped it after 3 minutes. However, if I remove the index on key, it runs in 30 seconds! Anyone has any idea why?
The only way to really speed this up, as far as I can see, is to create a separate table with unique keys and maintain the running total for each. Then you will be able to index the totals to retrieve the top ten quickly, and the calculation will already be done. As long as the table is not updated from too many places, this shouldn't be a major problem.
The major problem with this type of query is that the GROUP BY requires the rows in one order while the ORDER BY requires sorting them into a different one.
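A sketch of that summary-table approach (the table name, index name, and maintenance strategy are assumptions; it presumes every write to mytable can be hooked):

```sql
-- One row per key, with an index on the total for the top-N query
CREATE TABLE mytable_totals (
  `key` varchar(255) NOT NULL PRIMARY KEY,
  `value` double NOT NULL,
  KEY ix_value (`value`)
);

-- Maintain it wherever mytable is written, e.g. alongside each insert:
INSERT INTO mytable_totals (`key`, `value`)
VALUES ('some-key', 1.5)
ON DUPLICATE KEY UPDATE `value` = `value` + VALUES(`value`);

-- The top 10 becomes a simple backward index scan, no GROUP BY needed:
SELECT `key`, `value`
FROM mytable_totals
ORDER BY `value` DESC
LIMIT 10;
```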

MySQL index questions: simple query takes too long for a relatively small number of indexed rows on a medium-large table

I have a relatively large table (5,208,387 rows, 400 MB data / 670 MB index),
and all the columns I search on are indexed.
name and type are VARCHAR(255) with BTREE indexes,
and sdate is an INTEGER column containing timestamps.
There are some things I fail to understand.
First, this query is very slow (5 sec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello%my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 5191 Using where
while this one is very fast (5msec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello.my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 204 Using where
The difference in the number of rows scanned makes sense because of the index,
but 5 seconds for 5K indexed rows seems way too much.
Also, ordering by name instead of sdate makes the queries very fast, but I need to order by the timestamp.
The second thing I do not understand: before adding the last column to the index, the index was 1.4 GB; now, after running an OPTIMIZE/REPAIR, the size is just 670 MB.
The problem is that only the portion before the first % can take advantage of the index; the rest of the LIKE pattern must be checked against every row that matches hello% or hello.my%, without the index's help. Also, ordering by a column other than the one in the index being used probably requires a second pass, or at least a sort, rather than reading an already-sorted index. Options for better performance (each can be implemented independently of the others):
Use a full-text index on the name column and a MATCH() AGAINST() search rather than LIKE with %'s.
Add sdate to a combined index (name, sdate); that could very well speed up the sorting.
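A sketch of the full-text option (assuming MyISAM or InnoDB in MySQL 5.6+; the index name is made up). Note that MATCH() searches whole words, not substrings, so its semantics differ from LIKE, and short words like "my" fall under the default minimum token length and are ignored:

```sql
ALTER TABLE mytable ADD FULLTEXT INDEX ft_name (name);

-- Boolean mode: every listed word must be present, in any order
SELECT *
FROM mytable
WHERE MATCH(name) AGAINST('+hello +big +text +thing' IN BOOLEAN MODE)
ORDER BY sdate DESC
LIMIT 3;
```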