I'm working on optimizing MySQL queries these days. One issue I've encountered is that DATE() seems to stop partition pruning from working on a table partitioned by date range.
Here is the sample table:
CREATE TABLE `testing_db` (
`date_time` date NOT NULL,
`id` varchar(10) NOT NULL,
PRIMARY KEY (`date_time`,`id`) USING BTREE,
UNIQUE KEY `unique` (`date_time`,`id`),
KEY `idx_date_time` (`date_time`),
KEY `idx_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
/*!50100 PARTITION BY RANGE (to_days(`date_time`))
(PARTITION p0 VALUES LESS THAN (TO_DAYS('2021-01-01')),
PARTITION p2021_01 VALUES LESS THAN (TO_DAYS('2021-01-31')),
PARTITION p2021_02 VALUES LESS THAN (TO_DAYS('2021-02-28')),
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Statement without DATE():
EXPLAIN
SELECT date_time, id FROM testing_db WHERE date_time = '2021-02-25';
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE testing_db p2021_02 ref PRIMARY,unique,idx_date_time,idx_id PRIMARY 3 const 1 100.00 Using index
Statement with DATE():
EXPLAIN
SELECT date_time, id FROM testing_db WHERE DATE(date_time) = '2021-02-25';
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE testing_db p0,p2021_01,p2021_02,future index idx_date_time 3 1 100.00 Using where; Using index
Comparing the two EXPLAIN outputs, the statement with DATE() scans all partitions, while the statement without DATE() reads only p2021_02. The impact could be significant on a large table.
I've researched similar issues, but it seems they are not relevant to this case:
Official doc: DATE() extracts the date part of a date or datetime expression.
Mysql, partitioning not working on date range
https://bugs.mysql.com/bug.php?id=28928
Could you help figure it out? Thanks a lot!
Using the DATE() function in your WHERE clause prevents MySQL from using any relevant index for a lookup and from pruning partitions. That forces a full scan (here, a full scan of idx_date_time), and a full scan has to read from all partitions.
In your example you are applying the DATE() function to a column of type DATE, so it serves no purpose.
INDEX(date_time) is unnecessary because there are two other indexes starting with that column.
A PRIMARY KEY is (in MySQL) a UNIQUE key. So your UNIQUE(date_time, id) is redundant.
Usually it is unwise to start any index with the partition key (date_time).
WHERE DATE(date_time) = ... is not "sargable". That is, no index on date_time can be used when the column is hidden inside a function (DATE()). (This is the main problem you are asking about.)
Instead of using DATE(), use a range, such as:
WHERE date_time >= '2021-02-26'
AND date_time < '2021-02-26' + INTERVAL 1 DAY
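Here is a toy Python model of the sargability point. It is not MySQL internals: the "index" is just a sorted list of (date_time, id) pairs. The range form can binary-search into it with the standard bisect module. The function-wrapped form has to evaluate every entry, because the sort order says nothing about where fn(x) equals the target. The names date_index, range_lookup and function_lookup, and the sample data, are hypothetical.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Toy index: sorted (date_time, id) pairs, standing in for INDEX(date_time, id).
# 90 days starting 2021-01-01, three hypothetical ids per day.
date_index = sorted(
    (date(2021, 1, 1) + timedelta(days=d), f"id{n}")
    for d in range(90)
    for n in range(3)
)

def range_lookup(index, day):
    """Sargable: date_time >= day AND date_time < day + INTERVAL 1 DAY.

    Binary search jumps straight to the first entry for `day`, then reads
    only the contiguous run of matching entries."""
    end = day + timedelta(days=1)
    pos = bisect_left(index, (day,))  # (day,) sorts before (day, any_id)
    matches = []
    while pos < len(index) and index[pos][0] < end:
        matches.append(index[pos])
        pos += 1
    return matches

def function_lookup(index, day, fn):
    """Non-sargable: fn(date_time) = day.

    Nothing about the sort order tells us where fn(x) == day, so every
    entry has to be evaluated. Returns (matches, number_of_fn_calls)."""
    calls = 0
    matches = []
    for entry in index:
        calls += 1
        if fn(entry[0]) == day:
            matches.append(entry)
    return matches, calls
```

Try fn = lambda d: d, which mirrors DATE() applied to a DATE column (a no-op). Both functions return the same three rows for 2021-02-25, but function_lookup calls fn on all 270 entries.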
Based on the above comments, plus other things, just these two indexes would be better:
PRIMARY KEY(id, date_time),
INDEX(date_time, id)
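Please don't call it date_time when it is only a DATE. My comments work for either datatype. The DATE() function is never needed around a column of datatype DATE, or around a string that looks like a date.
Your partition definitions put the last day of each month in the 'wrong' partition. VALUES LESS THAN (TO_DAYS('2021-01-31')) excludes January 31 itself, so that day lands in p2021_02. Using the first day of the next month as the bound, e.g. VALUES LESS THAN (TO_DAYS('2021-02-01')), avoids this.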
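The partition-bound issue can be checked with a small Python model. VALUES LESS THAN is exclusive, and TO_DAYS() increases with the date, so comparing plain dates places values the same way as comparing TO_DAYS() results. QUESTION_PARTITIONS copies the question's bounds. FIXED_PARTITIONS is a hypothetical correction using first-of-next-month bounds. partitions_for_range is a simplified model of pruning, not MySQL's pruning code.

```python
from datetime import date

# Bounds from the question's table; None stands for MAXVALUE.
QUESTION_PARTITIONS = [
    ("p0", date(2021, 1, 1)),
    ("p2021_01", date(2021, 1, 31)),
    ("p2021_02", date(2021, 2, 28)),
    ("future", None),
]

# Hypothetical corrected bounds: first day of the following month.
FIXED_PARTITIONS = [
    ("p0", date(2021, 1, 1)),
    ("p2021_01", date(2021, 2, 1)),
    ("p2021_02", date(2021, 3, 1)),
    ("future", None),
]

def partition_for(partitions, value):
    """Return the first partition whose LESS THAN bound exceeds value."""
    for name, bound in partitions:
        if bound is None or value < bound:
            return name
    raise ValueError("no partition accepts this value")

def partitions_for_range(partitions, lo, hi):
    """Partitions that a scan of lo <= x < hi must touch.

    Each partition holds [previous bound, its bound); it is kept only if
    that interval overlaps [lo, hi)."""
    touched = []
    lower = None  # inclusive lower bound of the current partition
    for name, bound in partitions:
        upper_ok = bound is None or lo < bound
        lower_ok = lower is None or hi > lower
        if upper_ok and lower_ok:
            touched.append(name)
        lower = bound
    return touched
```

With the question's bounds, partition_for(QUESTION_PARTITIONS, date(2021, 1, 31)) returns "p2021_02". The range [2021-02-25, 2021-02-26) touches only p2021_02, matching the first EXPLAIN. A predicate like DATE(date_time) = ... gives the pruner no such range to work with.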
Be aware that PARTITIONing rarely helps with performance. I discuss that further in Pagination
I have a simple table:
CREATE TABLE `user_values` (
`id` bigint NOT NULL AUTO_INCREMENT,
`user_id` bigint NOT NULL,
`value` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
on which I am trying to execute the following simple query:
select * from user_values where user_id in (20020, 20030) order by id desc;
I would fully expect this query to be served entirely by an index (either the (user_id, id) one or the (id, user_id) one). Yet it turns out that's not the case:
explain select * from user_values where user_id in (20020, 20030) order by id desc; yields:
id select_type table partitions type key key_len ref rows filtered Extra
1 SIMPLE user_values NULL range user_id 8 NULL 9 100.00 Using index condition; Using filesort
Why is that the case? How can I avoid a filesort on this trivial query?
You can't avoid the filesort in the query you show.
When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.
Here's an example of hypothetical data in which reading the rows by an index on user_id will not be in id order.
id user_id
1 20030
2 20020
3 20016
4 20030
5 20020
So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:
id user_id
2 20020
5 20020
1 20030
4 20030
Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.
The same kind of effect happens for other types of predicates, for example BETWEEN, <, !=, IS NOT NULL, etc. Every predicate except for = is a range predicate.
The only ways to avoid the filesort are to change the query in one of the following ways:
Omit the ORDER BY clause and accept the results in whatever order the optimizer chooses to return them, which could be id order, but only by coincidence.
Change user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id. Reading the matching rows from the index then returns them already in id order, so the ORDER BY is a no-op. The optimizer recognizes when this is possible and skips the filesort.
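Here is a small Python model of the example above. It builds a sorted list standing in for the (user_id, id) index and reads it in index order. rows, index_user_id_id and read_via_index are hypothetical names, and the data is the hypothetical table from this answer.

```python
# Hypothetical rows from the example above, as (id, user_id) pairs.
rows = [(1, 20030), (2, 20020), (3, 20016), (4, 20030), (5, 20020)]

# Model of an index on (user_id, id): entries sorted by user_id, then id.
index_user_id_id = sorted((user_id, row_id) for row_id, user_id in rows)

def read_via_index(index, wanted_user_ids):
    """Return ids in the order a scan of the index produces them."""
    return [row_id for user_id, row_id in index if user_id in wanted_user_ids]

two_users = read_via_index(index_user_id_id, {20020, 20030})
one_user = read_via_index(index_user_id_id, {20020})

# IN (20020, 20030): index order is not id order, so a sort is needed.
print(two_users, sorted(two_users, reverse=True))  # prints [2, 5, 1, 4] [5, 4, 2, 1]
# user_id = 20020: already in id order; reading it backward gives DESC.
print(one_user, one_user[::-1])  # prints [2, 5] [5, 2]
```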
MySQL will most likely use an index for the query (unless the user_id values in the query cover most of the rows).
The "filesort" here happens in memory (despite the name, no file is involved), and it sorts the found rows according to the ORDER BY clause.
You cannot avoid a "sort" in this case.
There were about 9 rows to sort, so it could not have taken long.
How long did the query take? Probably only a few milliseconds, so who cares?
"Filesort" does not necessarily mean that a "file" was involved. In many queries the sort is done in RAM.
Do you use id for anything other than to have a PRIMARY KEY on the table? If not, then this will help a small amount. (The speed-up won't be indicated in EXPLAIN.)
PRIMARY KEY (`user_id`,`id`), -- to avoid secondary lookups
KEY `id` (`id`); -- to keep auto_increment happy
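The "secondary lookups" point can be sketched in Python. This is a toy model of InnoDB's clustered layout and measures no real I/O. With PRIMARY KEY(id) plus KEY(user_id, id), each index hit needs a further probe into the primary key to fetch `value`. With PRIMARY KEY(user_id, id), the rows themselves are stored in that order, so one range read suffices. The table contents and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data: id -> (user_id, value).
table = {1: (20030, "a"), 2: (20020, "b"), 3: (20016, "c"),
         4: (20030, "d"), 5: (20020, "e")}

def lookup_with_secondary(table, user_id):
    """Layout 1: PRIMARY KEY(id) plus KEY(user_id, id).

    The secondary index yields ids; each row then needs a separate probe
    into the clustered (primary key) index. Returns (rows, extra_probes)."""
    secondary = sorted((uid, row_id) for row_id, (uid, _) in table.items())
    rows, probes = [], 0
    pos = bisect_left(secondary, (user_id,))
    while pos < len(secondary) and secondary[pos][0] == user_id:
        row_id = secondary[pos][1]
        rows.append((row_id, table[row_id][1]))  # probe the clustered index
        probes += 1
        pos += 1
    return rows, probes

def lookup_with_clustered(table, user_id):
    """Layout 2: PRIMARY KEY(user_id, id).

    Rows are stored in (user_id, id) order, so one range read returns
    them with no extra probes."""
    clustered = sorted((uid, row_id, value) for row_id, (uid, value) in table.items())
    rows = []
    pos = bisect_left(clustered, (user_id,))
    while pos < len(clustered) and clustered[pos][0] == user_id:
        rows.append((clustered[pos][1], clustered[pos][2]))
        pos += 1
    return rows, 0
```

Both layouts return [(2, "b"), (5, "e")] for user_id 20020. The first needs two extra probes; the second needs none.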
As mentioned in the title, I would like to know of any solution for this that doesn't use a stored procedure, temporary table, etc.
Comparing Query #1 and Query #3, Query #3 performs worse. Is there any workaround that lets me put the values in a variable without hurting performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL range uk_transaction_no uk_transaction_no 22 NULL 2 100 Using where; Using index
Query #2
set @transactionNos = "10001,10004";
There are no results to be displayed.
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL index NULL uk_transaction_no 22 NULL 4 100 Using where; Using index
Short Answer: See Sargability
Long Answer:
MySQL makes no attempt to optimize an expression when an indexed column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query #3 will always be performed as a full table scan or a full index scan.
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values: a single IN(?) placeholder for the whole list will not work in most MySQL APIs.
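One common way around this is to generate one placeholder per value in client code, so the server still sees a plain, sargable IN list. Below is a minimal Python sketch. The function name build_in_clause is hypothetical. The default %s placeholder is an assumption based on the paramstyle used by common Python MySQL drivers such as PyMySQL; change `placeholder` for other APIs. The sketch only builds the SQL fragment and parameters; it executes nothing.

```python
def build_in_clause(column, csv_values, placeholder="%s"):
    """Expand a comma-separated list into `column IN (%s, %s, ...)`.

    Returns (sql_fragment, params). The column name is inserted as-is,
    so it must come from trusted code, never from user input."""
    values = tuple(v.strip() for v in csv_values.split(",") if v.strip())
    if not values:
        raise ValueError("IN () with no values is not valid SQL")
    marks = ", ".join([placeholder] * len(values))
    return f"{column} IN ({marks})", values
```

For "10001,10004" this yields ("transaction_no IN (%s, %s)", ("10001", "10004")). You would then append the fragment after WHERE and pass the tuple as the query parameters, which gives the same plan as Query #1.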
Here is the EXPLAIN of my query:
explain
select eil.sell_fmt, count(sell_fmt) as itemCount
from table_items eil
where eil.cl_Id=123 and eil.si_Id='0'
and start_date <= now() and end_date is not null and end_date < NOW()
group by eil.sell_fmt
Without the date (start_date, end_date) filters:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE eil ref table_items_clid_siid_sellFmt 39 const,const 7393 Using where; Using index
With date filters:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE eil ref table_items_clid_siid_sellFmt 39 const,const 8400 Using where
possible_keys are:
table_items_clid_siid, table_items_clid_siid_itemId, table_items_clid_siid_startDate_endDate, table_items_clid_siid_sellFmt
The query without the date filters is very fast (0.4 sec), but with them it takes about 30 seconds. There are only 14K records in total.
Table field types:
`cl_Id` int(11) NOT NULL,
`si_Id` varchar(11) NOT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`sell_fmt` varchar(20) DEFAULT NULL
I named each index by concatenating its field names, so you can tell which fields each index combines.
Can somebody guide me here? What's going on? What is the best course of action I should take, or where am I going wrong?
I need one more suggestion: in another query on the same table, a user can filter on UP TO 10 fields, in no definite order (a random number of fields in a random order). This type of search would be too slow again. What's the best strategy then? One covering index with "all" possible searchable fields? If yes, does the order of fields in the index matter? (I.e. if that order is different from the order of fields in the query, will the index be used?)
First, without seeing your CREATE TABLE statement, I can offer the following: create a composite (multi-column) index that best matches the columns your WHERE clause commonly uses, starting with the smaller nominal count basis. You are explicitly looking for a specific cl_Id and si_Id, plus start and end dates. Since you also have a GROUP BY, I would add sell_fmt to the index as well. That makes it a completely COVERING index, so the engine does not need to go back to the table rows to complete the query; it can resolve everything from the fields in the index directly.
I would have an index on
( cl_id, si_id, start_date, end_date, sell_fmt )
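Finally, change your count from count(sell_fmt) to just count(*), which says "I don't care about a specific field; as long as a record is found, count it". Note that COUNT(sell_fmt) skips NULLs, so the two differ only for the group where sell_fmt is NULL (there COUNT(sell_fmt) is 0).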
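On the follow-up question about field order, here is a simplified Python model of the leftmost-prefix rule for composite indexes. Walking the index columns left to right, equality conditions extend the usable prefix. The first range condition is still used but ends the prefix. A column with no condition ends it immediately. The function name and the "eq"/"range" labels are hypothetical, and this ignores refinements such as index skip scan.

```python
def seekable_columns(index_columns, predicates):
    """Which leading index columns can narrow the B-tree seek.

    `predicates` maps column -> "eq" or "range"; columns absent from the
    mapping have no condition in the WHERE clause."""
    usable = []
    for col in index_columns:
        kind = predicates.get(col)
        if kind is None:
            break  # gap in the prefix: later columns can't be seeked on
        usable.append(col)
        if kind == "range":
            break  # columns after the first range can't narrow the seek
    return usable
```

For INDEX(cl_id, si_id, start_date, end_date, sell_fmt) and this query's WHERE clause, the seek uses cl_id, si_id and start_date. end_date and sell_fmt still help because the index covers the query, so the remaining checks run on index entries without touching the rows. The order of conditions in the WHERE clause makes no difference, but the order of columns in the index does. A query that does not constrain cl_id cannot seek on this index at all.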
I've got a table with a column called "date".
The table looks something like this:
CREATE TABLE IF NOT EXISTS `offers_log_archive` (
...
`date` date DEFAULT NULL,
...
KEY `date` (`date`)
) ENGINE=InnoDB
I perform the following query on this table:
SELECT
*
FROM
offers_log_archive as ola
WHERE
ola.date >= "2012-12-01" and
ola.date <= "2012-12-31"
Then I did the following:
explain (SELECT
*
FROM
offers_log_archive as ola
WHERE
ola.date >= "2012-12-01" and
ola.date <= "2012-12-31" );
The result of this explain is
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ola ALL date NULL NULL NULL 6206460 Using where
Why do I get type ALL? As far as I know, that means the query will inspect every row in the table and ignore the index on date. I would have expected MySQL to use the index.
What happens here and why is the date index ignored?
Almost all values in your column are within the range of the query, so not only would the index be next to useless (it would add little value), but it would actually be much more expensive to use the index than do a simple table scan.
Edit
Try first running ANALYZE on the table:
ANALYZE TABLE MYTABLE
If that doesn't help, try changing the syntax to use BETWEEN:
WHERE ola.date BETWEEN '2012-12-01' AND '2012-12-31'
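The first point of this answer (an index can cost more than a scan when almost every row matches) can be illustrated with a deliberately crude Python cost model. It is not MySQL's cost model. The assumptions: each row found through a secondary index costs one random read, a table scan reads every row sequentially, and the 4.0 and 1.0 cost constants are made-up illustrative values. The 6,206,460 figure is the row estimate from the EXPLAIN above; the matching-row counts are hypothetical.

```python
def prefer_index(matching_rows, total_rows,
                 random_read_cost=4.0, sequential_read_cost=1.0):
    """Toy comparison of an index range scan against a full table scan.

    Returns True when the index path is estimated to be cheaper."""
    index_cost = matching_rows * random_read_cost
    scan_cost = total_rows * sequential_read_cost
    return index_cost < scan_cost
```

Under these made-up constants, the index only wins when fewer than about a quarter of the rows match. A December range covering nearly all 6.2M rows therefore favors the table scan, which is what the ALL plan shows.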
I have this query:
EXPLAIN SELECT * FROM _mod_news USE INDEX ( ind1 ) WHERE show_lv =1 AND active =1 AND START <= NOW( )
AND ( END >= NOW( ) OR END = "0000-00-00 00:00:00" ) AND id <> "18041" AND category_id = "3" AND leta =1 ORDER BY sort_id ASC , DATE DESC LIMIT 7
result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE _mod_news ref ind1 ind1 2 const,const 11386 Using where; Using filesort
MySQL is performing a full table scan.
ind1 =
ALTER TABLE `_mod_news` ADD INDEX ind1 ( `show_lv`, `active`, `start`, `end`, `id`, `category_id`, `leta`, `sort_id`, `date`);
I also tested the following index, but nothing changed:
ALTER TABLE `_mod_news` ADD INDEX ind1 ( `show_lv`, `active`, `start`, `end`, `id`, `category_id`, `leta`);
My question is: where can I learn how to create indexes for queries with many WHERE conditions? Or can someone explain how to tell MySQL to use an index and not scan the whole table?
Thanks.
I would suggest not forcing an index. MySQL is good at selecting the best possible index, unless you have a better understanding of the data you are querying.
You cannot use the ORDER BY optimization because you are mixing ASC and DESC in that clause (at least before MySQL 8.0, which added descending indexes such as INDEX(sort_id ASC, date DESC)).
Therefore your only option is to create an index such that:
columns compared to constants come before columns used in ranges
integers before dates, dates before strings, smaller-size values before bigger-size values
Creating a large index also adds overhead to storage and to insert/update time, so I would not add fields to the index that don't eliminate a lot of rows (e.g. a column where 90% of rows have the value 1, or id<>"18041", which most likely eliminates less than 1% of rows).
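To see why mixing ASC and DESC in the ORDER BY rules out reading an ordinary index in order, here is a small Python model. An ascending index on (sort_id, date) can be walked forward or backward, but neither direction yields sort_id ASC, date DESC. The rows and the helper name can_serve_from_index are hypothetical, and dates are plain integers for brevity.

```python
# Hypothetical (sort_id, date) pairs; dates as integers for brevity.
rows = [(2, 15), (1, 10), (2, 5), (1, 20)]

def can_serve_from_index(rows, key):
    """True if walking an ascending (sort_id, date) index forward or
    backward already yields the order produced by `key`."""
    forward = sorted(rows)      # ascending on both columns
    backward = forward[::-1]    # descending on both columns
    wanted = sorted(rows, key=key)
    return wanted == forward or wanted == backward
```

With key=lambda r: (r[0], -r[1]) (ORDER BY sort_id ASC, date DESC) it returns False, so a separate sort is needed. With both columns ASC, or both DESC, it returns True.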
If you want to learn more about optimizing: http://dev.mysql.com/doc/refman/5.0/en/select-optimization.html
Create several different indexes (on a realistic volume of the data you expect to see in the table), see which one MySQL chooses, benchmark each one by forcing it, then use your common sense to cut down on index space usage.
You can see from your EXPLAIN output that it is actually NOT performing a full table scan (the type is ref, not ALL). If it were, EXPLAIN would not show the index as used, even when you force it.
You can try USE INDEX or FORCE INDEX.