I have the following query:
select *
from test_table
where app_id = 521
and is_deleted=0
and category in (7650)
AND created_timestamp >= '2020-07-28 18:19:26'
AND created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp desc
limit 30
All four fields, app_id, is_deleted, category, and created_timestamp, are indexed. However, the cardinalities of app_id and is_deleted are very small (3 each).
The category field is fairly well distributed, and created_timestamp seems like a very good index choice for this query.
However, MySQL is not using the created_timestamp index and is in turn taking 4 seconds to return. If I force MySQL to use the created_timestamp index using USE INDEX (created_timestamp), it returns in 40ms.
I checked the output of the EXPLAIN command to see why that's happening, and found that MySQL is executing the query with the following plan:
Automatic index decision, takes > 4s
type: index_merge
key: category,app_id,is_deleted
rows: 10250
filtered: 0.36
Using intersect(category,app_id,is_deleted); Using where; Using filesort
Force index usage:
Use index created_timestamp, takes < 50ms
type: range
key: created_timestamp
rows: 47000
filtered: 0.50
Using index condition; Using where; Backward index scan
MySQL probably decides that a smaller number of rows to scan is better, which makes sense, but then why does the query take so long to return in that case? How can I fix this query?
The Using intersect and the Using filesort are both costly for performance. It's best if we can eliminate both.
Here's a test. I'm assuming the IN ( ... ) predicate could sometimes have multiple values, so it will be a range type query, and cannot be optimized as an equality.
CREATE TABLE `test_table` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`app_id` int(11) NOT NULL,
`is_deleted` tinyint(4) NOT NULL DEFAULT '0',
`category` int(11) NOT NULL,
`created_timestamp` timestamp NOT NULL,
`other` text,
PRIMARY KEY (`id`),
KEY `a_is_ct_c` (`app_id`,`is_deleted`,`created_timestamp`,`category`),
KEY `a_is_c_ct` (`app_id`,`is_deleted`,`category`,`created_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
If we use your query and hint the optimizer to use the first index (created_timestamp before category), we get a query that eliminates both:
EXPLAIN SELECT * FROM test_table FORCE INDEX (a_is_ct_c)
WHERE app_id = 521
AND is_deleted=0
AND category in (7650,7651,7652)
AND created_timestamp >= '2020-07-28 18:19:26'
AND created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC\G
id: 1
select_type: SIMPLE
table: test_table
partitions: NULL
type: range
possible_keys: a_is_ct_c
key: a_is_ct_c
key_len: 13
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition
Whereas if we use the second index (category before created_timestamp), at least the Using intersect is gone, but we still have a filesort:
EXPLAIN SELECT * FROM test_table FORCE INDEX (a_is_c_ct)
WHERE app_id = 521
AND is_deleted=0
AND category in (7650,7651,7652)
AND created_timestamp >= '2020-07-28 18:19:26'
AND created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC\G
id: 1
select_type: SIMPLE
table: test_table
partitions: NULL
type: range
possible_keys: a_is_c_ct
key: a_is_c_ct
key_len: 13
ref: NULL
rows: 3
filtered: 100.00
Extra: Using index condition; Using filesort
The "using index condition" is a feature of InnoDB to filter the fourth column at the storage engine level. This is called Index condition pushdown.
The optimal index for the query given, plus some others:
INDEX(app_id, is_deleted, -- put first, in either order
category, -- in this position, assuming it might have multiple INs
created_timestamp) -- a range; last.
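As a sketch against the test table above: when the IN list happens to hold a single value, the optimizer can treat it as an equality, so this same index then also satisfies the ORDER BY:
EXPLAIN SELECT * FROM test_table FORCE INDEX (a_is_c_ct)
WHERE app_id = 521
AND is_deleted = 0
AND category = 7650
AND created_timestamp >= '2020-07-28 18:19:26'
AND created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC\G
-- Extra should no longer show "Using filesort": every column before the
-- range is an equality, so the index order matches the ORDER BY.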
"Index merge intersect" is probably always worse than having an equivalent composite index.
Note that an alternative for the Optimizer is to ignore the WHERE and focus on the ORDER BY, especially because of LIMIT 30. However, this is very risky. It may have to scan the entire table without finding the 30 rows desired. Apparently, it had to look at about 47000 rows to find the 30.
With the index above, it will touch only 30 (or fewer) rows.
"All four fields, ... are indexed." -- This is a common misconception, especially by newcomers to databases. It is very rare for a query to use more than one index. So, it is better to try for a "composite" index, which is likely to work much better.
How to build the optimal INDEX for a given SELECT: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Related
I'm facing a performance issue on a MariaDB database. It seems to me that MariaDB is not using the correct index when executing a query with a subquery, while manually injecting the result of the subquery into the query successfully uses the index.
Here is the query with the bad behavior (note in the second row of the plan that more rows are read than necessary):
ANALYZE SELECT `orders`.* FROM `orders`
WHERE `orders`.`account_id` IN (SELECT `accounts`.`id` FROM `accounts` WHERE `accounts`.`user_id` = 88144)
AND ( orders.type not in ("LimitOrder", "MarketOrder")
OR orders.type in ("LimitOrder", "MarketOrder") AND orders.state <> "canceled"
OR orders.type in ("LimitOrder", "MarketOrder") AND orders.state = "canceled" AND orders.traded_btc > 0 )
AND (NOT (orders.type = 'AdminOrder' AND orders.state = 'canceled')) ORDER BY `orders`.`id` DESC LIMIT 20 OFFSET 0 \G;
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: accounts
type: ref
possible_keys: PRIMARY,index_accounts_on_user_id
key: index_accounts_on_user_id
key_len: 4
ref: const
rows: 7
r_rows: 7.00
filtered: 100.00
r_filtered: 100.00
Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: orders
type: ref
possible_keys: index_orders_on_account_id_and_type,index_orders_on_type_and_state_and_buying,index_orders_on_account_id_and_type_and_state,index_orders_on_account_id_and_type_and_state_and_traded_btc
key: index_orders_on_account_id_and_type_and_state_and_traded_btc
key_len: 4
ref: bitcoin_central.accounts.id
rows: 60
r_rows: 393.86
filtered: 100.00
r_filtered: 100.00
Extra: Using index condition; Using where
When manually injecting the result of the subquery, I get the correct behaviour (and the expected performance):
ANALYZE SELECT `orders`.* FROM `orders`
WHERE `orders`.`account_id` IN (433212, 433213, 433214, 433215, 436058, 436874, 437950)
AND ( orders.type not in ("LimitOrder", "MarketOrder")
OR orders.type in ("LimitOrder", "MarketOrder") AND orders.state <> "canceled"
OR orders.type in ("LimitOrder", "MarketOrder") AND orders.state = "canceled" AND orders.traded_btc > 0 )
AND (NOT (orders.type = 'AdminOrder' AND orders.state = 'canceled'))
ORDER BY `orders`.`id` DESC LIMIT 20 OFFSET 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: orders
type: range
possible_keys: index_orders_on_account_id_and_type,index_orders_on_type_and_state_and_buying,index_orders_on_account_id_and_type_and_state,index_orders_on_account_id_and_type_and_state_and_traded_btc
key: index_orders_on_account_id_and_type_and_state_and_traded_btc
key_len: 933
ref: NULL
rows: 2809
r_rows: 20.00
filtered: 100.00
r_filtered: 100.00
Extra: Using index condition; Using where; Using filesort
1 row in set (0.37 sec)
Note that I have exactly the same issue when JOINing the two tables.
Here is an extract of the definitions of my orders table:
SHOW CREATE TABLE orders \G;
*************************** 1. row ***************************
Table: orders
Create Table: CREATE TABLE `orders` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`account_id` int(11) NOT NULL,
`traded_btc` decimal(16,8) DEFAULT '0.00000000',
`type` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`state` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `index_orders_on_account_id_and_type_and_state_and_traded_btc` (`account_id`,`type`,`state`,`traded_btc`),
CONSTRAINT `orders_account_id_fk` FOREIGN KEY (`account_id`) REFERENCES `accounts` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=8575594 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Does anyone know what's going on here?
Is there a way to force the database to use my index with my subquery?
IN ( SELECT ... ) optimizes poorly. The usual solution is to turn it into a JOIN:
FROM accounts AS a
JOIN orders AS o ON a.id = o.account_id
WHERE a.user_id = 88144
AND ... -- the rest of your WHERE
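Spelled out with the rest of the WHERE from the question, the rewrite would look like this (a sketch; untested against your data):
SELECT o.*
FROM accounts AS a
JOIN orders AS o ON a.id = o.account_id
WHERE a.user_id = 88144
AND ( o.type NOT IN ('LimitOrder', 'MarketOrder')
OR o.type IN ('LimitOrder', 'MarketOrder') AND o.state <> 'canceled'
OR o.type IN ('LimitOrder', 'MarketOrder') AND o.state = 'canceled' AND o.traded_btc > 0 )
AND NOT (o.type = 'AdminOrder' AND o.state = 'canceled')
ORDER BY o.id DESC LIMIT 20 OFFSET 0;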
Or is that what you did with "Note that I have exactly the same issue when JOINing the two tables"? If so, let's see that query and its EXPLAIN.
You refer to "expected performance"... Are you referring to the numbers in the EXPLAIN? Or do you have timings to back up the assertion?
I like to do this to get a finer-grained look at how much "work" is going on:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
Those numbers usually make it clear whether a table scan was involved or whether the query stopped after OFFSET+LIMIT rows. The numbers are exact counts, unlike EXPLAIN, which gives only estimates.
Presumably you usually look up orders via account_id? Here is a way to speed up such queries:
Replace the current two indexes
PRIMARY KEY (`id`),
KEY `account_id__type__state__traded_btc`
(`account_id`,`type`,`state`,`traded_btc`),
with these:
PRIMARY KEY (`account_id`, `type`, `id`),
KEY (id) -- to keep AUTO_INCREMENT happy.
This clusters all the rows for a given account, thereby making the queries run faster, especially if you are now I/O-bound. If some combination of columns makes a "natural" PK, then toss id completely.
(And notice how I shortened your key name without losing any info?)
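As a sketch, the swap can be done in a single ALTER, so that id keeps an index for AUTO_INCREMENT throughout:
ALTER TABLE orders
DROP PRIMARY KEY,
ADD PRIMARY KEY (account_id, type, id), -- clusters rows per account
ADD KEY (id);                           -- keeps AUTO_INCREMENT happy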
Also, if you are I/O-bound, shrinking the table is quite possible by turning those lengthy VARCHARs (state & type) into ENUMs.
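For example (the ENUM value lists below contain only the values visible in your query; a real migration must list every value actually in use, so check SELECT DISTINCT type and SELECT DISTINCT state first):
ALTER TABLE orders
MODIFY `type` ENUM('LimitOrder', 'MarketOrder', 'AdminOrder') DEFAULT NULL,
MODIFY `state` ENUM('canceled', 'pending', 'filled') NOT NULL;
-- 'pending' and 'filled' are placeholders for whatever states you really have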
More
Given that the query involves
WHERE ... mess with INs and ORs ...
ORDER BY ...
LIMIT 20
and there are 2 million rows for that one user, there is no INDEX that can get past the WHERE to get into the ORDER BY so that it can consume the LIMIT. That is, it must perform this way:
filter through the 2M rows for that one user
sort (ORDER BY) some significant fraction of 2M rows
peel off 20 rows. (Yes, 5.6 uses a "priority" queue, making the sort roughly O(N) instead of O(N log N), but this is not that much help.)
I'm actually amazed that the IN( constants ) worked well.
I had the same problem. Please use an inner join instead of the IN subquery.
I have table
user[id, name, status] with index[status, name, id]
SELECT *
FROM user
WHERE status = 'active'
ORDER BY name, id
LIMIT 50
I have about 50000 users with status = 'active'.
1.) Why does MySQL EXPLAIN show about 50000 in the rows column? Why does it follow all leaf nodes even when the index columns match the ORDER BY clause?
2.) When I change the ORDER BY clause to
ORDER BY status, name, id
the Extra column of the EXPLAIN output shows:
Using index condition; Using where; Using filesort
Is there any reason why it can't use the index order in this query?
edit1:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `status_name_id` (`status`,`name`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
query:
SELECT *
FROM `user`
WHERE status = 'complete'
ORDER BY status, name, id
LIMIT 50
explain:
id: 1
select_type: SIMPLE
table: f_order
type: ref
possible_keys: status_name_id
key: status_name_id
key_len: 768
ref: const
rows: 50331
Extra: "Using where; Using index; Using filesort"
The weirdest thing is that if I change the SELECT statement to
SELECT *, count(id)
it uses the index again and the query is twice as fast. And the Extra section contains only
Using where; Using index
Table contains 100k rows, 5 different statuses and 12 different names.
MySQL: 5.6.27
edit2:
Another example:
This takes 400ms (avg) and does an explicit sort
SELECT *
FROM `user`
WHERE status IN('complete')
ORDER BY status, name, id
LIMIT 50
This takes 2ms (avg) and doesn't do an explicit sort
SELECT *
FROM `user`
WHERE status IN('complete', 'something else')
ORDER BY status, name, id
LIMIT 50
Q1: EXPLAIN is a bit lame. It fails to take into account the existence of the LIMIT when providing the Rows estimation. Be assured that if it can stop short, it will.
Q2: Did it say that it was using your index? Please provide the full EXPLAIN and SHOW CREATE TABLE.
More
With INDEX(status, name, id), the WHERE, ORDER BY, and LIMIT can be handled in the index. Hence it has to read only 50 rows.
Without that index, (or with practically any change to the query), much or all of the table would need to be read, stored in a tmp table, sorted, and only then could 50 rows be peeled off.
So, I suggest that it is more complicated than "explicit sort can kill my db server".
According to the comments, it is probably a bug.
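Until it is fixed, a practical workaround is to drop the constant column from the ORDER BY; since status is pinned by the WHERE, the two orderings are equivalent, and this form lets the index deliver the rows pre-sorted:
SELECT *
FROM `user`
WHERE status = 'complete'
ORDER BY name, id -- status is constant here, so it adds nothing to the sort
LIMIT 50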
I have a table `t_table1` that includes 3 fields:
`field1` tinyint(1) unsigned default NULL,
`field2` tinyint(1) unsigned default NULL,
`field3` tinyint(1) unsigned NOT NULL default '0',
and a Index:
KEY `t_table1_index1` (`field1`,`field2`,`field3`),
When I run this SQL1:
EXPLAIN SELECT * FROM table1 AS c WHERE c.field1 = 1 AND c.field2 = 0 AND c.field3 = 0
Then it shows:
select_type: SIMPLE
type: ALL
possible_keys: t_table1_index1
key: NULL
key_len: NULL
rows: 1042
Extra: Using where
I think this says that my index is useless in this case.
But when I run this SQL2:
EXPLAIN SELECT * FROM table1 AS c WHERE c.field1 = 1 AND c.field2 = 1 AND c.field3 = 1
it shows:
select_type: SIMPLE
type: ref
possible_keys: t_table1_index1
key: t_table1_index1
key_len: 5
ref: const,const,const
rows: 1
Extra: Using where
In this case it used my index.
So please explain for me:
Why can SQL1 not use the index?
For SQL1, how can I change the index or rewrite the SQL to perform more quickly?
Thanks!
The query optimizer uses many data points to decide how to execute a query. One of those is the selectivity of the index. To use an index requires potentially more disk accesses per row returned than a table scan because the engine has to read the index and then fetch the actual row (unless the entire query can be satisfied from the index alone). As the index becomes less selective (i.e. more rows match the criteria) the efficiency of using that index goes down. At some point it becomes cheaper to do a full table scan.
In your second example the optimizer was able to ascertain that the values you provided would result in only one row being fetched, so the index lookup was the correct approach.
In the first example, the values were not very selective, with an estimate of 1042 rows being returned out of 1776. Using the index would mean searching the index, building a list of selected row references, and then fetching each row. With so many rows selected, using the index would have been more work than simply scanning the entire table linearly and filtering the rows.
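You can check that selectivity yourself with a quick count (table name and values taken from the question):
SELECT
(SELECT COUNT(*) FROM table1 WHERE field1 = 1 AND field2 = 0 AND field3 = 0) AS matching,
(SELECT COUNT(*) FROM table1) AS total;
-- when matching is close to total, a full table scan is the cheaper plan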
I am using MySQL version 5.5.14 to run the following query, QUERY 1, against a table of 5 million rows:
SELECT P.ID, P.Type, P.Name, P.cty
, X(P.latlng) as 'lat', Y(P.latlng) as 'lng'
, P.cur, P.ak, P.tn, P.St, P.Tm, P.flA, P.ldA, P.flN
, P.lv, P.bd, P.bt, P.nb
, P.ak * E.usD as 'usP'
FROM PIG P
INNER JOIN EEL E
ON E.cur = P.cur
WHERE act='1'
AND flA >= '1615'
AND ldA >= '0'
AND yr >= (YEAR(NOW()) - 100)
AND lv >= '0'
AND bd >= '3'
AND bt >= '2'
AND nb <= '5'
AND cDate >= NOW()
AND MBRContains(LineString( Point(39.9097, -2.1973)
, Point(65.5130, 41.7480)
), latlng)
AND Type = 'g'
AND tn = 'l'
AND St + Tm - YEAR(NOW()) >= '30'
HAVING usP BETWEEN 300/2 AND 300
ORDER BY ak
LIMIT 100;
Using an index on (Type, tn, act, flA), I am able to obtain results within 800ms. In QUERY 2, I changed the ORDER BY clause to lv, and I was also able to obtain results with similar timings. In QUERY 3, I changed the ORDER BY clause to ID, and the query time slowed dramatically to a full 20s on an average of 10 trials.
Running the EXPLAIN SELECT statement produces exactly the same query execution plan for all three queries:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: P
type: range
possible_keys: Index
key: Index
key_len: 6
ref: NULL
rows: 132478
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: E
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: BS.P.cur
rows: 1
Extra:
My question is: why does ordering by ID in QUERY 3 run so slowly compared to the rest?
The partial table definition is as such:
CREATE TABLE `PIG` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lv` smallint(3) unsigned NOT NULL DEFAULT '0',
`ak` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `id_ca` (`cty`,`ak`),
KEY `Index` (`Type`, `tn`, `act`, `flA`)
) ENGINE=MyISAM AUTO_INCREMENT=5000001 DEFAULT CHARSET=latin1
CREATE TABLE `EEL` (
`cur` char(3) NOT NULL,
`usD` decimal(11,10) NOT NULL,
PRIMARY KEY (`cur`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
UPDATE: After extensive testing of various ORDER BY options, I have confirmed that the ID column, which happens to be the primary key, is the only one causing the slow query time.
From MySQL documentation at http://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
. . .
The key used to fetch the rows is not the same as the one used in the ORDER BY:
`SELECT * FROM t1 WHERE key2=constant ORDER BY key1;`
This probably won't help, but what happens if you add AND ID > 0 to the WHERE clause? Would this cause MySQL to use the primary key for sorting? Worth a try I suppose.
(It seems odd that ordering with ak is efficient, since ak does not even have an index, but that may be due to fewer values for ak?)
If the column in the WHERE clause differs from the one in the ORDER BY, or it is not part of a composite index, then the sorting does not take place in the storage engine but rather at the MySQL server level, which is much slower. Long story short, you must rearrange your indexes to satisfy both the row filtering and the sorting.
You can use FORCE INDEX (PRIMARY).
Try it, and you will see in the EXPLAIN output that MySQL now uses the primary key index to satisfy the ORDER BY.
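Applied to the query from the question, the hint would look like this (column list and WHERE abbreviated; this is a sketch, so verify with EXPLAIN that the filesort is gone and that the number of rows scanned stays acceptable):
SELECT P.ID, P.Type, P.Name /* ... remaining columns ... */
FROM PIG P FORCE INDEX (PRIMARY)
INNER JOIN EEL E ON E.cur = P.cur
WHERE act = '1' /* ... rest of the WHERE as above ... */
ORDER BY P.ID
LIMIT 100;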
From time to time I encounter a strange MySQL behavior. Let's assume I have indexes (type, rel, created), (type), (rel). The best choice for a query like this one:
SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;
would be to use index (type, rel, created).
But MySQL decides to intersect the indexes (type) and (rel), and that leads to worse performance. Here is an example:
mysql> EXPLAIN
-> SELECT id FROM tbl
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: index_merge
possible_keys: idx_type,idx_rel,idx_type_rel_created
key: idx_type,idx_rel
key_len: 1,2
ref: NULL
rows: 4343
Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort
And the same query, but with a hint added:
mysql> EXPLAIN
-> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
-> WHERE rel = 3 AND type = 3
-> ORDER BY created\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tbl
type: ref
possible_keys: idx_type_rel_created
key: idx_type_rel_created
key_len: 3
ref: const,const
rows: 8906
Extra: Using where
I think MySQL picks the execution plan with the smaller number in the rows column of the EXPLAIN output. From that point of view, the index intersection with 4343 rows looks better than my combined index with 8906 rows. So maybe the problem lies in those numbers?
mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
| 3056 |
+----------+
From this I can conclude that MySQL miscalculates the approximate number of rows for the combined index.
So, what can I do here to make MySQL take the right execution plan?
I cannot use optimizer hints, because I have to stick to the Django ORM.
The only solution I have found so far is to remove those single-column indexes.
MySQL version is 5.1.49.
The table structure is:
CREATE TABLE tbl (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL,
`rel` smallint(2) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_type` (`type`),
KEY `idx_rel` (`rel`),
KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;
It's hard to tell exactly why MySQL chooses the index merge intersection over the index scan, but note that for a composite index, statistics are stored cumulatively up to each column.
The value of information_schema.statistics.cardinality for the second column (rel) of the composite index shows the cardinality of (type, rel), not of rel itself.
If there is a correlation between type and rel, then the cardinality of (type, rel) will be less than the product of the cardinalities of type and rel taken separately from the single-column indexes.
That's why the number of rows is calculated incorrectly (an intersection cannot be larger in size than a union).
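You can inspect the estimates the optimizer works from directly (table name taken from the question):
SELECT index_name, seq_in_index, column_name, cardinality
FROM information_schema.statistics
WHERE table_schema = DATABASE()
AND table_name = 'tbl'
ORDER BY index_name, seq_in_index;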
You can forbid index_merge_intersection by setting it to off in @@optimizer_switch:
SET optimizer_switch = 'index_merge_intersection=off'
Another thing worth mentioning: you would not have the problem if you deleted the index on type alone. That index is not required, since it duplicates the leading part of the composite index.
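For the table definition shown, that is simply:
ALTER TABLE tbl DROP INDEX idx_type;
-- idx_type_rel_created already serves lookups on type via its leftmost prefix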
Sometimes the intersection on the same table can be useful, and you may not want to remove a single-column index, because some other query performs well with the intersection.
In such a case, if the bad execution plan concerns only one query, a solution is to exclude the unwanted index. This prevents the use of the intersection only for that specific query.
In your example:
SELECT id FROM tbl IGNORE INDEX(idx_type)
WHERE rel = 3 AND type = 3
ORDER BY created;