Here is the table `test`:
show create table test;
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`body` longtext NOT NULL,
`timestamp` int(11) NOT NULL,
`handle_after` datetime NOT NULL,
`status` varchar(100) NOT NULL,
`queue_id` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `idxTimestampStatus` (`timestamp`,`status`),
KEY `idxTimestampStatus2` (`status`,`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=80000 DEFAULT CHARSET=utf8
There are two SELECTs:
1) select * from test where status = 'in_queue' and timestamp > 1625721850;
2) select id from test where status = 'in_queue' and timestamp > 1625721850;
For the first SELECT, EXPLAIN shows that no index is used;
for the second SELECT, the index idxTimestampStatus2 is used.
MariaDB [db]> explain select * from test where status = 'in_queue' and timestamp > 1625721850;
+------+-------------+-------+------+----------------------------------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+----------------------------------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | test | ALL | idxTimestampStatus,idxTimestampStatus2 | NULL | NULL | NULL | 80000 | Using where |
+------+-------------+-------+------+----------------------------------------+------+---------+------+----------+-------------+
MariaDB [db]> explain select id from test where status = 'in_queue' and timestamp > 1625721850;
+------+-------------+-------+------+----------------------------------------+---------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+----------------------------------------+---------------------+---------+-------+------+--------------------------+
| 1 | SIMPLE | test | ref | idxTimestampStatus,idxTimestampStatus2 | idxTimestampStatus2 | 302 | const | 4 | Using where; Using index |
+------+-------------+-------+------+----------------------------------------+---------------------+---------+-------+------+--------------------------+
Help me figure out what I'm doing wrong.
How should I create an index for the first SELECT?
Why does the number of columns affect index usage?
What you saw is to be expected. (The "number of columns" did not cause what you saw.) Read all the points below; various combinations of them should address all the issues raised in both the Question and Comments.
Deciding between index and table scan:
The Optimizer uses statistics to decide between using an index and doing a full table scan.
If less than (about) 20% of the rows need to be fetched, the index will be used. This involves bouncing back and forth between the index's BTree and the data's BTree.
If more of the table is needed, then it is deemed more efficient to simply scan the table, ignoring any rows that don't match the WHERE.
The "20%" is not a hard-and-fast number.
SELECT id ... status ... timestamp;
In InnoDB, a secondary index implicitly includes the columns of the PRIMARY KEY.
If all the columns mentioned in the query are in an index, then that index is "covering". This means that all the work can be done in the index's BTree without touching the data's BTree.
Using index == "covering". (That is, EXPLAIN gives this clue.)
"Covering" overrides the "20%" discussion.
SELECT * ... status ... timestamp;
SELECT * needs to fetch all columns, so "covering" does not apply and the "20%" becomes relevant.
If 1625721850 were a larger number (so that fewer rows matched), the EXPLAIN would switch from ALL (full table scan) to range, using the index.
idxTimestampStatus2 (status,timestamp)
The order of the clauses in WHERE does not matter.
The order of the columns in a "composite" index is important. ("Composite" == multi-column)
Put the = column(s) first, then one "range" (eg >) column.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
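If you want to test whether the index would actually help the first SELECT despite the optimizer's estimate, you can override its choice with an index hint. A sketch against the table above; verify with EXPLAIN and timings, since forcing the index can be slower than a table scan when many rows match:

```sql
-- Hedged sketch: force the existing (status, timestamp) composite index.
-- This may be SLOWER than the table scan if 'in_queue' matches a large
-- share of the rows (the "20%" discussion above).
SELECT *
FROM test FORCE INDEX (idxTimestampStatus2)
WHERE status = 'in_queue'
  AND timestamp > 1625721850;
```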
Related
I have a MySQL (v8.0.30) table that stores trades, the schema is the following:
CREATE TABLE `log_fill` (
`id` bigint NOT NULL AUTO_INCREMENT,
`orderId` varchar(63) NOT NULL,
`clientOrderId` varchar(36) NOT NULL,
`symbol` varchar(31) NOT NULL,
`executionId` varchar(255) DEFAULT NULL,
`executionSide` tinyint NOT NULL COMMENT '0 = long, 1 = short',
`executionSize` decimal(15,2) NOT NULL,
`executionPrice` decimal(21,8) unsigned NOT NULL,
`executionTime` bigint unsigned NOT NULL,
`executionValue` decimal(21,8) NOT NULL,
`executionFee` decimal(13,8) NOT NULL,
`feeAsset` varchar(63) DEFAULT NULL,
`positionSizeBeforeFill` decimal(21,8) DEFAULT NULL,
`apiKey` int NOT NULL,
`side` varchar(20) DEFAULT NULL,
`reconciled` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `executionId` (`executionId`,`executionSide`),
KEY `apiKey` (`apiKey`)
) ENGINE=InnoDB AUTO_INCREMENT=6522695 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
As you can see, there's a BTREE index on the column apiKey which stands for a user, this way I can quickly retrieve all trades for a specific user.
My goal is a query that returns positionSizeBeforeFill + executionSize for the last record, given an apiKey and a symbol. So I wrote the following:
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
ORDER BY id DESC
However the execution is extremely slow (around 100ms). I've noticed that running either WHERE or ORDER BY (and not both together) drastically reduces execution time. For example
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
only takes 220 microseconds to execute. The number of records after filtering by apiKey and symbol is 388.
Similarly,
SELECT positionSizeBeforeFill + executionSize
FROM log_fill
ORDER BY id DESC
takes 26 microseconds (on a 3 million records table).
All in all, separately running WHERE and ORDER BY takes microseconds of execution, when I combine them we scale up to milliseconds (around 1000x more).
Running EXPLAIN on the slow query it turns out it has to examine 116032 rows.
I tried to create a temporary table, hoping MySQL would perform the sort only on the filtered records, but the outcome is the same. I wondered whether the problem might be the index (whose cardinality is 203), but how can that be the case when WHERE alone takes very little time? I could not find similar cases in other questions or forums. I think I just fail to understand how InnoDB selects data; I thought it would first filter by WHERE and then perform ORDER BY on the filtered rows. How can I improve this? Thanks!
Edit: The EXPLAIN statement on the slow query returns
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+--------+---------+-------+--------+----------+----------------------------------+
| 1 | SIMPLE | log_fill_tmp | NULL | ref | apiKey | apiKey | 4 | const | 116032 | 10.00 | Using where; Backward index scan |
The query with WHERE only
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+--------+---------+-------+--------+----------+-------------+
| 1 | SIMPLE | log_fill_tmp | NULL | ref | apiKey | apiKey | 4 | const | 116032 | 10.00 | Using where |
The query with ORDER BY only on the full table
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------+---------+------+---------+----------+---------------------+
| 1 | SIMPLE | log_fill_tmp | NULL | index | NULL | PRIMARY | 8 | NULL | 2503238 | 100.00 | Backward index scan |
26 microseconds for any query against a 3-million-row table implies that you have the Query cache enabled. Please rerun your timings with SELECT SQL_NO_CACHE .... (Even milliseconds would be suspicious.) Were all 3M rows returned? Shoveling that many rows probably takes more than a second under any circumstances.
Meanwhile, to speed up the first two queries, add
INDEX(symbol, apiKey, id)
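As DDL, that suggestion looks like the sketch below (the index name is made up; pick your own). With the equality columns first and `id` last, the index both narrows the rows and delivers them already ordered by `id`. Note the added `LIMIT 1`, since the stated goal is only the last record:

```sql
-- Sketch: equality columns (symbol, apiKey) first, then id for the ORDER BY.
-- "idx_symbol_apikey_id" is a hypothetical name.
ALTER TABLE log_fill ADD INDEX idx_symbol_apikey_id (symbol, apiKey, id);

SELECT positionSizeBeforeFill + executionSize
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
ORDER BY id DESC
LIMIT 1;  -- only the last fill is wanted
```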
EXPLAIN gives only approximate (eg, 116032) counts. A "cardinality" of 203 is also just an estimate, but it is used by the Optimizer in some situations. Please get the exact count just to check that there really are any rows:
SELECT COUNT(*)
FROM log_fill
WHERE apiKey = 90 AND symbol = 'ABCD'
With the ORDER BY id DESC, it will scan the entire B+Tree that holds the data. As it says, it will do a 'Backward index scan'. However, since the "index" is the PRIMARY KEY and the PK is clustered with the data, it is really referring to the data's BTree.
The EXPLAIN for the first query decided that the indexes were not useful enough for the WHERE; instead it avoided the sort (ORDER BY) by doing the backward full table scan, same as the 3rd query. (And it ignored any rows that did not match the WHERE.)
I added a composite index on (apiKey, symbol) and now the query runs in as little as 0.2 ms. To reduce the creation time for the index, I reduced the number of records from 3M to 500K; the gain in time is about 97%, and I believe it will be even more on the full table. I thought that with just the apiKey index it would first filter by user, then by symbol, but I was probably wrong.
I created a table like this:
CREATE TABLE `testtable` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
`updated_at` timestamp NOT NULL,
PRIMARY KEY (`id`),
KEY `test_updated_at_index` (`updated_at`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And the explain on a select says that it is using filesort:
mysql> explain select * from testtable order by updated_at;
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | testtable | ALL | NULL | NULL | NULL | NULL | 1 | Using filesort |
+----+-------------+-----------+------+---------------+------+---------+------+------+----------------+
Why does it not use the index? Additionally, if I just remove the created_at column, then it DOES use the index!
Why does adding a single additional column cause MySQL to forget how to use the index?
How can I change the table so that MySQL DOES use the index for the ORDER BY, even with an additional column?
There is no WHERE clause, so you read all records from the table. Most often it is way faster to read the table directly, even if you must sort the data afterwards, rather than looping over an index and picking each record separately.
Only if you were interested in a small part of the table, say 2% of its data, would it make sense to use the index. Or, of course, if you only needed the indexed column itself.
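To illustrate that last point: if the query touches only indexed columns, the index is covering and gets used even without a WHERE. A sketch against the table above (check with EXPLAIN; it should show "Using index" and no filesort):

```sql
-- Covering: updated_at (plus the implicitly appended id) lives entirely in
-- test_updated_at_index, so the sort order can be read straight off the index.
SELECT updated_at FROM testtable ORDER BY updated_at;
```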
This is weird to me:
One table 'ACTIVITIES' with one index on ACTIVITY_DATE. The exact same query with different LIMIT value results in different execution plan.
Here it is:
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 20
-> ;
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| 1 | SIMPLE | ACTIVITIES | index | NULL | ACTI_DATE_I | 4 | NULL | 20 | |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
1 row in set (0.00 sec)
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 150
-> ;
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | ACTIVITIES | ALL | NULL | NULL | NULL | NULL | 10629 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
1 row in set (0.00 sec)
How come when I limit 150 it is not using the index? I mean, scanning 150 rows seems faster than scanning 10629 rows, right?
EDIT
The query uses the index till "limit 96" and starts filesort at "limit 97".
The table has nothing specific, even not a foreign key, here is the complete create table:
mysql> show create table ACTIVITIES\G
*************************** 1. row ***************************
Table: ACTIVITIES
Create Table: CREATE TABLE `ACTIVITIES` (
`ACTIVITY_ID` int(11) NOT NULL AUTO_INCREMENT,
`ACTIVITY_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`USER_KEY` varchar(50) NOT NULL,
`ITEM_KEY` varchar(50) NOT NULL,
`ACTIVITY_TYPE` varchar(1) NOT NULL,
`EXTRA` varchar(500) DEFAULT NULL,
`IS_VISIBLE` varchar(1) NOT NULL DEFAULT 'Y',
PRIMARY KEY (`ACTIVITY_ID`),
KEY `ACTI_USER_I` (`USER_KEY`,`ACTIVITY_DATE`),
KEY `ACTIVITY_ITEM_I` (`ITEM_KEY`,`ACTIVITY_DATE`),
KEY `ACTI_ITEM_TYPE_I` (`ITEM_KEY`,`ACTIVITY_TYPE`,`ACTIVITY_DATE`),
KEY `ACTI_DATE_I` (`ACTIVITY_DATE`)
) ENGINE=InnoDB AUTO_INCREMENT=10091 DEFAULT CHARSET=utf8 COMMENT='Logs activity'
1 row in set (0.00 sec)
mysql>
I also tried to run "ANALYZE TABLE ACTIVITIES" but that did not change a thing.
That's the way things go. Bear with me a minute...
The Optimizer would like to use an INDEX, in this case ACTI_DATE_I. But it does not want to use it if that would be slower.
Plan A: Use the index.
Reach into the BTree-structured index at the end (because of DESC)
Scan backward
For each row in the index, look up the corresponding row in the data. Note: The index has (ACTIVITY_DATE, ACTIVITY_ID) because the PRIMARY KEY is implicitly appended to any secondary key. To reach into the "data" using the PK (ACTIVITY_ID) is another BTree lookup, potentially random. Hence, it is potentially slow. (But not very slow in your case.)
This stops after LIMIT rows.
Plan B: Ignore the index
Scan the table, building a tmp table. (Likely to be in-memory.)
Sort the tmp table
Peel off LIMIT rows.
In your case (96 -- 1% of 10K) it is surprising that it picked the table scan. Normally, the cutoff is somewhere around 10%-30% of the number of rows in the table.
ANALYZE TABLE should have caused a recalculation of the statistics, which could have convinced it to go with the other Plan.
What version of MySQL are you using? (No, I don't know of any changes in this area.)
One thing you could try: OPTIMIZE TABLE ACTIVITIES; That will rebuild the table, thereby repacking the blocks and leading to potentially different statistics. If that helps, I would like to know it -- since I normally say "Optimize table is useless".
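Another thing you could try, not mentioned above, is the "deferred join" (lazy lookup) pattern: let a covering subquery find just the ids, then fetch the wide rows only for those. A sketch, assuming the schema above:

```sql
-- Sketch: the subquery can use ACTI_DATE_I as a covering index
-- (ACTIVITY_DATE plus the implicit ACTIVITY_ID), so Plan A's random
-- data-BTree lookups are limited to the final 150 rows.
SELECT a.*
FROM ACTIVITIES AS a
JOIN ( SELECT ACTIVITY_ID
       FROM ACTIVITIES
       ORDER BY ACTIVITY_DATE DESC
       LIMIT 150
     ) AS t USING (ACTIVITY_ID)
ORDER BY a.ACTIVITY_DATE DESC;
```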
I have the following MySQL table (simplified):
CREATE TABLE `track` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(256) NOT NULL,
`is_active` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `is_active` (`is_active`, `id`)
) ENGINE=MyISAM AUTO_INCREMENT=7495088 DEFAULT CHARSET=utf8
The 'is_active' column marks rows that I want to ignore in most, but not all, of my queries. I have some queries that read chunks out of this table periodically. One of them looks like this:
SELECT id,title from track where (track.is_active=1 and track.id > 5580702) ORDER BY id ASC LIMIT 10;
This query takes over a minute to execute. Here's the execution plan:
> EXPLAIN SELECT id,title from track where (track.is_active=1 and track.id > 5580702) ORDER BY id ASC LIMIT 10;
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
| 1 | SIMPLE | t | ref | PRIMARY,is_active | is_active | 1 | const | 3747543 | Using where |
+----+-------------+-------+------+----------------+--------+---------+-------+---------+-------------+
Now, if I tell MySQL to ignore the 'is_active' index, the query happens instantaneously.
> EXPLAIN SELECT id,title from track IGNORE INDEX(is_active) WHERE (track.is_active=1 AND track.id > 5580702) ORDER BY id ASC LIMIT 10;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | SIMPLE | t | range | PRIMARY | PRIMARY | 4 | NULL | 1597518 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
Now, what's really strange is that if I FORCE MySQL to use the 'is_active' index, the query once again happens instantaneously!
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | SIMPLE | t | range | is_active |is_active| 5 | NULL | 1866730 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
I just don't understand this behavior. In the 'is_active' index, rows should be sorted by is_active, followed by id. I use both the 'is_active' and 'id' columns in my query, so it seems like it should only need to do a few hops around the tree to find the IDs, then use those IDs to retrieve the titles from the table.
What's going on?
EDIT: More info on what I'm doing:
Query cache is disabled
Running OPTIMIZE TABLE and ANALYZE TABLE had no effect
6,620,372 rows have 'is_active' set to True. 874,714 rows have 'is_active' set to False.
Using FORCE INDEX(is_active) once again speeds up the query.
MySQL version 5.1.54
It looks like MySQL is making a poor decision about how to use the index.
From that query plan, it is showing it could have used either the PRIMARY or is_active index, and it has chosen is_active in order to narrow by track.is_active first. However, it is only using the first column of the index (track.is_active). That gets it 3747543 results which then have to be filtered and sorted.
If it had chosen the PRIMARY index, it would be able to narrow down to 1597518 rows using the index, and they would be retrieved in order of track.id already, which should require no further sorting. That would be faster.
New information:
In the third case where you are using FORCE INDEX, MySQL is using the is_active index but now instead of only using the first column, it is using both columns (see key_len). It is therefore now able to narrow by is_active and sort and filter by id using the same index, and since is_active is a single constant, the ORDER BY is satisfied by the second column (ie the rows from a single branch of the index are already in sorted order). This seems to be an even better outcome than using PRIMARY - and probably what you intended in the first place, right?
I don't know why it wasn't using both columns of this index without FORCE INDEX, unless the query has changed in a subtle way in between. If not I'd put it down to MySQL making bad decisions.
I think the speedup is due to your where clause. I am assuming that it is only retrieving a small subset of the rows in the entire large table. It is faster to do a table scan of the retrieved data for is_active on the small subset than to do the filtering through a large index file. Traversing a single column index is much faster than traversing a combined index.
Few things you could try:
Do an OPTIMIZE and CHECK on your table, so MySQL will recalculate the index statistics
have a look at http://dev.mysql.com/doc/refman/5.1/en/index-hints.html - you can tell mysql to choose the right index in different cases
I am trying to optimize a bigger query and ran into this wall when I realized this part of the query was doing a full table scan, which in my mind does not make sense considering the field in question is a primary key. I would assume that the MySQL Optimizer would use the index.
Here is the table:
CREATE TABLE userapplication (
application_id int(11) NOT NULL auto_increment,
userid int(11) NOT NULL default '0',
accountid int(11) NOT NULL default '0',
resume_id int(11) NOT NULL default '0',
coverletter_id int(11) NOT NULL default '0',
user_email varchar(100) NOT NULL default '',
account_name varchar(200) NOT NULL default '',
resume_name varchar(255) NOT NULL default '',
resume_modified datetime NOT NULL default '0000-00-00 00:00:00',
cover_name varchar(255) NOT NULL default '',
cover_modified datetime NOT NULL default '0000-00-00 00:00:00',
application_status tinyint(4) NOT NULL default '0',
application_created datetime NOT NULL default '0000-00-00 00:00:00',
application_modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
publishid int(11) NOT NULL default '0',
application_visible int(11) default '1',
PRIMARY KEY (application_id),
KEY publishid (publishid),
KEY application_status (application_status),
KEY userid (userid),
KEY accountid (accountid),
KEY application_created (application_created),
KEY resume_id (resume_id),
KEY coverletter_id (coverletter_id)
) ENGINE=MyISAM;
This simple query seems to do a full table scan:
SELECT * FROM userapplication WHERE application_id > 1025;
This is the output of the EXPLAIN:
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | userapplication | ALL | PRIMARY | NULL | NULL | NULL | 784422 | Using where |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
Any ideas how to prevent this simple query from doing a full table scan? Or am I out of luck?
You'd probably be better off letting MySQL decide on the query plan. There is a good chance that doing an index scan would be less efficient than a full table scan.
There are two data structures on disk for this table
The table itself; and
The primary key B-Tree index.
When you run a query the optimizer has two options about how to access the data:
SELECT * FROM userapplication WHERE application_id > 1025;
Using The Index
Scan the B-Tree index to find the address of all the rows where application_id > 1025
Read the appropriate pages of the table to get the data for these rows.
Not using the Index
Scan the entire table, and pick the appropriate records.
Choosing the best strategy
The job of the query optimizer is to choose the most efficient strategy for getting the data you want. If there are a lot of rows with an application_id > 1025 then it can actually be less efficient to use the index. For example if 90% of the records have an application_id > 1025 then the query optimizer would have to scan around 90% of the leaf nodes of the b-tree index and then read at least 90% of the table as well to get the actual data; this would involve reading more data from disk than just scanning the table.
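A quick way to check which situation you are in is to measure the selectivity of the predicate yourself. A sketch; run both counts and compare the ratio:

```sql
-- If the first count is a large share of the second (well above roughly
-- 20-30%), a full table scan is the expected and usually cheaper plan.
SELECT COUNT(*) FROM userapplication WHERE application_id > 1025;
SELECT COUNT(*) FROM userapplication;
```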
MyISAM tables are not clustered, a PRIMARY KEY index is a secondary index and requires an additional table lookup to get the other values.
It is several times more expensive to traverse the index and do the lookups. If your condition is not very selective (yields a large share of the total records), MySQL will consider a table scan cheaper.
To prevent it from doing a table scan, you could add a hint:
SELECT *
FROM userapplication FORCE INDEX (PRIMARY)
WHERE application_id > 1025
, though it would not necessarily be more efficient.
Mysql definitely considers a full table scan cheaper than using the index; you can however force to use your primary key as preferred index with:
mysql> EXPLAIN SELECT * FROM userapplication FORCE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | userapplication | range | PRIMARY | PRIMARY | 4 | NULL | 24 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
Note that using "USE INDEX" instead of "FORCE INDEX" to only hint mysql on the index to use, mysql still prefers a full table scan:
mysql> EXPLAIN SELECT * FROM userapplication USE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | userapplication | ALL | PRIMARY | NULL | NULL | NULL | 34 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
If your WHERE is a "greater than" comparison, it probably returns quite a few entries (and can realistically return all of them), therefore full table scans are usually preferred.
It should just be a case of typing:
SELECT * FROM userapplication WHERE application_id > 1025;
As detailed at this link. According to that guide, it should work where application_id is a numeric value; for non-numeric values, you should type:
SELECT * FROM userapplication WHERE application_id > '1025';
I don't think there's anything wrong with your SELECT, maybe it's a table configuration problem?