This is weird to me:
One table 'ACTIVITIES' with one index on ACTIVITY_DATE. The exact same query with a different LIMIT value results in a different execution plan.
Here it is:
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 20
-> ;
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| 1 | SIMPLE | ACTIVITIES | index | NULL | ACTI_DATE_I | 4 | NULL | 20 | |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
1 row in set (0.00 sec)
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 150
-> ;
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | ACTIVITIES | ALL | NULL | NULL | NULL | NULL | 10629 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
1 row in set (0.00 sec)
How come with LIMIT 150 it is not using the index? I mean, reading 150 rows through the index seems faster than scanning 10629 rows, right?
EDIT
The query uses the index up to LIMIT 96 and switches to filesort at LIMIT 97.
The table has nothing specific, even not a foreign key, here is the complete create table:
mysql> show create table ACTIVITIES\G
*************************** 1. row ***************************
Table: ACTIVITIES
Create Table: CREATE TABLE `ACTIVITIES` (
`ACTIVITY_ID` int(11) NOT NULL AUTO_INCREMENT,
`ACTIVITY_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`USER_KEY` varchar(50) NOT NULL,
`ITEM_KEY` varchar(50) NOT NULL,
`ACTIVITY_TYPE` varchar(1) NOT NULL,
`EXTRA` varchar(500) DEFAULT NULL,
`IS_VISIBLE` varchar(1) NOT NULL DEFAULT 'Y',
PRIMARY KEY (`ACTIVITY_ID`),
KEY `ACTI_USER_I` (`USER_KEY`,`ACTIVITY_DATE`),
KEY `ACTIVITY_ITEM_I` (`ITEM_KEY`,`ACTIVITY_DATE`),
KEY `ACTI_ITEM_TYPE_I` (`ITEM_KEY`,`ACTIVITY_TYPE`,`ACTIVITY_DATE`),
KEY `ACTI_DATE_I` (`ACTIVITY_DATE`)
) ENGINE=InnoDB AUTO_INCREMENT=10091 DEFAULT CHARSET=utf8 COMMENT='Logs activity'
1 row in set (0.00 sec)
mysql>
I also tried to run "ANALYZE TABLE ACTIVITIES" but that did not change a thing.
That's the way things go. Bear with me a minute...
The Optimizer would like to use an INDEX, in this case ACTI_DATE_I. But it does not want to use it if that would be slower.
Plan A: Use the index.
Reach into the BTree-structured index at the end (because of DESC)
Scan backward
For each row in the index, look up the corresponding row in the data. Note: The index has (ACTIVITY_DATE, ACTIVITY_ID) because the PRIMARY KEY is implicitly appended to any secondary key. To reach into the "data" using the PK (ACTIVITY_ID) is another BTree lookup, potentially random. Hence, it is potentially slow. (But not very slow in your case.)
This stops after LIMIT rows.
Plan B: Ignore the index
Scan the table, building a tmp table. (Likely to be in-memory.)
Sort the tmp table
Peel off LIMIT rows.
In your case (96 -- 1% of 10K) it is surprising that it picked the table scan. Normally, the cutoff is somewhere around 10%-30% of the number of rows in the table.
ANALYZE TABLE should have caused a recalculation of the statistics, which could have convinced it to go with the other Plan.
What version of MySQL are you using? (No, I don't know of any changes in this area.)
One thing you could try: OPTIMIZE TABLE ACTIVITIES; That will rebuild the table, thereby repacking the blocks and leading to potentially different statistics. If that helps, I would like to know it -- since I normally say "Optimize table is useless".
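If you want to verify the Optimizer's choice empirically, index hints let you pin down each plan and time it yourself. A sketch (FORCE INDEX usually, but not always, wins the argument):

```sql
-- Plan A by hand: scan the index backward, look up each row
SELECT * FROM ACTIVITIES FORCE INDEX (ACTI_DATE_I)
ORDER BY ACTIVITY_DATE DESC LIMIT 150;

-- Plan B by hand: table scan + filesort
SELECT * FROM ACTIVITIES IGNORE INDEX (ACTI_DATE_I)
ORDER BY ACTIVITY_DATE DESC LIMIT 150;
```

Comparing wall-clock times for the two tells you whether the cutoff at 97 was actually justified on your data.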
Related
There is a table test:
show create table test;
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`body` longtext NOT NULL,
`timestamp` int(11) NOT NULL,
`handle_after` datetime NOT NULL,
`status` varchar(100) NOT NULL,
`queue_id` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `idxTimestampStatus` (`timestamp`,`status`),
KEY `idxTimestampStatus2` (`status`,`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=80000 DEFAULT CHARSET=utf8
There are two SELECTs:
1) select * from test where status = 'in_queue' and timestamp > 1625721850;
2) select id from test where status = 'in_queue' and timestamp > 1625721850;
For the first SELECT, EXPLAIN shows me that no index is used.
For the second SELECT, the index idxTimestampStatus2 is used.
MariaDB [db]> explain select * from test where status = 'in_queue' and timestamp > 1625721850;
+------+-------------+-------+------+----------------------------------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+----------------------------------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | test | ALL | idxTimestampStatus,idxTimestampStatus2 | NULL | NULL | NULL | 80000 | Using where |
+------+-------------+-------+------+----------------------------------------+------+---------+------+----------+-------------+
MariaDB [db]> explain select id from test where status = 'in_queue' and timestamp > 1625721850;
+------+-------------+-------+------+----------------------------------------+---------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+----------------------------------------+---------------------+---------+-------+------+--------------------------+
| 1 | SIMPLE | test | ref | idxTimestampStatus,idxTimestampStatus2 | idxTimestampStatus2 | 302 | const | 4 | Using where; Using index |
+------+-------------+-------+------+----------------------------------------+---------------------+---------+-------+------+--------------------------+
Help me figure out: what am I doing wrong?
How should I create an index for the first SELECT?
Why does the number of selected columns affect index usage?
What you saw is to be expected. (The "number of columns" did not cause what you saw.) Read all the points below; various combinations of them should address all the issues raised in both the Question and Comments.
Deciding between index and table scan:
The Optimizer uses statistics to decide between using an index and doing a full table scan.
If less than (about) 20% of the rows need to be fetched, the index will be used. This involves bouncing back and forth between the index's BTree and the data's BTree.
If more of the table is needed, then it is deemed more efficient to simply scan the table, ignoring any rows that don't match the WHERE.
The "20%" is not a hard-and-fast number.
SELECT id ... status ... timestamp;
In InnoDB, a secondary index implicitly includes the columns of the PRIMARY KEY.
If all the columns mentioned in the query are in an index, then that index is "covering". This means that all the work can be done in the index's BTree without touching the data's BTree.
Using index == "covering". (That is, EXPLAIN gives this clue.)
"Covering" overrides the "20%" discussion.
SELECT * ... status ... timestamp;
SELECT * needs to fetch all columns, so "covering" does not apply and the "20%" becomes relevant.
If 1625721850 were a larger number (so that fewer rows matched), the EXPLAIN would switch from ALL to ref or range, using the index.
idxTimestampStatus2 (status,timestamp)
The order of the clauses in WHERE does not matter.
The order of the columns in a "composite" index is important. ("Composite" == multi-column)
Put the = column(s) first, then one "range" (eg >) column.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
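Putting these points together for the queries above — a sketch; the covering-index name and the trimmed column list are assumptions, adapt them to the columns you actually need:

```sql
-- idxTimestampStatus2 (status, timestamp) already has the right shape:
-- the = column first, then the one range column.

-- Push the SELECT * query onto it despite the ~20% heuristic:
SELECT * FROM test FORCE INDEX (idxTimestampStatus2)
WHERE status = 'in_queue' AND timestamp > 1625721850;

-- Or make the index covering for the columns you really need, so the
-- data BTree is never touched (id comes along implicitly via the PK):
ALTER TABLE test ADD INDEX idxStatusTsQueue (status, timestamp, queue_id);
SELECT id, queue_id FROM test
WHERE status = 'in_queue' AND timestamp > 1625721850;
```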
Why does one of these SQL statements use the index while the other does not?
CREATE TABLE `testtable` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`a` int(11) NOT NULL,
`b` int(11) NOT NULL,
`c` int(11) NOT NULL,
`d` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_abd` (`a`,`b`,`d`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;
explain select * from testtable where a > 1;
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testtable | NULL | ALL | idx_abd | NULL | NULL | NULL | 10 | 80.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
explain select * from testtable where a < 1;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | testtable | NULL | range | idx_abd | idx_abd | 4 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------+
Why can the first one not use the index while the second one does?
How does the index work internally?
In the first case, the MySQL optimizer (based on statistics) decided that it is better to do a full table scan instead of first doing index lookups and then data lookups.
In your first query, the condition used (a > 1) effectively needs to access 10 out of 11 rows. Always remember that MySQL does cost-based optimization (it tries to minimize the cost). The process is basically:
Assign a cost to each operation.
Evaluate how many operations each possible plan would take.
Sum up the total.
Choose the plan with the lowest overall cost.
Now, the default MySQL value for io_block_read_cost is 1. In the first query, you would have roughly twice the I/O block reads (first the index lookups, then the data lookups), so the cost would come out at roughly 20 if MySQL used the index. If it does the table scan directly instead, the cost is roughly 11 (a data lookup on each of the rows). That is why it decided to use the table scan instead of a range-based index scan.
If you want details about the cost breakdown, run each of these queries prefixed with EXPLAIN FORMAT=JSON, like below:
EXPLAIN format=JSON select * from testtable where a > 1;
You can also see how the optimizer compared various plans before settling on a particular strategy. To do this, execute the queries below:
/* Turn tracing on (it's off by default): */
SET optimizer_trace="enabled=on";
SELECT * FROM testtable WHERE a > 1; /* your query here */
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
/* possibly more queries...
When done with tracing, disable it: */
SET optimizer_trace="enabled=off";
Check more details at MySQL documentation: https://dev.mysql.com/doc/internals/en/optimizer-tracing.html
The alternative is to read both the index and the data pages. On such small data, that can be less efficient (although the difference in performance -- like the duration of each query -- is quite small).
Your table has 10 rows, which presumably are all on a single data page. MySQL considers it more efficient to just read the 10 rows directly and do the comparison.
The value of indexes is when you have larger tables, particularly tables that span many data pages. One primary use is to reduce the number of data pages being read.
The table has around 20K rows and the following create code:
CREATE TABLE `inventory` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`TID` int(11) DEFAULT NULL,
`RID` int(11) DEFAULT NULL,
`CID` int(11) DEFAULT NULL,
`value` text COLLATE utf8_unicode_ci,
PRIMARY KEY (`ID`),
KEY `index_TID_CID_value` (`TID`,`CID`,`value`(25))
);
and this is the result of the explain query
mysql> explain select rowID from inventory where TID=4 and CID=28 and value=3290843588097;
+----+-------------+------------+------+------------------------+-----------------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+------------------------+-----------------------+---------+-------------+------+-------------+
| 1 | SIMPLE | inventory | ref | index_TID_CID_value | index_TID_CID_value | 10 | const,const | 9181 | Using where |
+----+-------------+------------+------+------------------------+-----------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
The combination of TID=4 and CID=28 has around 13K rows in the table.
My questions are:
Why is the EXPLAIN result telling me that around 9k rows will be examined to get the final result?
Why is the ref column showing only const,const? Since 3 columns are included in the multi-column index, shouldn't ref be const,const,const?
Update 7 Oct 2016
Query:
select rowID from inventory where TID=4 and CID=28 and value=3290843588097;
I ran it about 10 times and took the times of the last five (they were the same)
No index - 0.02 seconds
Index (TID, CID) - 0.03 seconds
Index (TID, CID, value) - 0.00 seconds
Also, the same EXPLAIN query looks different today -- how? Note that the key_len has changed to 88, the ref has changed to const,const,const, and the rows to examine have dropped to 2.
mysql> explain select rowID from inventory where TID=4 and CID=28 and value='3290843588097';
+----+-------------+-----------+------+----------------------+---------------------+---------+-------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------+---------------------+---------+-------------------+------+-------------+
| 1 | SIMPLE | inventory | ref | index_TID_CID_value | index_TID_CID_value | 88 | const,const,const | 2 | Using where |
+----+-------------+-----------+------+----------------------+---------------------+---------+-------------------+------+-------------+
1 row in set (0.04 sec)
To explicitly answer your questions.
The EXPLAIN plan is giving you ~9k rows because the engine needs to search through the index tree to find the row IDs that match your WHERE-clause criteria. The index maps each combination of the indexed column values to the list of row IDs associated with that combination, and the engine scans those entries, hence the ~9k estimate.
Although your WHERE clause involves all three of the index columns, the engine could only leverage the index for the first two (TID and CID) and then had to check value against every row ID for that combination. The reason is that the original query compared the text column value to an unquoted number, which forces a cast; once the literal is quoted (as in your update), the value(25) prefix is used as well, key_len grows from 10 to 88, ref becomes const,const,const, and the row estimate drops to 2.
In your specific use-case, I'm assuming you want to optimize performance of the search. I would recommend that you create just an index on TID and CID (not value). The reason for this is that you currently only have 2 combinations of these values out of ~20k records. This means that using an index with just 2 columns, the engine will be able to almost immediately cut out half of the records when doing a search on all three values. (This is all assuming that this index will be applied to a table with a much larger dataset.) Since your metrics are based off a smaller dataset, you may not be seeing the order of magnitude of performance differences between using the index and not.
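As a concrete sketch of that recommendation (the index name is illustrative), plus the quoted-literal form of the query:

```sql
-- The suggested two-column index:
ALTER TABLE inventory ADD INDEX index_TID_CID (TID, CID);

-- Quote the literal: `value` is a text column, and comparing it to a
-- bare number forces a cast that prevents index use on that column.
SELECT ID FROM inventory
WHERE TID = 4 AND CID = 28 AND value = '3290843588097';
```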
I have a table with 3 million rows and 6 columns.
The table structure:
| Sample | CREATE TABLE `sample` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`FileMD5` varchar(32) NOT NULL,
`NoCsumMD5` varchar(32) NOT NULL,
`SectMD5` varchar(32) NOT NULL,
`SectNoResMD5` varchar(32) NOT NULL,
`ImpMD5` varchar(32) NOT NULL,
`Overlay` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
KEY `FileMD5` (`FileMD5`),
KEY `NoCsumMD5` (`NoCsumMD5`)
) ENGINE=InnoDB AUTO_INCREMENT=3073630 DEFAULT CHARSET=latin1 |
The temporary table values:
mysql> SHOW VARIABLES LIKE 'tmp_table_size';
+----------------+----------+
| Variable_name | Value |
+----------------+----------+
| tmp_table_size | 16777216 |
+----------------+----------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'max_heap_table_size';
+---------------------+----------+
| Variable_name | Value |
+---------------------+----------+
| max_heap_table_size | 16777216 |
+---------------------+----------+
1 row in set (0.00 sec)
My Query
mysql> explain SELECT NoCsumMD5,Count(FileMD5)
FROM Sample GROUP BY NoCsumMD5
HAVING Count(FileMD5) > 10 ORDER BY Count(FileMD5) Desc ;
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | Sample | index | NULL | NoCsumMD5 | 34 | NULL | 2928042 | Using temporary; Using filesort |
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
How can I optimize this query. Even after 10 min, it generates no output.
I feel that I have indexed the right columns and given enough memory for temporary tables.
Since FileMD5 is NOT NULL in your table definition, the query can be simplified, and you will not need the composite index @brendan-long suggests (the NoCsumMD5 index is enough):
SELECT NoCsumMD5, Count(*) as cnt
FROM Sample
GROUP BY NoCsumMD5
HAVING cnt > 10
ORDER BY cnt DESC;
I'm not sure if this will help, but MySQL can only use one index at a time, so it may be helpful to create an index over both FileMD5 and NoCsumMD5:
KEY `someName` (`NoCsumMD5`, `FileMD5`),
Here's some information on multiple column indexes:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
The short version is that the order of the columns in the index matters, because MySQL can only use the index in that order (for example, in the index I gave above, it can test NoCsumMD5, then narrow the result down using FileMD5).
I'm not sure how much it will help in this query though, since all you care about is whether FileMD5 is NULL or not.
I did my best to optimize the following two simple queries, but for each 10-row result set it scans the full table, or at least 10K rows. Currently there are 20000 rows in the books table.
ALTER TABLE books ADD INDEX search_INX (`book_status`, `is_reviewed`,`has_image`,`published_date`)
mysql> EXPLAIN SELECT book_id FROM books ORDER BY published_date DESC LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+-------+-----------------------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows  | Extra                       |
+----+-------------+-------+-------+---------------+------------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | books | index | NULL          | search_INX | 11      | NULL | 20431 | Using index; Using filesort |
+----+-------------+-------+-------+---------------+------------+---------+------+-------+-----------------------------+
mysql> EXPLAIN SELECT book_id FROM books WHERE book_status='available' AND is_reviewed=true AND has_image=true ORDER BY published_date DESC LIMIT 10;
+----+-------------+-------+------+---------------+------------+---------+-------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key        | key_len | ref               | rows  | Extra                    |
+----+-------------+-------+------+---------------+------------+---------+-------------------+-------+--------------------------+
|  1 | SIMPLE      | books | ref  | search_INX    | search_INX | 3       | const,const,const | 10215 | Using where; Using index |
+----+-------------+-------+------+---------------+------------+---------+-------------------+-------+--------------------------+
mysql> EXPLAIN SELECT book_id FROM books WHERE book_status='available' AND is_reviewed=true AND has_image=true ORDER BY published_date DESC LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: books
type: ref
possible_keys: search_INX
key: search_INX
key_len: 3
ref: const,const,const
rows: 10215
Extra: Using where; Using index
1 row in set (0.00 sec)
Create Table: CREATE TABLE `books` (
`book_id` int(10) unsigned NOT NULL auto_increment,
`has_image` bit(1) NOT NULL default '',
`is_reviewed` bit(1) NOT NULL default '\0',
`book_status` enum('available','out of stock','printing') NOT NULL default 'available',
`published_date` datetime NOT NULL,
PRIMARY KEY (`book_id`),
KEY `search_INX` (`is_reviewed`,`has_image`,`book_status`,`published_date`)
) ENGINE=InnoDB AUTO_INCREMENT=162605 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Does any one have clue to how to solve this problem?
The cardinality of:
KEY `search_INX` (`is_reviewed`,`has_image`,`book_status`,`published_date`)
...is poor. If you were to put published_date at the front, it would speed up your query. Further, why are you indexing is_reviewed and has_image? Boolean columns add almost nothing to an index's selectivity (again, cardinality), which is why some engines, such as SQL Server, discourage indexing them on their own. Either rearrange your key, or add a dedicated key on the column I mentioned.
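Applied to the table above, that rearrangement would look something like this (a sketch; benchmark both queries before keeping it):

```sql
ALTER TABLE books DROP INDEX search_INX;
ALTER TABLE books ADD INDEX search_INX
  (published_date, book_status, is_reviewed, has_image);
```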
Also, the rows column in MySQL's EXPLAIN shows not the number of rows actually read, but an estimate of how many rows may be examined, ignoring the LIMIT clause.
At a quick glance, the problem seems to be that you're missing an index on published_date alone; the ORDER BY uses this column. Add that index and see what happens.
If you use the FORCE INDEX command, does that help?
I'm no expert on indexing, but can you create an index just on published_date as well as the index you have made on all four fields?
ALTER TABLE books DROP INDEX `search_INX`;
ALTER TABLE books ADD INDEX `published_INX` (`published_date`);
@Zerkms @jason I just thought of another way to solve this.
It is unorthodox, but it would work. If your primary key were (published_date, book_id) with a DESC sort, you could easily get the last 10 results: the query engine would scan the table, applying the WHERE clause, until it found 10 results, then quit.
It would work great. Just add another index on book_id if you need to specifically query by book_id.
The reason this makes sense is the DB would naturally store books by date (InnoDB uses clustered indexes), which is exactly what you are trying to query.
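A sketch of that unorthodox primary-key change (InnoDB requires the auto_increment column to be the first column of some key, which the added unique key satisfies; note also that MySQL honors DESC index order only from 8.0 -- before that, indexes are stored ascending and simply scanned backward; test on a copy first):

```sql
ALTER TABLE books
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (published_date, book_id),
  ADD UNIQUE KEY book_id_UX (book_id);
```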