MySQL: optimizing an ORDER BY date query

I did my best to optimize the following two simple queries, but to return a 10-row result set each one scans the full table, or at least 10K rows. There are currently about 20,000 rows in the books table.
ALTER TABLE books ADD INDEX search_INX (`book_status`, `is_reviewed`,`has_image`,`published_date`)
mysql> EXPLAIN SELECT book_id FROM books ORDER BY published_date DESC LIMIT 10;
+----+-------------+-------+-------+---------------+------------+---------+------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+-------+-----------------------------+
| 1 | SIMPLE | books | index | NULL | search_INX | 11 | NULL | 20431 | Using index; Using filesort |
+----+-------------+-------+-------+---------------+------------+---------+------+-------+-----------------------------+
mysql> EXPLAIN SELECT book_id FROM books WHERE book_status='available' AND is_reviewed=true AND has_image=true ORDER BY published_date DESC LIMIT 10;
+----+-------------+-------+------+---------------+------------+---------+-------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------------+---------+-------------------+-------+--------------------------+
| 1 | SIMPLE | books | ref | search_INX | search_INX | 3 | const,const,const | 10215 | Using where; Using index |
+----+-------------+-------+------+---------------+------------+---------+-------------------+-------+--------------------------+
mysql> EXPLAIN SELECT book_id FROM books WHERE book_status='available' AND is_reviewed=true AND has_image=true ORDER BY published_date DESC LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: books
type: ref
possible_keys: search_INX
key: search_INX
key_len: 3
ref: const,const,const
rows: 10215
Extra: Using where; Using index
1 row in set (0.00 sec)
Create Table: CREATE TABLE `books` (
`book_id` int(10) unsigned NOT NULL auto_increment,
`has_image` bit(1) NOT NULL default '',
`is_reviewed` bit(1) NOT NULL default '\0',
`book_status` enum('available','out of stock','printing') NOT NULL default 'available',
`published_date` datetime NOT NULL,
PRIMARY KEY (`book_id`),
KEY `search_INX` (`is_reviewed`,`has_image`,`book_status`,`published_date`)
) ENGINE=InnoDB AUTO_INCREMENT=162605 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Does anyone have a clue how to solve this problem?

The cardinality of:
KEY `search_INX` (`is_reviewed`,`has_image`,`book_status`,`published_date`)
...is poor. If you were to put published_date at the front, it would speed up your query. Further, why are you indexing is_reviewed and has_image? Indexing boolean columns on their own rarely pays off, because they hold only two distinct values (again, cardinality). Either rearrange your key, or put a separate key on the column I mentioned.

Also, the rows column in MySQL's EXPLAIN output does not show the number of rows actually examined; it shows an estimate of the rows that could be examined, ignoring the LIMIT clause.
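To make the gap between that estimate and the work actually done more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not MySQL's implementation: the composite index is modelled as a sorted list of tuples, published_date is simplified to an integer, the data is made up, and the helper names build_index and newest_matching are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_index(rows):
    """Model a composite index as a sorted list of
    (book_status, is_reviewed, has_image, published_date, book_id) tuples."""
    return sorted(rows)

def newest_matching(index, prefix, limit):
    """Return (range_size, book_ids) for entries whose first three
    columns equal `prefix`, newest published_date first."""
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (float("inf"),))
    range_size = hi - lo  # roughly what the `rows` estimate describes
    picked = []
    pos = hi - 1
    # Walk backward through the range and stop as soon as LIMIT is reached.
    while pos >= lo and len(picked) < limit:
        picked.append(index[pos][-1])
        pos -= 1
    return range_size, picked

# Made-up data: 2000 books, every second one 'available',
# published_date simplified to an integer equal to book_id.
rows = [("available" if book_id % 2 == 0 else "printing", 1, 1, book_id, book_id)
        for book_id in range(1, 2001)]
index = build_index(rows)
range_size, newest = newest_matching(index, ("available", 1, 1), 10)
print("entries in matching range:", range_size)
print("entries actually read:", len(newest))
```

In this model the matching range holds many entries, but only as many as the LIMIT asks for are read, because the entries inside the range are already ordered by date.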

At a quick glance, the problem seems to be you're missing an index on published_date. Order by is using this column. Add this index and see what happens.

If you use the FORCE INDEX command, does that help?

I'm no expert on indexing, but can you create an index just on published_date as well as the index you have made on all four fields?
ALTER TABLE books DROP INDEX `search_INX`;
ALTER TABLE books ADD INDEX `published_INX` (`published_date`);

@Zerkms @jason I just thought of another way to solve this.
It is unorthodox, but it would work. If your primary key were (published_date, book_id) with DESC sort, you would easily be able to get the last 10 results. The query engine would scan the table, applying the WHERE clause, until it found 10 results, then quit.
It would work great. Just add another index on book_id if you need to specifically query by book_id.
The reason this makes sense is the DB would naturally store books by date (InnoDB uses clustered indexes), which is exactly what you are trying to query.

Related

Performance drop upgrading from MySQL 5.7.33 to 8.0.31 - why did it stop using the Index Condition Pushdown Optimization?

I have a table like this (details elided for readability):
CREATE TABLE UserData (
id bigint NOT NULL AUTO_INCREMENT,
userId bigint NOT NULL DEFAULT '0', ...
c6 int NOT NULL DEFAULT '0', ...
hidden int NOT NULL DEFAULT '0', ...
c22 int NOT NULL DEFAULT '0', ...
PRIMARY KEY (id), ...
KEY userId_hidden_c6_c22_idx (userId,hidden,c6,c22), ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
and was happily doing queries on it like this in MySQL 5.7:
mysql> select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----
| ...
+----
10 rows in set (0.03 sec)
However, in MySQL 8.0 these queries started doing this:
mysql> select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----
| ...
+----
10 rows in set (1.56 sec)
Explain shows the following, 5.7:
mysql> explain select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
| 1 | SIMPLE | UserData | NULL | ref | userId_hidden_c6_c22_idx | userId_hidden_c6_c22_idx | 12 | const,const | 78062 | 100.00 | Using index condition; Using filesort |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
8.0:
mysql> explain select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
| 1 | SIMPLE | UserData | NULL | ref | userId_hidden_c6_c22_idx | userId_hidden_c6_c22_idx | 12 | const,const | 79298 | 100.00 | Using filesort |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
The main difference seems to be that 5.7 is Using index condition; Using filesort and 8.0 is only Using filesort.
Why did 8.0 stop using the index condition, and how can I get it to start using it?
EDIT: Why did performance drop 10-100x with MySQL 8.0? It looks like it's because it stopped using the Index Condition Pushdown Optimization - how can I get it to start using it?
The table has ~150M rows in it, and that user has ~75k records, so I guess it could be a change in the size-based heuristics or whatever goes into the MySQL decision making?

In the EXPLAIN you show, the type column is ref and the key column names the index, which indicates it is using that index to optimize the lookup.
You are misinterpreting what "Using index condition" means in the Extra column. Admittedly, it does sound as if the absence of that note means the index is not being used.
The note about "index condition" is referring to Index Condition Pushdown, which is not related to using the index, but it's about delegating other conditions to be filtered at the storage engine level. Read about it here: https://dev.mysql.com/doc/refman/8.0/en/index-condition-pushdown-optimization.html
It's unfortunate that the notes reported by EXPLAIN are so difficult to understand. You really have to study a lot of documentation to understand how to read those notes.

With the following index, the query would be much faster in either version, because it would stop after 10 rows. That is, the "filesort" would be avoided.
INDEX(userId, hidden, id)
This won't do "Using index" (aka "covering"), but neither did your attempts. That is different from "Using index condition" (aka "ICP", as you point out).
Try these to get more insight:
EXPLAIN FORMAT=JSON SELECT ...
EXPLAIN ANALYZE SELECT ...
(No, I cannot explain the regression.)
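Why INDEX(userId, hidden, id) avoids the sort can be illustrated with a small Python sketch. This is a toy model with made-up rows, not a reproduction of the MySQL internals: each index is a sorted list of tuples, the primary key is appended to the existing index as InnoDB does implicitly, and the helper name matching_ids is hypothetical.

```python
def matching_ids(index, user_id, hidden):
    """Ids of index entries whose first two columns match, in index order."""
    return [entry[-1] for entry in index
            if entry[0] == user_id and entry[1] == hidden]

# Made-up rows: (id, userId, hidden, c6, c22)
table = [(i, 123, 0, (i * 7) % 5, (i * 3) % 4) for i in range(1, 41)]

# Existing index: (userId, hidden, c6, c22) plus the implicit primary key.
idx_existing = sorted((u, h, c6, c22, i) for i, u, h, c6, c22 in table)
# Suggested index: (userId, hidden, id).
idx_suggested = sorted((u, h, i) for i, u, h, c6, c22 in table)

ids_existing = matching_ids(idx_existing, 123, 0)
ids_suggested = matching_ids(idx_suggested, 123, 0)

# With the existing index the matching entries arrive ordered by (c6, c22, id),
# so all of them must be collected and sorted before LIMIT can apply.
top10_via_sort = sorted(ids_existing, reverse=True)[:10]

# With the suggested index they are already in id order: read 10 from the end.
top10_via_index = ids_suggested[::-1][:10]

print(ids_existing[:6])
print(top10_via_sort == top10_via_index)
```

Both approaches return the same ten ids, but only the second one can stop after ten entries.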

Why is my query checking 1000's of rows even though the table is indexed?

The table has around 20K rows and the following create code:
CREATE TABLE `inventory` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`TID` int(11) DEFAULT NULL,
`RID` int(11) DEFAULT NULL,
`CID` int(11) DEFAULT NULL,
`value` text COLLATE utf8_unicode_ci,
PRIMARY KEY (`ID`),
KEY `index_TID_CID_value` (`TID`,`CID`,`value`(25))
);
and this is the result of the explain query
mysql> explain select rowID from inventory where TID=4 and CID=28 and value=3290843588097;
+----+-------------+------------+------+------------------------+-----------------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+------------------------+-----------------------+---------+-------------+------+-------------+
| 1 | SIMPLE | inventory | ref | index_TID_CID_value | index_TID_CID_value | 10 | const,const | 9181 | Using where |
+----+-------------+------------+------+------------------------+-----------------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
The combination of TID=4 and CID=28 has around 13K rows in the table.
My questions are:
Why is the EXPLAIN result telling me that around 9k rows will be examined to get the final result?
Why does the ref column show only const,const? Since three columns are included in the multi-column index, shouldn't ref be const,const,const?
Update 7 Oct 2016
Query:
select rowID from inventory where TID=4 and CID=28 and value=3290843588097;
I ran it about 10 times and took the times of the last five (they were the same)
No index - 0.02 seconds
Index (TID, CID) - 0.03 seconds
Index (TID, CID, value) - 0.00 seconds
Also, the same EXPLAIN looks different today. How? Note that key_len has changed to 88, ref has changed to const,const,const, and the rows to examine have dropped to 2.
mysql> explain select rowID from inventory where TID=4 and CID=28 and value='3290843588097';
+----+-------------+-----------+------+----------------------+---------------------+---------+-------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------+---------------------+---------+-------------------+------+-------------+
| 1 | SIMPLE | inventory | ref | index_TID_CID_value | index_TID_CID_value | 88 | const,const,const | 2 | Using where |
+----+-------------+-----------+------+----------------------+---------------------+---------+-------------------+------+-------------+
1 row in set (0.04 sec)

To explicitly answer your questions:
The explain plan reports ~9k rows because the engine needs to search through the index tree to find the row IDs that match your WHERE criteria. The index maps each combination of the indexed column values to the list of row IDs associated with that combination. In effect, the engine searches those combinations to find the right one by scanning them, hence the ~9k figure.
Since your where-clause criteria involves all three of the index columns, the engine is optimizing the search by leveraging the index for the first two columns, and then short-circuiting the third column and getting all rowID results for that combination.
In your specific use-case, I'm assuming you want to optimize performance of the search. I would recommend that you create just an index on TID and CID (not value). The reason for this is that you currently only have 2 combinations of these values out of ~20k records. This means that using an index with just 2 columns, the engine will be able to almost immediately cut out half of the records when doing a search on all three values. (This is all assuming that this index will be applied to a table with a much larger dataset.) Since your metrics are based off a smaller dataset, you may not be seeing the order of magnitude of performance differences between using the index and not.
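The two key_len values in the EXPLAIN outputs above (10 and 88) can be checked with simple arithmetic. The sketch below is a simplified model, assuming 4 bytes per INT, 1 extra byte for a nullable column, up to 3 bytes per character for the utf8 prefix value(25), and 2 length bytes for a variable-length key part; the helper name key_part_len is hypothetical. A likely reason for the change between the two runs is that the first query compares the text column with an unquoted number, while the second uses a quoted string, so only the second can use the third index column.

```python
def key_part_len(data_bytes, nullable, variable_length=False):
    """Bytes one index column contributes to key_len (simplified model)."""
    return data_bytes + (1 if nullable else 0) + (2 if variable_length else 0)

INT_BYTES = 4        # INT is stored in 4 bytes
UTF8_MAX_BYTES = 3   # MySQL's utf8 reserves up to 3 bytes per character

tid = key_part_len(INT_BYTES, nullable=True)    # TID int(11) DEFAULT NULL
cid = key_part_len(INT_BYTES, nullable=True)    # CID int(11) DEFAULT NULL
value = key_part_len(25 * UTF8_MAX_BYTES,       # value(25), utf8 text prefix
                     nullable=True, variable_length=True)

two_columns = tid + cid             # only TID and CID used
three_columns = tid + cid + value   # TID, CID and the value prefix used

print("key_len with two columns:", two_columns)
print("key_len with three columns:", three_columns)
```

Under these assumptions the two totals match the key_len values 10 and 88 reported by EXPLAIN, which is consistent with ref showing const,const in one case and const,const,const in the other.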

MySQL using different index depending on limit value with ORDER BY query

This is weird to me:
One table 'ACTIVITIES' with one index on ACTIVITY_DATE. The exact same query with different LIMIT value results in different execution plan.
Here it is:
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 20
-> ;
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| 1 | SIMPLE | ACTIVITIES | index | NULL | ACTI_DATE_I | 4 | NULL | 20 | |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
1 row in set (0.00 sec)
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 150
-> ;
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | ACTIVITIES | ALL | NULL | NULL | NULL | NULL | 10629 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
1 row in set (0.00 sec)
Why is the index not used with LIMIT 150? Scanning 150 rows seems faster than scanning 10629 rows, right?
EDIT
The query uses the index till "limit 96" and starts filesort at "limit 97".
The table has nothing special, not even a foreign key. Here is the complete CREATE TABLE:
mysql> show create table ACTIVITIES\G
*************************** 1. row ***************************
Table: ACTIVITIES
Create Table: CREATE TABLE `ACTIVITIES` (
`ACTIVITY_ID` int(11) NOT NULL AUTO_INCREMENT,
`ACTIVITY_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`USER_KEY` varchar(50) NOT NULL,
`ITEM_KEY` varchar(50) NOT NULL,
`ACTIVITY_TYPE` varchar(1) NOT NULL,
`EXTRA` varchar(500) DEFAULT NULL,
`IS_VISIBLE` varchar(1) NOT NULL DEFAULT 'Y',
PRIMARY KEY (`ACTIVITY_ID`),
KEY `ACTI_USER_I` (`USER_KEY`,`ACTIVITY_DATE`),
KEY `ACTIVITY_ITEM_I` (`ITEM_KEY`,`ACTIVITY_DATE`),
KEY `ACTI_ITEM_TYPE_I` (`ITEM_KEY`,`ACTIVITY_TYPE`,`ACTIVITY_DATE`),
KEY `ACTI_DATE_I` (`ACTIVITY_DATE`)
) ENGINE=InnoDB AUTO_INCREMENT=10091 DEFAULT CHARSET=utf8 COMMENT='Logs activity'
1 row in set (0.00 sec)
mysql>
I also tried running "ANALYZE TABLE ACTIVITIES", but that did not change anything.

That's the way things go. Bear with me a minute...
The Optimizer would like to use an INDEX, in this case ACTI_DATE_I. But it does not want to use it if that would be slower.
Plan A: Use the index.
Reach into the BTree-structured index at the end (because of DESC)
Scan backward
For each row in the index, look up the corresponding row in the data. Note: The index has (ACTIVITY_DATE, ACTIVITY_ID) because the PRIMARY KEY is implicitly appended to any secondary key. To reach into the "data" using the PK (ACTIVITY_ID) is another BTree lookup, potentially random. Hence, it is potentially slow. (But not very slow in your case.)
This stops after LIMIT rows.
Plan B: Ignore the index
Scan the table, building a tmp table. (Likely to be in-memory.)
Sort the tmp table
Peel off LIMIT rows.
In your case (96 -- 1% of 10K) it is surprising that it picked the table scan. Normally, the cutoff is somewhere around 10%-30% of the number of rows in the table.
ANALYZE TABLE should have caused a recalculation of the statistics, which could have convinced it to go with the other Plan.
What version of MySQL are you using? (No, I don't know of any changes in this area.)
One thing you could try: OPTIMIZE TABLE ACTIVITIES; That will rebuild the table, thereby repacking the blocks and leading to potentially different statistics. If that helps, I would like to know it -- since I normally say "Optimize table is useless".
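The trade-off between Plan A and Plan B can be sketched as a toy cost comparison in Python. Everything here is an assumption for illustration: the three cost constants are invented, the function names are hypothetical, and MySQL's real cost model is different, so the switch point of this sketch does not reproduce the 96/97 boundary from the question.

```python
import math

# All constants are invented for illustration; MySQL's real cost model differs.
RANDOM_LOOKUP_COST = 1.0    # one primary-key lookup per index entry (Plan A)
SEQUENTIAL_ROW_COST = 0.05  # reading one row during a full table scan (Plan B)
SORT_COMPARE_COST = 0.001   # one comparison while sorting (Plan B)

def cost_plan_a(limit):
    """Plan A: walk the index backward and look up `limit` rows by primary key."""
    return limit * RANDOM_LOOKUP_COST

def cost_plan_b(table_rows):
    """Plan B: scan every row, then sort them all (about n * log2(n) comparisons)."""
    scan = table_rows * SEQUENTIAL_ROW_COST
    sort = table_rows * math.log2(table_rows) * SORT_COMPARE_COST
    return scan + sort

def choose_plan(limit, table_rows):
    """Pick whichever plan is cheaper under the toy model."""
    if cost_plan_a(limit) <= cost_plan_b(table_rows):
        return "A (index)"
    return "B (table scan + sort)"

for limit in (20, 150, 5000):
    print(limit, choose_plan(limit, 10629))
```

The point of the sketch is only that the cost of Plan A grows with LIMIT while the cost of Plan B does not, so there is always some LIMIT beyond which the table scan looks cheaper to the optimizer.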

How to avoid Using Filesort in "!=" mysql

Please, help me!
How to optimize a query like:
SELECT idu
FROM `user`
WHERE `username`!='manager'
AND `username`!='user1@yahoo.com'
ORDER BY lastdate DESC
This is the explain:
explain SELECT idu FROM `user` WHERE `username`!='manager' AND `username`!='ser1@yahoo.com' order by lastdate DESC;
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | user | ALL | username,username-lastdate | NULL | NULL | NULL | 208478 | Using where; Using filesort |
+----+-------------+-------+------+----------------------------+------+---------+------+--------+-----------------------------+
1 row in set (0.00 sec)
The goal is to avoid the filesort on a big table.

Since this query scans all rows anyway, you need an index on lastdate so that MySQL does not have to sort the results itself (a filesort, which does not always go to disk or a temp table).
For super read performance, add the following multi-column "covering" index:
user(lastdate, username, idu)
A "covering" index would allow MySQL to just scan the index instead of the actual table data.
If you are using InnoDB and any of the above columns is your primary key, you don't need to include it in the index, because InnoDB appends the primary key to every secondary index.
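The difference between an index on lastdate alone and the covering index can be sketched in Python as follows. This is a toy model with made-up rows: the table is a dictionary keyed by idu, each index is a sorted list of tuples, and the fetch_username helper (hypothetical) counts how often the table row has to be visited.

```python
# Made-up table rows keyed by primary key idu: idu -> (username, lastdate)
table = {1: ("manager", 50), 2: ("alice", 40), 3: ("user1@yahoo.com", 30),
         4: ("bob", 20), 5: ("carol", 10)}
excluded = {"manager", "user1@yahoo.com"}
lookups = {"count": 0}

def fetch_username(idu):
    """Simulate going back to the table row for a column the index lacks."""
    lookups["count"] += 1
    return table[idu][0]

# Index on (lastdate) only: entries are (lastdate, idu).
idx_lastdate = sorted((lastdate, idu)
                      for idu, (username, lastdate) in table.items())
result_plain = [idu for lastdate, idu in reversed(idx_lastdate)
                if fetch_username(idu) not in excluded]
lookups_plain = lookups["count"]

# Covering index on (lastdate, username, idu): the entry holds everything needed.
lookups["count"] = 0
idx_covering = sorted((lastdate, username, idu)
                      for idu, (username, lastdate) in table.items())
result_covering = [idu for lastdate, username, idu in reversed(idx_covering)
                   if username not in excluded]
lookups_covering = lookups["count"]

print(result_plain, lookups_plain)
print(result_covering, lookups_covering)
```

Both indexes deliver the rows in lastdate order without a sort, but only the covering one answers the query without touching the table rows.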

Even with seemingly correct indices and enough memory, query runs too long

I have a table with 3 million rows and 6 columns.
The table structure:
CREATE TABLE `sample` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`FileMD5` varchar(32) NOT NULL,
`NoCsumMD5` varchar(32) NOT NULL,
`SectMD5` varchar(32) NOT NULL,
`SectNoResMD5` varchar(32) NOT NULL,
`ImpMD5` varchar(32) NOT NULL,
`Overlay` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
KEY `FileMD5` (`FileMD5`),
KEY `NoCsumMD5` (`NoCsumMD5`)
) ENGINE=InnoDB AUTO_INCREMENT=3073630 DEFAULT CHARSET=latin1
The temporary table values:
mysql> SHOW VARIABLES LIKE 'tmp_table_size';
+----------------+----------+
| Variable_name | Value |
+----------------+----------+
| tmp_table_size | 16777216 |
+----------------+----------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'max_heap_table_size';
+---------------------+----------+
| Variable_name | Value |
+---------------------+----------+
| max_heap_table_size | 16777216 |
+---------------------+----------+
1 row in set (0.00 sec)
My Query
mysql> explain SELECT NoCsumMD5,Count(FileMD5)
FROM Sample GROUP BY NoCsumMD5
HAVING Count(FileMD5) > 10 ORDER BY Count(FileMD5) Desc ;
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | Sample | index | NULL | NoCsumMD5 | 34 | NULL | 2928042 | Using temporary; Using filesort |
+----+-------------+--------+-------+---------------+-----------+---------+------+---------+---------------------------------+
How can I optimize this query? Even after 10 minutes, it produces no output.
I feel that I have indexed the right columns and given enough memory for temporary tables.

Since FileMD5 is declared NOT NULL in your table definition, the query can be simplified, and you will not need the composite index @brendan-long suggests (the NoCsumMD5 index is enough):
SELECT NoCsumMD5, Count(*) as cnt
FROM Sample
GROUP BY NoCsumMD5
HAVING cnt > 10
ORDER BY cnt DESC;
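The reasoning behind this simplification can be illustrated with a short Python sketch. It is a toy model on made-up data, not what MySQL executes: the NoCsumMD5 index is modelled as a sorted list so that equal values are adjacent, and THRESHOLD is a hypothetical stand-in for the 10 in the HAVING clause.

```python
from itertools import groupby

# Made-up (NoCsumMD5, FileMD5) pairs; FileMD5 is never None, mirroring NOT NULL.
rows = [("aa", "f1"), ("bb", "f2"), ("aa", "f3"),
        ("cc", "f4"), ("aa", "f5"), ("bb", "f6")]
THRESHOLD = 1  # stands in for the 10 in HAVING cnt > 10

# An index on NoCsumMD5 delivers the values sorted, so equal values are adjacent
# and each group can be counted in a single pass.
index_nocsum = sorted(no_csum for no_csum, _ in rows)
counts = [(key, sum(1 for _ in group)) for key, group in groupby(index_nocsum)]

# HAVING cnt > THRESHOLD, then ORDER BY cnt DESC.
result = sorted((kc for kc in counts if kc[1] > THRESHOLD),
                key=lambda kc: kc[1], reverse=True)

# COUNT(FileMD5) skips only NULLs; with no NULLs it matches COUNT(*).
count_filemd5 = {}
for no_csum, file_md5 in rows:
    if file_md5 is not None:
        count_filemd5[no_csum] = count_filemd5.get(no_csum, 0) + 1

print(result)
print(count_filemd5 == dict(counts))
```

Because the column is never NULL, counting index entries per NoCsumMD5 value gives the same numbers as counting FileMD5 values, so the single-column index carries all the information the query needs.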

I'm not sure if this will help, but MySQL generally uses only one index per table in a query, so it may be helpful to create an index over both FileMD5 and NoCsumMD5:
KEY `someName` (`NoCsumMD5`, `FileMD5`),
Here's some information on multiple column indexes:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
The short version is that the order of the columns in the index matters, because MySQL can only use the index in that order (for example, in the index I gave above, it can test NoCsumMD5, then narrow the result down using FileMD5).
I'm not sure how much it will help in this query, though, since all you care about is whether FileMD5 is NULL or not.
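The column-order rule quoted above can be sketched in Python by treating a composite index as a sorted list of tuples. This is a toy model with made-up values, and the helper names seek_first_column and scan_second_column are hypothetical.

```python
from bisect import bisect_left

# Made-up composite index on (NoCsumMD5, FileMD5), kept sorted like a B-tree.
index = sorted([("n1", "f3"), ("n1", "f1"), ("n2", "f2"),
                ("n3", "f1"), ("n2", "f9")])

def seek_first_column(index, no_csum):
    """Binary-search to the first entry for no_csum, then read while it matches."""
    pos = bisect_left(index, (no_csum,))
    found, steps = [], 0
    while pos < len(index) and index[pos][0] == no_csum:
        found.append(index[pos])
        pos += 1
        steps += 1
    return found, steps

def scan_second_column(index, file_md5):
    """No seek is possible on the second column alone: every entry is inspected."""
    found = [entry for entry in index if entry[1] == file_md5]
    return found, len(index)

print(seek_first_column(index, "n2"))
print(scan_second_column(index, "f1"))
```

A lookup on the leading column touches only the matching entries, while a lookup on the second column alone has to look at the whole index, which is why the order of columns in the definition matters.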