mysql single table SELECT query ORDER BY causes FILESORT - mysql

I looked through multiple similar posts trying to get input on how to redefine my index but can't figure this out. Every time i include the ORDER BY statement, it uses filesort to return the resultset.
Here's the table definition and query:
SELECT
`s`.`title`,
`s`.`price`,
`s`.`price_sale`
FROM `style` `s`
WHERE `s`.`isactive`=1 AND `s`.`department`='women'
ORDER
BY `s`.`ctime` DESC
CREATE TABLE IF NOT EXISTS `style` (
`id` mediumint(6) unsigned NOT NULL auto_increment,
`ctime` timestamp NOT NULL default CURRENT_TIMESTAMP,
`department` char(5) NOT NULL,
`isactive` tinyint(1) unsigned NOT NULL,
`price` float(8,2) unsigned NOT NULL,
`price_sale` float(8,2) unsigned NOT NULL,
`title` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_grid_default` (`isactive`,`department`,`ctime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci AUTO_INCREMENT=47 ;
Also, here's the explain result set I get:
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------------+
| 1 | SIMPLE | s | ref | idx_grid | idx_grid | 6 | const,const | 3 | Using where; Using filesort |
+----+-------------+-------+------+---------------+----------+---------+-------------+------+-----------------------------+

Why does s.isactive not get used as an index?
MySQL (or any SQL for that matter) will not use a key if it has low cardinality.
In plain English, if many rows share the same value for a key, (My)SQL will not use the index, but just real the table instead.
A boolean field almost never gets picked as an index because of this; too many rows share the same value.
Why does MySQL not use the index on ctime?
ctime is included in a multi-field or composite index. MySQL will only use a composite index if you use all of it or a left-most part of it *)
If you sort on the middle or rightmost field(s) of a composite index, MySQL cannot use the index and will have to resort to filesort.
So a order by isactive , department will use an index;
order by department will not.
order by isactive will also not use an index, but that's because the cardinality of the boolean field isactive is too low.
*) there are some exceptions, but this covers 97% of cases.
Links:
Cardinality wikipedia: http://en.wikipedia.org/wiki/Cardinality_%28data_modeling%29
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

What does Using filesort mean in MySQL?
It does not mean you have a temporary file, it just mean a sort is done (bad name, ignore the 4 first letters).
from Baron Schwartz:
The truth is, filesort is badly named. Anytime a sort can’t be performed from an index, it’s a filesort. It has nothing to do with files. Filesort should be called “sort.” It is quicksort at heart.

Related

MySQL Date Range Query Optimization

I have a MySQL table structured like this:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
`message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I need to query this table for various statistics using date ranges for Grafana graphs, however all of those queries are extremely slow, despite the table being indexed using a composite key of id and date.
"id" is auto-incrementing and date is also always increasing.
The queries generated by Grafana look like this:
SELECT
UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120
This query takes over 30 seconds to complete with 27 million records in the table.
Explaining the query results in this output:
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | messages | NULL | ALL | PRIMARY | NULL | NULL | NULL | 26952821 | 11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
This indicates that MySQL is indeed using the composite primary key I created for indexing the data, but still has to scan almost the entire table, which I do not understand. How can I optimize this table for date range queries?
Plan A:
PRIMARY KEY(date, id), -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy
Assiming the table is quite big, having date at the beginning of the PK puts the rows in the given date range all next to each other. This minimizes (somewhat) the I/O.
Plan B:
PRIMARY KEY(id),
INDEX(date, serverid)
Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.
But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
Plan C: There may be a still better way:
PRIMARY KEY(id),
INDEX(server_id, date)
In theory, it can hop through that secondary index checking each server_id. But I am not sure that such an optimization exists.
Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.
The index on (id, date) doesn't help because the first key is id not date.
You can either
(a) drop the current index and index (date, id) instead -- when date is in the first place this can be used to filter for date regardless of the following columns -- or
(b) just create an additional index only on (date) to support the query.

Why would an indexed column return results slowly when querying for `IS NULL`?

I have a table with 25 million rows, indexed appropriately.
But adding the clause AND status IS NULL turns a super fast query into a crazy slow query.
Please help me speed it up.
Query:
SELECT
student_id,
grade,
status
FROM
grades
WHERE
class_id = 1
AND status IS NULL -- This line delays results from <200ms to 40-70s!
AND grade BETWEEN 0 AND 0.7
LIMIT 25;
Table:
CREATE TABLE IF NOT EXISTS `grades` (
`student_id` BIGINT(20) NOT NULL,
`class_id` INT(11) NOT NULL,
`grade` FLOAT(10,6) DEFAULT NULL,
`status` INT(11) DEFAULT NULL,
UNIQUE KEY `unique_key` (`student_id`,`class_id`),
KEY `class_id` (`class_id`),
KEY `status` (`status`),
KEY `grade` (`grade`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Local development shows results instantly (<200ms). Production server is huge slowdown (40-70 seconds!).
Can you point me in the right direction to debug?
Explain:
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| 1 | SIMPLE | grades | index_merge | class_id,status,grade | status,class_id | 5,4 | NULL | 26811 | Using intersect(status,class_id); Using where |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
A SELECT statement can only use one index per table.
Presumably the query before just did a scan using the sole index class_id for your condition class_id=1. Which will probably filter your result set nicely before checking the other conditions.
The optimiser is 'incorrectly' choosing an index merge on class_id and status for the second query and checking 26811 rows which is probably not optimal. You could hint at the class_id index by adding USING INDEX (class_id) to the end of the FROM clause.
You may get some joy with a composite index on (class_id,status,grade) which may run the query faster as it can match the first two and then range scan the grade. I'm not sure how this works with null though.
I'm guessing the ORDER BY pushed the optimiser to choose the class_id index again and returned your query to it's original speed.

MySQL uses filesort on indexed TIMESTAMP column

I've got a table that refuses to use index, and it always uses filesort.
The table is:
CREATE TABLE `article` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Category_ID` int(11) DEFAULT NULL,
`Subcategory` int(11) DEFAULT NULL,
`CTimestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Publish` tinyint(4) DEFAULT NULL,
`Administrator_ID` int(11) DEFAULT NULL,
`Position` tinyint(4) DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `Subcategory` (`Subcategory`,`Position`,`CTimestamp`,`Publish`),
KEY `Category_ID` (`Category_ID`,`CTimestamp`,`Publish`),
KEY `Position` (`Position`,`Category_ID`,`Publish`),
KEY `CTimestamp` (`CTimestamp`),
CONSTRAINT `article_ibfk_1` FOREIGN KEY (`Category_ID`) REFERENCES `category` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=94290 DEFAULT CHARSET=utf8
The query is:
SELECT * FROM article ORDER BY `CTimestamp`;
The explain is:
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | article | ALL | NULL | NULL | NULL | NULL | 63568 | Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
When I remove the "ORDER BY" then all are working properly. All other indices (Subcategory, Position, etc) are working fine in other queries. Unfortunately, the timestamp refuses to be used, even with my simple select query. I'm sure I'm missing something important here.
How can I make MySQL use the timestamp index?
Thank you.
In this case, MySQL is not using your index for sorting, and it is a GOOD thing.
Why? Your table contains just 64k rows, average row width is about 26 bytes (if I added column sizes right), so total table size on disk should be around 2MB.
It is very cheap to read just 2MB of data from disk into memory (probably in just 1-2 disk operations or seeks) and then simply perform filesort in memory (probably variation of quicksort).
If MySQL did retrieval by index order as you wish, it would have to perform 64000 disk seek operations, one record after another! It would have been very, very slow.
Indexes can be good when you can use them to quickly jump to known location in huge file and read just small amount of data, like in WHERE clause. But, in this case, it is not good idea - and MySQL is not stupid!
If your table was very big (more than RAM size), then MySQL would certainly start using your index - and this is also good thing.
Well, you can always hint the index. Change your query to
SELECT * FROM article use index (CTimestamp);
This forces MySQL to use the index for the query. The EXPLAIN:
1, 'SIMPLE', 'article', 'ALL', '', '', '', '', 1, 100.00, ''
No filesort to see, and as the used index is CTimestamp, the result should be ordered accordingly.
Alternatively, you can keep your order by clause, but force the index usage:
SELECT * FROM article force index (CTimestamp) order by CTimestamp;
The problem is still strange, though. Have you considered posting it to the official MySQL help forums?
Edit: You seem to be in good company.
Edit: Forcing the index seems to work out well.

MySQL 5.1 using filesort event when an index is present

Probably I'm missing some silly thing... Apparently MySQL 5.1 keeps doing a Filesort even when there is an index that matches exactly the column in the ORDER BY clause. To post it here, I've oversimplified the data model, but the issue is still happening:
Table definition:
CREATE TABLE `event` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`owner_id` int(11) DEFAULT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `owner_id` (`owner_id`),
KEY `date_created` (`date_created`),
CONSTRAINT `event_ibfk_1` FOREIGN KEY (`owner_id`) REFERENCES `user_profile` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
My problem is that event a simple SELECT is showing "Using filesort":
explain select * from event order by date_created desc;
And the result for the query explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE event ALL NULL NULL NULL NULL 6 Using filesort
Is there any way for this type of queries to use the index insteas of doing a filesort?
Thanks in advance to everybody.
Since your CREATE TABLE statement indicates that you have less than 10 rows (AUTO_INCREMENT=7) and using FORCE INDEX on my installation will make MySQL use the index, I'm guessing the optimizer thinks a table scan is faster (less random I/O) than an index scan (since you're selecting all columns, not just date_created). This is confirmed by the following:
mysql> explain select date_created from event order by date_created;
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| 1 | SIMPLE | event | index | NULL | date_created | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
1 row in set (0.00 sec)
In the above case, the index scan is faster because only the indexed column needs to be returned.
The MySQL documentation has some cases where using an index is considered slower: http://dev.mysql.com/doc/refman/5.1/en/how-to-avoid-table-scan.html
Q: Is there any way for this type of queries to use the index instead of doing a filesort?
A: To have MySQL use the index if at all possible, try:
EXPLAIN SELECT * FROM event FORCE INDEX (date_created) ORDER BY date_created DESC;
By using the FORCE INDEX (index_name), this tells MySQL to make use of the index if it's at all possible. Absent that directive, MySQL will choose the most efficient way to return the result set. A filesort may be more efficient than using the index.

How to optimize a query that's using group by on a large number of rows

The table looks like this:
CREATE TABLE `tweet_tweet` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text` varchar(256) NOT NULL,
`created_at` datetime NOT NULL,
`created_date` date NOT NULL,
...
`positive_sentiment` decimal(5,2) DEFAULT NULL,
`negative_sentiment` decimal(5,2) DEFAULT NULL,
`entity_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tweet_tweet_entity_created` (`entity_id`,`created_at`)
) ENGINE=MyISAM AUTO_INCREMENT=1097134 DEFAULT CHARSET=utf8
The explain on the query looks like this:
mysql> explain SELECT `tweet_tweet`.`entity_id`,
STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`,
AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`,
COUNT(`tweet_tweet`.`id`) AS `tweet_count`
FROM `tweet_tweet`
WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'
GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | tweet_tweet | ALL | NULL | NULL | NULL | NULL | 1097452 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
1 row in set (0.00 sec)
About 300k rows are added to the table every day. The query runs about 4 seconds right now but I want to get it down to around 1 second and I'm afraid the query will take exponentially longer as the days go on. Total number of rows in tweet_tweet is currently only a little over 1M, but it will be growing fast.
Any thoughts on optimizing this? Do I need any more indexes? Should I be using something like Cassandra instead of MySQL? =)
You may try to reorder fields in the index (i.e. KEY tweet_tweet_entity_created (created_at, entity_id). That will allow mysql to use the index to reduce the quantity of actual rows that need to be grouped and ordered).
You're not using the index tweet_tweet_entity_created. Change your query to:
explain SELECT `tweet_tweet`.`entity_id`,
STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`,
AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`,
COUNT(`tweet_tweet`.`id`) AS `tweet_count`
FROM `tweet_tweet` FORCE INDEX (tweet_tweet_entity_created)
WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'
GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;
You can read more about index hints in the MySQL manual http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Sometimes MySQL's query optimizer needs a little help.
MySQL has a dirty little secret. When you create an index over multiple columns, only the first one is really "used". I've made tables that used Unique Keys and Foreign Keys, and I often had to set a separate index for one or more of the columns.
I suggest adding an extra index to just created_at at a minimum. I do not know if adding indexes to the aggregate columns will also speed things up.
if your mysql version 5.1 or higher ,you can consider partitioning option for large tables.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html