Optimize indexes to avoid using filesorting - mysql

Please help me choose indexes for these tables to avoid the filesort that occurs when running a particular query.
So, there are two tables demo_user and demo_question:
CREATE TABLE `demo_user` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL,
`age` INT(11) NOT NULL,
PRIMARY KEY (`id`),
INDEX `age` (`age`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
CREATE TABLE `demo_question` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`userId` INT(11) NOT NULL,
`createdAt` DATETIME NOT NULL,
`question` VARCHAR(50) NOT NULL,
PRIMARY KEY (`id`),
INDEX `userId` (`userId`),
INDEX `createdAt` (`createdAt`),
CONSTRAINT `FK_demo_question_demo_user` FOREIGN KEY (`userId`) REFERENCES `demo_user` (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
Some sample data:
INSERT INTO `demo_user` (`name`, `age`) VALUES ('u1', 20);
INSERT INTO `demo_user` (`name`, `age`) VALUES ('u2', 25);
INSERT INTO `demo_user` (`name`, `age`) VALUES ('u3', 27);
INSERT INTO `demo_user` (`name`, `age`) VALUES ('u4', 33);
INSERT INTO `demo_user` (`name`, `age`) VALUES ('u5', 19);
INSERT INTO `demo_question` (`userId`, `createdAt`, `question`) VALUES (2, '2014-01-19 15:17:13', 'q1');
INSERT INTO `demo_question` (`userId`, `createdAt`, `question`) VALUES (3, '2014-01-19 15:17:43', 'q2');
INSERT INTO `demo_question` (`userId`, `createdAt`, `question`) VALUES (5, '2014-01-19 15:17:57', 'q3');
On these tables I am trying to run the following query:
select *
from demo_question q
left join demo_user u on q.userId = u.id
where u.age >= 20 and u.age <= 30
order by q.createdAt desc
EXPLAIN for this query shows a filesort when sorting the results by the q.createdAt column:
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | SIMPLE | q | ALL | userId | NULL | NULL | NULL | 3 | Using temporary; Using filesort |
| 1 | SIMPLE | u | ALL | PRIMARY,age | NULL | NULL | NULL | 5 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
So my question: what can be done to prevent the filesort when running such a query? It slows down performance when there is a larger number of rows in both tables.

You already have all the indexes that could possibly be used by this query. There are two problems. First, this is definitely NOT a left join; it is an inner join. The WHERE clause tests u.age, which is NULL for any question without a matching user, so those rows are discarded anyway. You need to understand why that is true, and the query should be written as an inner join, even though the optimizer probably realizes what you intend (in spite of it being expressed differently), which would explain why changing the query doesn't change the query plan.
The second problem is that you cannot expect the plan the optimizer chooses for a tiny data set to be the same as the one it would use on a larger data set.
The optimizer makes decisions based on "cost," and the cost of using an index on a tiny set of data is assumed to be relatively high, so it will forgo that option now, but probably not later. The plan you're getting here will change as the data set changes.
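To see why the LEFT JOIN in the question behaves as an inner join, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption made to keep the example self-contained; SQLite's planner differs from MySQL's, so this only illustrates the result semantics, not the query plan. The "orphan" question with userId 99 is hypothetical (the foreign key in the original schema would reject it) and is there only to produce an unmatched row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL);
CREATE TABLE demo_question (id INTEGER PRIMARY KEY, userId INTEGER NOT NULL,
                            createdAt TEXT NOT NULL, question TEXT NOT NULL);
INSERT INTO demo_user (name, age) VALUES
  ('u1', 20), ('u2', 25), ('u3', 27), ('u4', 33), ('u5', 19);
INSERT INTO demo_question (userId, createdAt, question) VALUES
  (2, '2014-01-19 15:17:13', 'q1'),
  (3, '2014-01-19 15:17:43', 'q2'),
  (5, '2014-01-19 15:17:57', 'q3'),
  (99, '2014-01-19 15:18:30', 'orphan');  -- hypothetical: no matching user
""")

# Same query as in the question, with only the join type swapped in.
QUERY = """
SELECT q.question, u.name
FROM demo_question q
{join} demo_user u ON q.userId = u.id
WHERE u.age >= 20 AND u.age <= 30
ORDER BY q.createdAt DESC
"""

left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()

# Without the WHERE clause the LEFT JOIN keeps the unmatched row (u.name is NULL).
unfiltered = conn.execute("""
SELECT q.question, u.name
FROM demo_question q
LEFT JOIN demo_user u ON q.userId = u.id
ORDER BY q.createdAt DESC
""").fetchall()

print(left_rows)
print(inner_rows)
print(unfiltered)
```

Both filtered queries return the same rows, because the range test on u.age can never be true for a NULL-extended row; only the unfiltered LEFT JOIN keeps the orphan.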

Related

Slow join with order query

I have a problem with the speed of a query. The question is similar to this one, but I can't find a solution. EXPLAIN says that MySQL is using: Using where; Using index; Using temporary; Using filesort
Slow query:
select
distinct(`books`.`id`)
from `books`
join `books_genres` on `books_genres`.`book_id` = `books`.`id`
where
`books`.`is_status` = 'active' and `books`.`master_book` = 'true'
and `books_genres`.`genre_id` in(380,381,384,385,1359)
order by
`books`.`livelib_read_num` DESC, `books`.`id` DESC
limit 0,25
#25 rows (0.319 s)
But if I remove the ORDER BY clause from the query it is really fast:
select sql_no_cache
distinct(`books`.`id`)
from `books`
join `books_genres` on `books_genres`.`book_id` = `books`.`id`
where
`books`.`is_status` = 'active' and `books`.`master_book` = 'true'
and `books_genres`.`genre_id` in(380,381,384,385,1359)
limit 0,25
#25 rows (0.005 s)
Explain:
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | books_genres | range | book_id,categorie_id,book_id2,genre_id_book_id | genre_id_book_id | 10 | NULL | 194890 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | books | eq_ref | PRIMARY,is_status,master_book,is_status_master_book,is_status_master_book_indexed,is_status_donor_no_ru_master_book | PRIMARY | 4 | knigogid3.books_genres.book_id | 1 | Using where |
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
2 rows in set (0.00 sec)
My tables:
CREATE TABLE `books_genres` (
`book_id` int(11) DEFAULT NULL,
`genre_id` int(11) DEFAULT NULL,
`sort` tinyint(4) DEFAULT NULL,
UNIQUE KEY `book_id` (`book_id`,`genre_id`),
KEY `categorie_id` (`genre_id`),
KEY `sort` (`sort`),
KEY `book_id2` (`book_id`),
KEY `genre_id_book_id` (`genre_id`,`book_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`is_status` enum('active','parser','incorrect','extremist','delete','fulldeteled') NOT NULL DEFAULT 'active',
`livelib_book_id` int(11) DEFAULT NULL,
`master_book` enum('true','false') DEFAULT 'true',
PRIMARY KEY (`id`),
KEY `is_status` (`is_status`),
KEY `master_book` (`master_book`),
KEY `livelib_book_id` (`livelib_book_id`),
KEY `livelib_read_num` (`livelib_read_num`),
KEY `is_status_master_book` (`is_status`,`master_book`),
KEY `livelib_book_id_master_book` (`livelib_book_id`,`master_book`),
KEY `is_status_master_book_indexed` (`is_status`,`master_book`,`indexed`),
KEY `is_status_donor_no_ru_master_book` (`is_status`,`donor`,`no_ru`,`master_book`),
KEY `livelib_url_master_book_is_status` (`livelib_url`,`master_book`,`is_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Problems with books_genres.
It has no PRIMARY KEY.
All columns are nullable. Will you ever insert a row with any NULLs?
Recommend (after saying NOT NULL on all columns):
PRIMARY KEY(`book_id`,`genre_id`)
INDEX(genre_id, book_id, sort)
and remove all the rest.
I don't see livelib_read_num in the table???
In the other table, remove any index that is an exact prefix of some other index.
The following indexes might help with speed. (Again, drop any prefix indexes that they make redundant.) (These are "covering" indexes, which helps a little.)
books: INDEX(is_status, master_book, livelib_read_num, id)
books: INDEX(livelib_read_num, id, is_status, master_book)
The second index may cause the Optimizer to give preference to ORDER BY. (That is a risky optimization, since it might have to scan the entire index without finding 25 relevant rows.)
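The advice to drop indexes that are an exact prefix of another index can be checked mechanically. The following is a rough sketch in Python; the helper name redundant_prefix_indexes is hypothetical, and the index list is copied by hand from the books table in the question. It only compares column lists, so it ignores uniqueness constraints and prefix lengths, which would need a manual look before dropping anything.

```python
def redundant_prefix_indexes(indexes):
    """Map each index whose columns are a strict leading prefix of another
    index's columns to the name of one index that makes it redundant."""
    redundant = {}
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if (other != name
                    and len(cols) < len(other_cols)
                    and other_cols[:len(cols)] == cols):
                redundant[name] = other
                break
    return redundant


# Secondary indexes of `books`, as listed in the question.
books_indexes = {
    "is_status": ("is_status",),
    "master_book": ("master_book",),
    "livelib_book_id": ("livelib_book_id",),
    "livelib_read_num": ("livelib_read_num",),
    "is_status_master_book": ("is_status", "master_book"),
    "livelib_book_id_master_book": ("livelib_book_id", "master_book"),
    "is_status_master_book_indexed": ("is_status", "master_book", "indexed"),
    "is_status_donor_no_ru_master_book": ("is_status", "donor", "no_ru", "master_book"),
    "livelib_url_master_book_is_status": ("livelib_url", "master_book", "is_status"),
}

redundant = redundant_prefix_indexes(books_indexes)
for name, covered_by in sorted(redundant.items()):
    print(f"{name} is a prefix of {covered_by}")
```

On this list the sketch flags is_status, is_status_master_book and livelib_book_id as candidates for removal.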
SELECT sql_no_cache
`books`.`id`
FROM
`books`
use index(books_idx_is_stat_master_livelib_id)
WHERE
(
1 = 1
AND `books`.`is_status` = 'active'
AND `books`.`master_book` = 'true'
)
AND (
EXISTS (
SELECT
1
FROM
`books_genres`
WHERE
(
`books_genres`.`book_id` = `books`.`id`
)
AND (
`books_genres`.`genre_id` IN (
380, 381, 384, 385, 1359
)
)
)
)
ORDER BY
`books`.`livelib_read_num` DESC,
`books`.`id` DESC
LIMIT 0, 25;
25 rows in set (0.07 sec)

Need help optimizing sql JOIN query and indexes on large tables

I have a query with a JOIN on three tables that is taking a very long time to run. I created an index on one of my tables for the foreign key (user_shared_url_id) and two columns (event_result, enabled) in the WHERE clause, so it's an index of three columns total. There seems to be no difference from when I simply use an index on the foreign key (user_shared_url_id). The other two tables are using single-column indexes. My users table has about 20,000 rows, but the other two tables are quite large, with ~20 million rows. I can't get a query that takes less than a minute or so to finish. Can anyone think of any potential optimizations I can make to speed this up? Are there other indexes or improvements to my custom index that I can work with?
The tables:
CREATE TABLE `users` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`roles` varchar(500) DEFAULT NULL,
`first_name` varchar(200) DEFAULT NULL,
`last_name` varchar(100) DEFAULT NULL,
`org_id` int(11) unsigned NOT NULL,
`user_email` varchar(100) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`user_id`),
KEY `org_id` (`org_id`),
KEY `status` (`status`),
KEY `org_id_user_id` (`org_id`,`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=162524 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
CREATE TABLE `user_shared_urls` (
`user_id` int(11) unsigned NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_shared_url_id` int(11) NOT NULL AUTO_INCREMENT,
`target_url` text,
PRIMARY KEY (`user_shared_url_id`),
KEY `user_id` (`user_id`),
KEY `user_id_usu_id` (`user_id`,`user_shared_url_id`)
) ENGINE=InnoDB AUTO_INCREMENT=62449105 DEFAULT CHARSET=utf8;
CREATE TABLE `user_share_events` (
`user_share_event_id` int(11) NOT NULL AUTO_INCREMENT,
`event_result` tinyint(1) unsigned DEFAULT NULL,
`user_shared_url_id` int(11) NOT NULL,
`enabled` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`user_share_event_id`),
KEY `user_shared_url_id` (`user_shared_url_id`),
KEY `usuid_enabled_result` (`user_shared_url_id`,`enabled`,`event_result`)
) ENGINE=InnoDB AUTO_INCREMENT=35067339 DEFAULT CHARSET=utf8;
My indexes:
CREATE INDEX org_id_user_id ON users(org_id, user_id);
CREATE INDEX user_id_usu_id ON user_shared_urls(user_id, user_shared_url_id);
CREATE INDEX usuid_enabled_result ON user_share_events(user_shared_url_id,enabled,event_result);
My query:
SELECT
users.user_id,
users.user_email "user_email",
users.roles "role",
CONCAT(users.first_name, ' ', users.last_name) "name",
usus.target_url
FROM
users
JOIN user_shared_urls usus ON usus.user_id = users.user_id
JOIN user_share_events uses ON usus.user_shared_url_id = uses.user_shared_url_id
WHERE
users.org_id = 1523
AND
uses.enabled = '1'
AND
uses.event_result = 1
Explain output of the above query:
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| 1 | SIMPLE | users | ref | PRIMARY,org_id,org_id_user_id | org_id | 4 | const | 1235 | NULL |
| 1 | SIMPLE | usus | ref | PRIMARY,user_id,user_id_usu_id | user_id_usu_id | 4 | luster.users.user_id | 213 | NULL |
| 1 | SIMPLE | uses | ref | user_shared_url_id,user_and_service,result_service_occurred,usuid_enabled_result | user_shared_url_id | 4 | luster.usus.user_shared_url_id | 1 | Using where |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
3 rows in set (0.00 sec)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
Change that index you added to
INDEX(user_shared_url_id, -- = and used for the JOIN
enabled, -- =
event_result) -- Last (not an = test)
The order of columns in an INDEX is important. Start with the columns that are tested for = (or IS NULL).
Then remove the FORCE INDEX and run the EXPLAIN again.
Are these tables in a 1:many relationship? Tell us which way.
Another comment: If event_result really has only two values (true/false) and you are using NULL for false, then change the query from
uses.event_result IS NOT NULL
to
uses.event_result = 1
The point is that the Optimizer likes to optimize =, but sees NOT NULL as being any of 256 possible values; very far from =. With this query change, your index should work, and may even be picked without using FORCE INDEX.
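The rewrite from IS NOT NULL to = 1 is only safe if event_result never holds anything other than NULL or 1. As a small sanity check of that equivalence, here is a sketch using Python's sqlite3 module in place of MySQL (an assumption for self-containment; it shows that the two predicates select the same rows, and says nothing about which index MySQL would pick). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_share_events (
  user_share_event_id INTEGER PRIMARY KEY,
  event_result INTEGER,              -- assumed to be either NULL or 1
  user_shared_url_id INTEGER NOT NULL,
  enabled INTEGER NOT NULL DEFAULT 1
);
INSERT INTO user_share_events (event_result, user_shared_url_id, enabled) VALUES
  (1, 10, 1), (NULL, 10, 1), (1, 11, 0), (NULL, 12, 1), (1, 12, 1);
""")

not_null = conn.execute("""
SELECT user_share_event_id FROM user_share_events
WHERE enabled = 1 AND event_result IS NOT NULL
ORDER BY user_share_event_id
""").fetchall()

equals_one = conn.execute("""
SELECT user_share_event_id FROM user_share_events
WHERE enabled = 1 AND event_result = 1
ORDER BY user_share_event_id
""").fetchall()

print(not_null)
print(equals_one)
```

If the column could also contain 0 or other values, the two predicates would diverge and the rewrite would change the result.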
For this query:
SELECT u.user_id, u.user_email, u.roles "role",
CONCAT(u.first_name, ' ', u.last_name) "name",
usu.target_url
FROM user_shared_urls usu JOIN
users u
ON usu.user_id = u.user_id JOIN
user_share_events usev
ON usu.user_shared_url_id = usev.user_shared_url_id
WHERE u.org_id = 1010 AND
usev.event_result IS NOT NULL AND
usev.enabled = 1;
Probably the best indexes are:
users(org_id, user_id)
user_shared_urls(user_id, user_shared_url_id)
user_share_events(user_shared_url_id, enabled, event_result)
This assumes that the filtering on org_id is more selective than the other filters.

MySql group by optimization - avoid tmp table and/or filesort

I have a slow query: without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY it takes around 10-15 seconds.
The query joins two tables, events (near 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I already tried:
Create an index by 'el.some_id , el.group_alias'
Decrease the varchar size to 20
Increase the size of sort_buffer_size and read_rnd_buffer_size;
Any suggestions for performance tuning would be much appreciated!
In your case the events table has an index on time_stamp. So before joining the two tables, first select the required records from the events table for the specific date range, with only the columns you need. Then join events_locations using the relationship between the tables.
Use MySQL's EXPLAIN keyword to check how the query accesses your tables. It will tell you roughly how many rows are scanned in order to select the required records.
The number of rows scanned also contributes to the query execution time. The query below is an attempt to reduce the number of rows scanned.
SELECT
`e`.`event_id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`asset_name` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so I asked myself why a GROUP BY was required, and I found some duplicated rows: 200 in 50000. So somehow my system is inserting duplicates, and someone added that GROUP BY (years ago) instead of hunting down the bug.
So, I will mark this as solved, more or less...
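Duplicates of the kind described above can be located with a GROUP BY ... HAVING COUNT(*) > 1 query on the column that is supposed to be unique. Below is a minimal sketch, again using Python's sqlite3 module as a stand-in for MySQL and a trimmed-down events_locations table with invented rows; the assumption is that event_id alone should identify a row in a 1:1 relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events_locations (
  event_id INTEGER NOT NULL,
  latitude REAL NOT NULL,
  longitude REAL NOT NULL
);
INSERT INTO events_locations (event_id, latitude, longitude) VALUES
  (1, 10.0, 20.0),
  (2, 11.0, 21.0),
  (2, 11.0, 21.0),   -- duplicate of the previous row
  (3, 12.0, 22.0);
""")

# List every event_id that appears more than once, with its number of copies.
dupes = conn.execute("""
SELECT event_id, COUNT(*) AS copies
FROM events_locations
GROUP BY event_id
HAVING COUNT(*) > 1
ORDER BY event_id
""").fetchall()

print(dupes)
```

Once the duplicates are cleaned up, a UNIQUE key on event_id would stop new ones from being inserted and would remove the need for the GROUP BY.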

How to optimize join when search in categories

I have a table with items:
CREATE TABLE `ost_content` (
`uid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`type` enum('media','serial','season','series') NOT NULL,
`alias` varchar(200) NOT NULL,
`views` mediumint(7) NOT NULL DEFAULT '0',
`ratings_count` enum('0','1','2','4','5') NOT NULL DEFAULT '0',
`ratings_sum` mediumint(5) NOT NULL DEFAULT '0',
`upload_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`conversion_status` enum('converting','error','success','announcement') NOT NULL DEFAULT 'converting',
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`uid`,`type`),
KEY `idx_type` (`type`),
KEY `idx_upload_date DESC` (`upload_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And a table that connects items with categories:
CREATE TABLE `ost_categories2media` (
`categories2media_id` mediumint(6) unsigned NOT NULL AUTO_INCREMENT,
`categories2media_category_id` smallint(5) unsigned NOT NULL,
`categories2media_uid` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`categories2media_id`),
KEY `categories2media_media_id` (`categories2media_uid`),
KEY `categories2media_category_id` (`categories2media_category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=501114 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Then I execute this query:
SELECT
c1.uid,
c1.alias,
c1.type,
c1.views,
c1.upload_date,
c1.ratings_sum,
c1.ratings_count,
c1.conversion_status
FROM
ost_content c1
LEFT JOIN ost_categories2media c2m ON c2m.categories2media_uid = c1.uid
WHERE
c2m.categories2media_category_id = '53'
AND c1.conversion_status IN ('success', 'announcement')
AND c1.type IN ('serial', 'media')
ORDER BY
c1.upload_date DESC
LIMIT 16, 16
It executes slowly; the lookup on categories2media_category_id examines many rows:
+----+-------------+-------+--------+--------------------------------------------------------+------------------------------+---------+---------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------+------------------------------+---------+---------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | c2m | ref | categories2media_media_id,categories2media_category_id | categories2media_category_id | 2 | const | 32076 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c1 | eq_ref | PRIMARY,idx_uid_type,idx_type | PRIMARY | 3 | uakino.c2m.categories2media_uid | 1 | Using where |
+----+-------------+-------+--------+--------------------------------------------------------+------------------------------+---------+---------------------------------+-------+----------------------------------------------+
How can I optimize or rewrite this query?
MySQL indexes are like cooks: too many of them aren't very useful, because MySQL generally uses only one index per table in a query. Let's look at ost_categories2media.
That table has three separate single-column indexes (counting the primary key). You are better off with two indexes like this:
PRIMARY KEY (`categories2media_id`),
KEY `categories2media_media_id` (`categories2media_uid`,`categories2media_category_id`)
Now MySQL no longer has to decide between an index on categories2media_uid and one on categories2media_category_id; it has a single index that covers both!
Looking at your ost_content table we see
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`uid`,`type`),
KEY `idx_type` (`type`),
KEY `idx_upload_date DESC` (`upload_date`)
Some of these indexes are a bit redundant. Any query that filters on the uid field can use the PK, while any query that filters on type can use idx_type; that means idx_uid_type is there just to enforce uniqueness. But we can make it more useful like this:
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`type`,`uid`),
KEY `idx_upload_date DESC` (`upload_date`)
We've got rid of one index! That ought to make writes to the table cheaper, since there is one less index to maintain. You still have an index on upload_date that isn't used in this particular query. So how about a composite index for that?
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`type`,`uid`),
KEY `idx_upload_date DESC` (`uid`,`upload_date`)
First, the LEFT JOIN is not necessary. So, you can write the query as:
SELECT c.*
FROM ost_content c JOIN
ost_categories2media c2m
ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = '53' AND
c.conversion_status IN ('success', 'announcement') AND
c.type IN ('serial', 'media')
ORDER BY c.upload_date DESC
LIMIT 16, 16;
Unfortunately, your conditions on the content table are not simple = conditions. If they were, an index on ost_content(conversion_status, type, uid) would be recommended. This might still be the better option.
Another option is to go the other way: An index on ost_categories2media(categories2media_category_id, categories2media_uid).
You might find that the first composite index and this query work best:
SELECT c.*
FROM ((SELECT c.*
FROM ost_content c JOIN
ost_categories2media c2m
ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = '53' AND
c.conversion_status = 'success' AND
c.type IN ('serial', 'media')
) UNION ALL
(SELECT c.*
FROM ost_content c JOIN
ost_categories2media c2m
ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = '53' AND
c.conversion_status = 'announcement' AND
c.type IN ('serial', 'media')
)
) c
ORDER BY c.upload_date DESC
LIMIT 16, 16;
This looks more complicated, but each subquery can take advantage of the index, so it might have improved performance.
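The UNION ALL rewrite relies on the two conversion_status values being mutually exclusive, so no row can appear in both branches. A quick way to convince yourself that it returns the same rows as the IN version is the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL, a reduced set of columns, and invented sample rows. SQLite does not accept parentheses around the branches of a compound SELECT, so they are omitted here; LIMIT 3 replaces the original LIMIT 16, 16 to fit the tiny data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ost_content (
  uid INTEGER PRIMARY KEY, type TEXT NOT NULL,
  conversion_status TEXT NOT NULL, upload_date TEXT NOT NULL
);
CREATE TABLE ost_categories2media (
  categories2media_id INTEGER PRIMARY KEY,
  categories2media_category_id INTEGER NOT NULL,
  categories2media_uid INTEGER NOT NULL
);
INSERT INTO ost_content VALUES
  (1, 'media',  'success',      '2020-01-01'),
  (2, 'serial', 'announcement', '2020-01-02'),
  (3, 'season', 'success',      '2020-01-03'),
  (4, 'media',  'error',        '2020-01-04'),
  (5, 'serial', 'success',      '2020-01-05'),
  (6, 'media',  'announcement', '2020-01-06');
INSERT INTO ost_categories2media (categories2media_category_id, categories2media_uid) VALUES
  (53, 1), (53, 2), (53, 3), (53, 4), (53, 5), (7, 6);
""")

# One branch of the query, parameterised on the conversion_status test.
BRANCH = """
SELECT c.* FROM ost_content c
JOIN ost_categories2media c2m ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = 53
  AND c.conversion_status {status_test}
  AND c.type IN ('serial', 'media')
"""

in_query = ("SELECT c.uid FROM (" + BRANCH.format(status_test="IN ('success', 'announcement')")
            + ") c ORDER BY c.upload_date DESC LIMIT 3")
union_query = ("SELECT c.uid FROM (" + BRANCH.format(status_test="= 'success'")
               + " UNION ALL " + BRANCH.format(status_test="= 'announcement'")
               + ") c ORDER BY c.upload_date DESC LIMIT 3")

in_rows = conn.execute(in_query).fetchall()
union_rows = conn.execute(union_query).fetchall()
print(in_rows)
print(union_rows)
```

Whether the UNION ALL form is actually faster on MySQL depends on the indexes and data, so it is worth comparing both with EXPLAIN on the real tables.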

Slow MySql query with order by limit with index

I have a query generated by Entity Framework, that looks like this:
SELECT
`Extent1`.`Id`,
`Extent1`.`Name`,
`Extent1`.`ExpireAfterUTC`,
`Extent1`.`FileId`,
`Extent1`.`FileHash`,
`Extent1`.`PasswordHash`,
`Extent1`.`Size`,
`Extent1`.`TimeStamp`,
`Extent1`.`TimeStampOffset`
FROM `files` AS `Extent1` INNER JOIN `containers` AS `Extent2` ON `Extent1`.`ContainerId` = `Extent2`.`Id`
ORDER BY
`Extent1`.`Id` ASC LIMIT 0,10
It runs painfully slow.
I have indexes on files.Id (PK), files.ContainerId(FK), containers.Id(PK) and I don't understand why mysql seems to be doing a full sort before returning the required records, even though there already is an index on the Id column.
Furthermore, this data is displayed in a grid which supports filters, sorting and pagination, so good use of the indexes is essential.
Here are the table definitions:
CREATE TABLE `files` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`FileId` varchar(100) NOT NULL,
`ContainerId` int(11) NOT NULL,
`ContainerGuid` binary(16) NOT NULL,
`Guid` binary(16) NOT NULL,
`Name` varchar(1000) NOT NULL,
`ExpireAfterUTC` datetime DEFAULT NULL,
`PasswordHash` binary(32) DEFAULT NULL,
`FileHash` tinyblob NOT NULL,
`Size` bigint(20) NOT NULL,
`TimeStamp` double NOT NULL,
`TimeStampOffset` double NOT NULL,
`FilePostId` int(11) NOT NULL,
`FilePostGuid` binary(16) NOT NULL,
`AttributeId` int(11) NOT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `FileId_UNIQUE` (`FileId`),
KEY `Files_ContainerId_FK` (`ContainerId`),
KEY `Files_AttributeId_FK` (`AttributeId`),
KEY `Files_FileId_index` (`FileId`),
KEY `Files_FilePostId_index` (`FilePostId`),
KEY `Files_Guid_index` (`Guid`),
CONSTRAINT `Files_AttributeId_FK` FOREIGN KEY (`AttributeId`) REFERENCES `attributes` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Files_ContainerId_FK` FOREIGN KEY (`ContainerId`) REFERENCES `containers` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Files_FilePostsId_FK` FOREIGN KEY (`FilePostId`) REFERENCES `fileposts` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=977942 DEFAULT CHARSET=utf8;
CREATE TABLE `containers` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(255) NOT NULL,
`Guid` binary(16) NOT NULL,
`AesKey` binary(32) NOT NULL,
`FileCount` int(10) unsigned NOT NULL DEFAULT '0',
`Size` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`Id`),
KEY `Containers_Guid_index` (`Guid`),
KEY `Containers_Name_index` (`Name`)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8;
You will notice there are some other relationships in the files table, which I have left out just to simplify the query without affecting the observed behavior.
Here is also an output from EXPLAIN EXTENDED:
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | Extent2 | index | PRIMARY | Containers_Guid_index | 16 | NULL | 9 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | Extent1 | ref | Files_ContainerId_FK | Files_ContainerId_FK | 4 | netachmentgeneraltest.Extent2.Id | 73850 | 100.00 | |
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
Files table has ~900000 records (and counting) and containers has 9.
This issue only occurs when ORDER BY is present.
Also, I can't do much in terms of modifying the query because it is generated by Entity Framework. I did as much as I could with the LINQ query in order to simplify it (at first it had some horrible sub queries which executed even slower).
Query hints (as in force index) are not a solution here either, because EF does not support such features.
I am mostly hoping to find some database level optimizations to do.
For those who didn't spot the tags, the database in question is MySql.
MySQL generally uses only one index per table in a query. Right now, it's preferring to use the foreign key index so the join is efficient, but that means that the sort is not using an index.
Try creating a compound index on (ContainerId, Id).
This is essentially your query:
SELECT e1.*
FROM `files` e1 INNER JOIN
`containers` e2
ON e1.`ContainerId` = e2.`Id`
ORDER BY e1.`Id` ASC
LIMIT 0, 10;
You can try an index on files(id, ContainerId). This might inspire MySQL to use the composite index, focused on the order by.
It would probably be more likely if the query were phrased as:
SELECT e1.*
FROM `files` e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.`ContainerId` = e2.`Id`)
ORDER BY e1.`Id` ASC
LIMIT 0, 10;
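The EXISTS form returns the same rows as the INNER JOIN here because containers.Id is a primary key, so each file can match at most one container and the join cannot multiply rows. The sketch below checks that on a tiny invented data set, using Python's sqlite3 module as a stand-in for MySQL; it demonstrates only that the results agree, not which plan MySQL will choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE containers (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE files (Id INTEGER PRIMARY KEY, ContainerId INTEGER NOT NULL, Name TEXT NOT NULL);
INSERT INTO containers (Id, Name) VALUES (1, 'a'), (2, 'b');
INSERT INTO files (Id, ContainerId, Name) VALUES
  (1, 2, 'f1'), (2, 1, 'f2'), (3, 2, 'f3'), (4, 1, 'f4');
""")

join_rows = conn.execute("""
SELECT e1.Id
FROM files e1 INNER JOIN containers e2 ON e1.ContainerId = e2.Id
ORDER BY e1.Id ASC
LIMIT 3
""").fetchall()

exists_rows = conn.execute("""
SELECT e1.Id
FROM files e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.ContainerId = e2.Id)
ORDER BY e1.Id ASC
LIMIT 3
""").fetchall()

print(join_rows)
print(exists_rows)
```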
There is one way that does work to use the indexes. However, it depends on something in MySQL that is not documented to work (although it does in practice). The following will read the data in order, but it incurs the overhead of materializing the subquery -- but not for a sort:
SELECT e1.*
FROM (SELECT e1.*
FROM files e1
ORDER BY e1.id ASC
) e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.`ContainerId` = e2.`Id`)
LIMIT 0, 10;