Why is subquery join much faster than direct join - mysql

I have 2 tables (pages and comments), around 130 000 rows each.
I want to list pages without any comments (foreign key is comments.page_id)
If I execute the normal left outer join, it takes more than 750 seconds to run (130k^2 ≈ 17B row combinations), whereas if I execute the same join but use subqueries for the tables, it takes just 1 second.
Server version: 5.6.44-log - MySQL Community Server (GPL):
Query 1. Normal join, 750+ seconds
SELECT p.id
FROM `pages` AS p
LEFT JOIN `comments` AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 2. Join with first table as subquery, Too much time
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN `comments` AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 3. Join with second table as subquery, 1.6 seconds
SELECT p.id
FROM `pages` AS p
LEFT JOIN (
SELECT * FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 4. Join with 2 subqueries, 1 second
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN (
SELECT * FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 5. Join with 2 subqueries, selecting only 1 column, 0.2 seconds
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN (
SELECT page_id FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 6. Too much time
SELECT p.id
FROM `pages` AS p
WHERE NOT EXISTS( SELECT page_id FROM `comments`
WHERE page_id = p.id );
Now, in MySQL version 5.7, all of the above queries take "too much time" to execute.
In MySQL 5.7, queries 1 and 4 have the same EXPLAIN output:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE p NULL index PRIMARY PRIMARY 4 NULL 147626 100.00 Using index; Using temporary; Using filesort
1 SIMPLE c NULL ALL NULL NULL NULL NULL 147790 10.00 Using where; Not exists; Using join buffer (Block Nested Loop)
In MySQL 5.6, unfortunately, I cannot get the EXPLAIN output for query 1 right now (it takes too much time), but for query 4 it is as follows:
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 147626 Using temporary; Using filesort
1 PRIMARY <derived3> ref <auto_key0> <auto_key0> 4 p.id 10 Using where; Not exists
3 DERIVED comments ALL NULL NULL NULL NULL 147790 NULL
2 DERIVED pages index NULL PRIMARY 4 NULL 147626 Using index
Tables:
CREATE TABLE `pages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`identifier` varchar(250) NOT NULL DEFAULT '',
`reference` varchar(250) NOT NULL DEFAULT '',
`url` varchar(1000) NOT NULL DEFAULT '',
`moderate` varchar(250) NOT NULL DEFAULT 'default',
`is_form_enabled` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date_modified` datetime NOT NULL,
`date_added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147627 DEFAULT CHARSET=utf8
CREATE TABLE `comments` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
`page_id` int(10) unsigned NOT NULL DEFAULT '0',
`website` varchar(250) NOT NULL DEFAULT '',
`town` varchar(250) NOT NULL DEFAULT '',
`state_id` int(10) NOT NULL DEFAULT '0',
`country_id` int(10) NOT NULL DEFAULT '0',
`rating` tinyint(1) unsigned NOT NULL DEFAULT '0',
`reply_to` int(10) unsigned NOT NULL DEFAULT '0',
`comment` text NOT NULL,
`reply` text NOT NULL,
`ip_address` varchar(250) NOT NULL DEFAULT '',
`is_approved` tinyint(1) unsigned NOT NULL DEFAULT '1',
`notes` text NOT NULL,
`is_admin` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_sent` tinyint(1) unsigned NOT NULL DEFAULT '0',
`sent_to` int(10) unsigned NOT NULL DEFAULT '0',
`likes` int(10) unsigned NOT NULL DEFAULT '0',
`dislikes` int(10) unsigned NOT NULL DEFAULT '0',
`reports` int(10) unsigned NOT NULL DEFAULT '0',
`is_sticky` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_verified` tinyint(1) unsigned NOT NULL DEFAULT '0',
`date_modified` datetime NOT NULL,
`date_added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147879 DEFAULT CHARSET=utf8
Questions
Why is this happening? What does MySQL do under the hood?
Does this happen only in MySQL, or in other SQL databases as well?
How can I write a fast query to get what I need? (In both 5.6 and 5.7)

The problem with your long-running queries is that you lack an index on the page_id column of the comments table. Hence, for each row from the pages table, MySQL has to check all rows of the comments table. Since you are using LEFT JOIN, this is the only possible join order. What happens in 5.6 is that when you use a subquery in the FROM clause (a derived table), MySQL creates an index on the temporary table that holds the result of the derived table (<auto_key0> in the EXPLAIN output). It is faster when you select only one column because the temporary table is smaller.
In MySQL 5.7, such derived tables are automatically merged into the main query, if possible. This is done to avoid the extra temporary tables. However, it means that you no longer have an index to use for the join. (See this blog post for details.)
You have two options to improve the query time in 5.7:
You can create an index on comments(page_id)
You can prevent the subquery from being merged by rewriting it as a query that cannot be merged. Subqueries with aggregation, LIMIT, or UNION will not be merged (see the blog post for details). One way to do this is to add a LIMIT clause to the subquery. In order not to remove any rows from the result, the limit must be larger than the number of rows in the table.
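As a rough sketch of the two options above (not tested against the asker's data): the index name idx_comments_page_id is made up for illustration, and the LIMIT value is simply the largest unsigned 64-bit integer, chosen so that no rows are cut off.

```sql
-- Option 1: index the join column so the anti-join can do an index lookup
-- per page instead of scanning all of comments.
-- (idx_comments_page_id is a hypothetical name.)
ALTER TABLE `comments` ADD INDEX `idx_comments_page_id` (`page_id`);

-- Option 2: keep the derived table from being merged in 5.7 by giving it
-- a LIMIT larger than any possible row count. MySQL then materializes it
-- and can build an automatic index on the temporary table, as in 5.6.
SELECT p.id
FROM `pages` AS p
LEFT JOIN (
    SELECT page_id
    FROM `comments`
    LIMIT 18446744073709551615
) AS c
    ON p.id = c.page_id
WHERE c.page_id IS NULL;
```

Option 1 is the more durable fix, since it does not depend on how a particular optimizer version treats derived tables.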
In MySQL 8.0, you can also use an optimizer hint to avoid the merging. In your case, that would be something like
SELECT /*+ NO_MERGE(c) */ ... FROM
See slides 34-37 of this presentation for examples of how to use such hints.

Query 1 has the "explode-implode" syndrome. First it does a JOIN; this explodes the number of rows. Then it does a GROUP BY to shrink back.
Also
The number of comments per page, etc, will have an effect on your query.
SELECT * fetches all the columns, when it needs only to know if the LEFT JOIN succeeded. (You observed that.) Furthermore, you keep none of the columns since you are looking for missing rows.
Query 2 should not be as fast as what you found -- it needs to build two temp tables (the "derived" tables), index one of them, then perform the outer query. (Possibly a new enough version of MySQL can short-circuit some of that effort; old versions were notorious for doing an inefficient job.)
Query 3:
Try
SELECT p.id
FROM `pages` AS p
WHERE NOT EXISTS( SELECT 1 FROM `comments`
WHERE page_id = p.id );
ALSO:
Use InnoDB, not MyISAM.
comments needs INDEX(page_id)
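The last two suggestions might be applied along these lines. This is only a sketch: the index name is hypothetical, and converting a table's engine rewrites the whole table, so it is worth trying on a copy first.

```sql
-- Convert both tables from MyISAM to InnoDB; the second statement also
-- adds the missing index on the join column in the same table rebuild.
-- (idx_comments_page_id is a made-up name.)
ALTER TABLE `pages` ENGINE=InnoDB;
ALTER TABLE `comments` ENGINE=InnoDB,
    ADD INDEX `idx_comments_page_id` (`page_id`);

-- Check the plan afterwards: the comments table should now be reached
-- through the new index rather than with type ALL.
EXPLAIN
SELECT p.id
FROM `pages` AS p
WHERE NOT EXISTS (SELECT 1 FROM `comments` AS c
                  WHERE c.page_id = p.id);
```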

Related

Adding a LIMIT 1 to a simple MySQL Query causes it to hang

I have a strange issue whereby adding LIMIT 1 to a query causes the query to never return. And yet, when I remove that LIMIT 1, the query works instantly. LIMIT 1 should typically speed things up, not slow them down, so I'm wondering what I'm missing here.
The two queries are:
Hangs Indefinitely
SELECT
`groups`.*
FROM
`groups`
INNER JOIN
`positions`
ON
`groups`.`uid` = `positions`.`group_uid`
WHERE
`positions`.`component_uid` = '1234'
AND
(`groups`.deleted_at IS NULL)
AND
(`positions`.deleted_at IS NULL)
LIMIT
1
Associated EXPLAIN Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE groups ALL index_groups_on_deleted_at NULL NULL NULL 6832 Using where
1 SIMPLE positions ref index_positions_on_deleted_at index_positions_on_deleted_at 6 const 22110 Using index condition; Using where
Works Just Fine
SELECT
`groups`.*
FROM
`groups`
INNER JOIN
`positions`
ON
`groups`.`uid` = `positions`.`group_uid`
WHERE
`positions`.`component_uid` = '1234'
AND
(`groups`.deleted_at IS NULL)
AND
(`positions`.deleted_at IS NULL)
/* LIMIT
1 */
Associated EXPLAIN Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE groups ALL index_groups_on_deleted_at NULL NULL NULL 6832 Using where
1 SIMPLE positions ALL index_positions_on_deleted_at NULL NULL NULL 44220 Using where; Using join buffer (Block Nested Loop)
CREATE TABLE Commands:
CREATE TABLE `groups` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uid` varchar(255) NOT NULL,
`deleted_at` datetime DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `index_groups_on_deleted_at` (`deleted_at`)
) ENGINE=InnoDB AUTO_INCREMENT=6941 DEFAULT CHARSET=utf8;
CREATE TABLE `positions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uid` varchar(255) NOT NULL,
`component_uid` varchar(255) NOT NULL,
`deleted_at` datetime DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`routing_group_uid` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_positions_on_deleted_at` (`deleted_at`)
) ENGINE=InnoDB AUTO_INCREMENT=44321 DEFAULT CHARSET=utf8;
But there are no errors produced, so I can't figure out what could be going wrong.
Any ideas on what the issue could be would be great.
Thanks.
Update
From the comments:
The MySQL documentation on IS NULL optimization mentions that it can handle only one IS NULL condition.
This still hangs:
SELECT
`groups`.*
FROM
`groups`
INNER JOIN
`positions`
ON
`groups`.`uid` = `positions`.`group_uid`
WHERE
`positions`.`component_uid` = '1234'
/* AND
(`groups`.deleted_at IS NULL) */
AND
(`positions`.deleted_at IS NULL)
LIMIT
1
But this works fine:
SELECT
`groups`.*
FROM
`groups`
INNER JOIN
`positions`
ON
`groups`.`uid` = `positions`.`group_uid`
WHERE
`positions`.`component_uid` = '1234'
AND
(`groups`.deleted_at IS NULL)
/* AND
(`positions`.deleted_at IS NULL) */
LIMIT
1

Strange index behavior mysql

I usually pride myself on being a database pro, but I can't really wrap my head around this behavior. I hope someone can explain how this works.
I have two MySQL tables, orders:
CREATE TABLE `orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL,
`total` decimal(7,2) NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`voucher_code` varchar(127) DEFAULT NULL,
`voucher_id` int(11) unsigned DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
`billing_address_id` int(11) unsigned NOT NULL,
`shipping_address_id` int(11) unsigned NOT NULL,
`reference_id` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reference_id` (`reference_id`),
KEY `address_id` (`billing_address_id`)
) ENGINE=InnoDB AUTO_INCREMENT=168067 DEFAULT CHARSET=latin1;
and addresses:
CREATE TABLE `addresses` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` tinyint(4) DEFAULT NULL,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
`street` varchar(255) NOT NULL,
`street2` varchar(255) DEFAULT NULL,
`company_name` varchar(255) DEFAULT NULL,
`city` varchar(45) NOT NULL,
`postcode` varchar(45) DEFAULT NULL,
`region` varchar(45) DEFAULT NULL,
`country` varchar(45) NOT NULL,
`phone` varchar(45) DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_addresses_users1_idx` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=95277 DEFAULT CHARSET=latin1;
Now as you can see I have created an index inside the orders table for the billing_address_id called address_id that should match with the address id.
This is the query I am trying to run:
SELECT
o.id, a.first_name, a.last_name, o.total, o.date_created
FROM
orders o USE INDEX FOR JOIN (PRIMARY) JOIN
addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
If I run the query without any index specification, it will pick up and use the address_id index, which I would expect to be the fastest way to match the two tables.
Strangely enough, with the 'address_id' index the query runs in 2 seconds.
If I use the normal 'PRIMARY' index, which works on the order id, it takes 0.000 seconds.
This is bugging me out. I thought I was supposed to create indexes to expedite the joining process between tables.
If I run EXPLAIN on the two queries I get:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE a ALL PRIMARY 95234 100.00 Using temporary; Using filesort
1 SIMPLE o ref address_id address_id 4 my_basket.a.id 1 100.00
With the index:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o USE INDEX FOR
JOIN (PRIMARY)
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE o index PRIMARY 4 50 332632.00
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 my_basket.o.billing_address_id 1 100.00
Thank you for finding the time to answer this question.
For ORDER BY ... LIMIT queries it will often be beneficial to use a query execution plan that avoids sorting. This is not necessarily because the sorting is expensive, but because it makes it possible to stop the query execution once the requested number of rows (here 50) has been found.
In your case, if one starts with table a, the full join result will have to be generated before selecting the "top" 50 rows. If you start with scanning table o using the PRIMARY index, the join result will be sorted on o.id, and the join execution can stop once 50 rows have been found.
The cost model used to select between the two approaches has been improved since MySQL 5.6. I suggest you try out MySQL 5.7 to see whether the optimizer is now able to select the better plan.
I'm surprised that the two queries even compile -- ORDER BY id is ambiguous since each table has a different id.
When doing a JOIN, always qualify all columns.
Meanwhile, remove the USE INDEX.
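Putting the advice above together, a version with every column qualified and no index hint might look like the sketch below. The STRAIGHT_JOIN variant is an addition not taken from the answers: it is a standard MySQL modifier that makes the optimizer join tables in the order they are listed, and is offered only as a fallback if the optimizer still starts from addresses.

```sql
-- Fully qualified columns; ORDER BY names the orders primary key, so a
-- plan that reads orders in primary-key order can stop after 50 rows.
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders AS o
JOIN addresses AS a ON a.id = o.billing_address_id
ORDER BY o.id DESC
LIMIT 0, 50;

-- Fallback (an assumption, not from the answers): force orders to be
-- read first if EXPLAIN still shows addresses as the driving table.
SELECT STRAIGHT_JOIN
       o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders AS o
JOIN addresses AS a ON a.id = o.billing_address_id
ORDER BY o.id DESC
LIMIT 0, 50;
```

Running EXPLAIN on each variant shows whether "Using temporary; Using filesort" has disappeared from the plan.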

Slow query with joined derived tables

I have a few queries on a "custom dashboard" of my application, and one of them is taking 10-12 seconds to execute. Using EXPLAIN I can see why it's slow, but I don't know what to do about it. Here is the query:
SELECT person.PersonID,FullName,Furigana,qualdate FROM person
INNER JOIN (
SELECT pq.PersonID,MAX(ContactDate) AS qualdate FROM person pq
INNER JOIN contact cq ON pq.PersonID=cq.PersonID
WHERE cq.ContactTypeID IN (22,26,45) GROUP BY pq.PersonID
) qual ON person.PersonID=qual.PersonID
LEFT OUTER JOIN (
SELECT pe.personID,MAX(ContactDate) AS elimdate FROM person pe
INNER JOIN contact ce ON pe.PersonID=ce.PersonID WHERE ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12)
GROUP BY pe.PersonID
) elim ON qual.PersonID=elim.PersonID
LEFT OUTER JOIN (
SELECT po.personID FROM person po
INNER JOIN percat pc ON po.PersonID=pc.PersonID WHERE pc.CategoryID=38
) overseas ON qual.PersonID=overseas.PersonID
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND overseas.PersonID IS NULL
ORDER BY qualdate
And here is the EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5447 Using where; Using temporary; Using filesort
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 5565 Using where
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 9 Using where; Not exists
1 PRIMARY person eq_ref PRIMARY PRIMARY 4 qual.PersonID 1
4 DERIVED pc ref PRIMARY,CategoryID CategoryID 4 8
4 DERIVED po eq_ref PRIMARY PRIMARY 4 kizuna_misa.pc.PersonID 1 Using index
3 DERIVED pe index PRIMARY PRIMARY 4 NULL 5964 Using index
3 DERIVED ce ref PersonID,ContactTypeID PersonID 4 kizuna_misa.pe.PersonID 1 Using where
2 DERIVED pq index PRIMARY PRIMARY 4 NULL 5964 Using index
2 DERIVED cq ref PersonID,ContactTypeID PersonID 4 kizuna_misa.pq.PersonID 1 Using where
I'm sure the first line of the EXPLAIN reveals the problem (comparing with similar queries, it appears that the second line isn't too slow), but I don't know how to fix it. I already have indexes on every column that appears in the joins, but since the tables are <derived2> etc., I guess indexes are irrelevant.
The objective (since it's probably not obvious to someone unfamiliar with my application and schema) is a followup tickler list - if one of the #22/26/45 contacts has occurred but nothing has been done in response (either one of several other contacts or designating by a category assignment that the person is overseas), then the person should appear in the list for followup after waiting a week. Subqueries are easier for me to write and understand than these messy joins, but I can't check the sequence of dates (and subqueries are often slow, also).
EDIT (in response to Rick James):
MySQL version is 5.0.95 (yeah, I know...). And here is SHOW CREATE TABLE for the three tables involved, even though most of the fields in person are irrelevant:
CREATE TABLE `contact` (
`ContactID` int(11) unsigned NOT NULL auto_increment,
`PersonID` int(11) unsigned NOT NULL default '0',
`ContactTypeID` int(11) unsigned NOT NULL default '0',
`ContactDate` date NOT NULL default '0000-00-00',
`Description` text,
PRIMARY KEY (`ContactID`),
KEY `ContactDate` (`ContactDate`),
KEY `PersonID` (`PersonID`),
KEY `ContactTypeID` (`ContactTypeID`)
) ENGINE=MyISAM AUTO_INCREMENT=16901 DEFAULT CHARSET=utf8
CREATE TABLE `percat` (
`PersonID` int(11) unsigned NOT NULL default '0',
`CategoryID` int(11) unsigned NOT NULL default '0',
PRIMARY KEY (`PersonID`,`CategoryID`),
KEY `CategoryID` (`CategoryID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `person` (
`PersonID` int(11) unsigned NOT NULL auto_increment,
`FullName` varchar(100) NOT NULL default '',
`Furigana` varchar(100) NOT NULL default '',
`Sex` enum('','M','F') character set ascii NOT NULL default '',
`HouseholdID` int(11) unsigned NOT NULL default '0',
`Relation` varchar(6) character set ascii NOT NULL default '',
`Title` varchar(6) NOT NULL default '',
`CellPhone` varchar(30) character set ascii NOT NULL default '',
`Email` varchar(70) character set ascii NOT NULL default '',
`Birthdate` date NOT NULL default '0000-00-00',
`Country` varchar(30) NOT NULL default '',
`URL` varchar(150) NOT NULL default '',
`Organization` tinyint(1) NOT NULL default '0',
`Remarks` text NOT NULL,
`Photo` tinyint(1) NOT NULL default '0',
`UpdDate` date NOT NULL default '0000-00-00',
PRIMARY KEY (`PersonID`),
KEY `Furigana` (`Furigana`),
KEY `FullName` (`FullName`),
KEY `Email` (`Email`),
KEY `Organization` (`Organization`,`Furigana`)
) ENGINE=MyISAM AUTO_INCREMENT=6063 DEFAULT CHARSET=utf8
Attempted suggestion:
I tried to implement Rick James's suggestion of putting the subselects in the field list (I didn't even know that was possible), like this:
SELECT
p.PersonID,
FullName,
Furigana,
(SELECT MAX(ContactDate) FROM contact cq
WHERE cq.PersonID=p.PersonID
AND cq.ContactTypeID IN (22,26,45))
AS qualdate,
(SELECT MAX(ContactDate) FROM contact ce
WHERE ce.PersonID=p.PersonID
AND ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12))
AS elimdate
FROM person p
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND NOT EXISTS (SELECT * FROM percat WHERE CategoryID=38 AND percat.PersonID=p.PersonID)
ORDER BY qualdate
But it complains: #1054 - Unknown column 'elimdate' in 'where clause'. According to the docs, WHERE clauses are interpreted before field lists, so this approach isn't going to work.
You have an interesting query. I am not sure what the best solution is. Here are two guesses:
Plan A
INDEX(qualdate)
may help. Please provide SHOW CREATE TABLE.
This construct optimizes poorly:
FROM ( SELECT ... )
JOIN ( SELECT ... )
In your case, overseas should probably be turned into a JOIN, not a subselect. And the other two should probably be turned into a different flavor of dependent subquery:
SELECT ...,
( SELECT MAX(...) ... ) AS qualdate,
( SELECT MAX(...) ... ) AS elimdate
FROM ...
What version of MySQL are you running?
Plan B
If practical, fold these into the subqueries so that they generate fewer rows, thereby leading to less effort at the outer query. (One per subquery)
elimdate IS NOT NULL
qualdate < CURDATE()-INTERVAL 7 DAY
overseas.PersonID IS NOT NULL
Perhaps the NULL tests apply to LEFT and this suggestion may not apply.
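One possible way to make the dependent-subquery shape from Plan A work despite the "Unknown column" error is to move the alias tests from WHERE into HAVING. MySQL lets HAVING refer to select-list aliases, which WHERE cannot do. This is a sketch only: it has not been run against this schema or on MySQL 5.0, and it is not taken from the original answer.

```sql
SELECT p.PersonID,
       p.FullName,
       p.Furigana,
       (SELECT MAX(cq.ContactDate)
          FROM contact cq
         WHERE cq.PersonID = p.PersonID
           AND cq.ContactTypeID IN (22,26,45)) AS qualdate,
       (SELECT MAX(ce.ContactDate)
          FROM contact ce
         WHERE ce.PersonID = p.PersonID
           AND ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12)) AS elimdate
FROM person p
-- Conditions on real columns stay in WHERE.
WHERE NOT EXISTS (SELECT 1
                    FROM percat pc
                   WHERE pc.CategoryID = 38
                     AND pc.PersonID = p.PersonID)
-- Conditions on the computed aliases go in HAVING. A person with no
-- qualifying contact has qualdate = NULL and is filtered out here,
-- matching the INNER JOIN in the original query.
HAVING qualdate < CURDATE() - INTERVAL 7 DAY
   AND (elimdate IS NULL OR qualdate > elimdate)
ORDER BY qualdate;
```

Each subquery runs once per person row and can use the existing PersonID index on contact, so no derived tables have to be materialized and joined without indexes.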

Avoid filesort with INNER JOIN + ORDER BY

I've been reading other posts, but I didn't manage to fix my query.
With DESC order the query is 20 times slower; I must improve that.
This is the query:
SELECT posts.post_id, posts.post_b_id, posts.post_title, posts.post_cont, posts.thumb, posts.post_user, boards.board_title_l, boards.board_title
FROM posts
INNER JOIN follow ON posts.post_b_id = follow.board_id
INNER JOIN boards ON posts.post_b_id = boards.board_id
WHERE follow.user_id =1
ORDER BY posts.post_id DESC
LIMIT 10
And these are the tables (Updated):
CREATE TABLE IF NOT EXISTS `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`post_b_id` int(11) unsigned NOT NULL,
`post_title` varchar(50) COLLATE utf8_bin NOT NULL,
`post_cont` text COLLATE utf8_bin NOT NULL,
`post_mintxt` varchar(255) COLLATE utf8_bin NOT NULL,
`post_type` char(3) COLLATE utf8_bin NOT NULL,
`thumb` varchar(200) COLLATE utf8_bin NOT NULL,
`post_user` varchar(16) COLLATE utf8_bin NOT NULL,
`published` enum('0','1') COLLATE utf8_bin NOT NULL,
`post_ip` varchar(94) COLLATE utf8_bin NOT NULL,
`post_ip_dat` int(11) unsigned NOT NULL,
`post_up` int(10) unsigned NOT NULL DEFAULT '0',
`post_down` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`post_id`),
KEY `post_b_id` (`post_b_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=405 ;
CREATE TABLE IF NOT EXISTS `boards` (
`board_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`board_title_l` varchar(19) COLLATE utf8_bin NOT NULL,
`board_user_id` int(10) unsigned NOT NULL,
`board_title` varchar(19) COLLATE utf8_bin NOT NULL,
`board_user` varchar(16) COLLATE utf8_bin NOT NULL,
`board_txt` tinyint(1) unsigned NOT NULL,
`board_img` tinyint(1) unsigned NOT NULL,
`board_vid` tinyint(1) unsigned NOT NULL,
`board_desc` varchar(100) COLLATE utf8_bin NOT NULL,
`board_mod_p` tinyint(3) unsigned NOT NULL DEFAULT '0',
`board_ip` varchar(94) COLLATE utf8_bin NOT NULL,
`board_dat_ip` int(11) unsigned NOT NULL,
PRIMARY KEY (`board_id`),
UNIQUE KEY `board_title_l` (`board_title_l`),
KEY `board_user_id` (`board_user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=89 ;
CREATE TABLE IF NOT EXISTS `follow` (
`user_id` int(10) unsigned NOT NULL,
`board_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`board_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
With the default ASC order, EXPLAIN shows only Using index and Using where; with DESC, it shows Using index, Using where, Using temporary and Using filesort.
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE follow ref user_id user_id 4 const 2 100.00 Using index; Using temporary; Using filesort
1 SIMPLE boards eq_ref PRIMARY PRIMARY 4 xxxx.follow.board_id 1 100.00
1 SIMPLE posts ref post_b_id post_b_id 4 xxxx.boards.board_id 3 100.00 Using where
How can I make the query return the results in DESC order without filesort and temporary?
UPDATE: I wrote a new query with no temporary table or filesort, but with type: index and filtered: 7340.00. It is almost as fast as ASC order if the matching posts are at the end of the table, but slow if they are at the beginning. So it seems better, but it's not enough.
SELECT posts.post_id, posts.post_b_id, posts.post_title, posts.post_cont, posts.thumb, posts.post_user, boards.board_title_l, boards.board_title
FROM posts INNER JOIN boards ON posts.post_b_id = boards.board_id
WHERE posts.post_b_id
IN (
SELECT follow.board_id
FROM follow
WHERE follow.user_id = 1
)
ORDER BY posts.post_id DESC
LIMIT 10
Explain:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY posts index post_b_id PRIMARY 8 NULL 10 7340.00 Using where
1 PRIMARY boards eq_ref PRIMARY PRIMARY 4 xxxx.posts.post_b_id 1 100.00
2 DEPENDENT SUBQUERY follow eq_ref user_id user_id 8 const,func 1 100.00 Using index
UPDATE: Explain for the query from dened's answer:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 10 100.00
1 PRIMARY posts eq_ref PRIMARY,post_b_id PRIMARY 4 sq.post_id 1 100.00
1 PRIMARY boards eq_ref PRIMARY PRIMARY 4 xxxx.posts.post_b_id 1 100.00
2 DERIVED follow ref PRIMARY PRIMARY 4 1 100.00 Using index; Using temporary; Using filesort
2 DERIVED posts ref post_b_id post_b_id 4 xxxx.follow.board_id 6 100.00 Using index
Times:
Original query no order (ASC): 0.187500 seconds
Original query DESC: 2.812500 seconds
Second query posts at the end (DESC): 0.218750 seconds
Second query posts at the beginning (DESC): 3.293750 seconds
dened's query DESC: 0.421875 seconds
dened's query no order (ASC): 0.323750 seconds
Interesting note: if I add ORDER BY ASC, it is as slow as DESC.
Altering the table order would be a good way, but as I said in the comments, I wasn't able to do that.
You can help MySQL optimizer by moving all the filtering work to a subquery that accesses only indices (manipulating indices is usually much faster than manipulating other data), and fetching rest of the data in the outermost query:
SELECT posts.post_id,
posts.post_b_id,
posts.post_title,
posts.post_cont,
posts.thumb,
posts.post_user,
boards.board_title_l,
boards.board_title
FROM (SELECT post_id
FROM posts
JOIN follow
ON posts.post_b_id = follow.board_id
WHERE follow.user_id = 1
ORDER BY post_id DESC
LIMIT 10) sq
JOIN posts
ON posts.post_id = sq.post_id
JOIN boards
ON boards.board_id = posts.post_b_id
Note that I omit ORDER BY posts.post_id DESC from the outer query, because it is usually faster to sort the final result in your code rather than sorting using a MySQL query (MySQL often uses filesort for that).
P.S. You can replace the unique key in the follow table with a primary key.
Increasing the sort_buffer_size parameter will increase the amount of memory MySQL uses before resorting to a temporary disk file and should help considerably.
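For reference, the setting can be inspected and raised for the current connection as sketched below. The 4 MB value is an arbitrary figure for illustration, not a recommendation; the buffer is allocated per session that sorts, so large values can be costly under many concurrent connections.

```sql
-- Show the current value (in bytes).
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Raise it for this session only; 4 MB is an assumed example value.
SET SESSION sort_buffer_size = 4 * 1024 * 1024;
```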

MySQL Indexes for extremely slow queries

The following query, regardless of environment, takes more than 30 seconds to compute.
SELECT COUNT( r.response_answer )
FROM response r
INNER JOIN (
SELECT G.question_id
FROM question G
INNER JOIN answer_group AG ON G.answer_group_id = AG.answer_group_id
WHERE AG.answer_group_stat = 'statistic'
) AS q ON r.question_id = q.question_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE os.survey_id =42
AND r.response_answer = 5
AND DATEDIFF( NOW( ) , r.added_dt ) <1000000
AND r.uuid IS NOT NULL
When I explain the query,
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1087
1 PRIMARY r ref question_id,org_survey_code,code_question,uuid,uor question_id 4 q.question_id 1545 Using where
1 PRIMARY os eq_ref org_survey_code,survey_id,org_survey_code_2 org_survey_code 12 survey_2.r.org_survey_code 1 Using where
2 DERIVED G ALL agid NULL NULL NULL 1680
2 DERIVED AG eq_ref PRIMARY PRIMARY 1 survey_2.G.answer_group_id 1 Using where
I have a very basic knowledge of indexing, but I have tried nearly every combination I can think of and cannot seem to improve the speed of this query. The responses table is right around 2 million rows, question is about 1500 rows, answer_group is about 50, and org_survey is about 8,000.
Here is the basic structure for each:
CREATE TABLE `response` (
`response_id` int(10) unsigned NOT NULL auto_increment,
`response_answer` text NOT NULL,
`question_id` int(10) unsigned NOT NULL default '0',
`org_survey_code` varchar(7) NOT NULL,
`uuid` varchar(40) default NULL,
`added_dt` datetime default NULL,
PRIMARY KEY (`response_id`),
KEY `question_id` (`question_id`),
KEY `org_survey_code` (`org_survey_code`),
KEY `code_question` (`org_survey_code`,`question_id`),
KEY `IDX_ADDED_DT` (`added_dt`),
KEY `uuid` (`uuid`),
KEY `response_answer` (`response_answer`(1)),
KEY `response_question` (`response_answer`(1),`question_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2298109 DEFAULT CHARSET=latin1
CREATE TABLE `question` (
`question_id` int(10) unsigned NOT NULL auto_increment,
`question_text` varchar(250) NOT NULL default '',
`question_group` varchar(250) default NULL,
`question_position` tinyint(3) unsigned NOT NULL default '0',
`survey_id` tinyint(3) unsigned NOT NULL default '0',
`answer_group_id` mediumint(8) unsigned NOT NULL default '0',
`seq_id` int(11) NOT NULL default '0',
PRIMARY KEY (`question_id`),
KEY `question_group` (`question_group`(10)),
KEY `survey_id` (`survey_id`),
KEY `agid` (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1860 DEFAULT CHARSET=latin1
CREATE TABLE `org_survey` (
`org_survey_id` int(11) NOT NULL auto_increment,
`org_survey_code` varchar(10) NOT NULL default '',
`org_id` int(11) NOT NULL default '0',
`org_manager_id` int(11) NOT NULL default '0',
`org_url_id` int(11) default '0',
`division_id` int(11) default '0',
`sector_id` int(11) default NULL,
`survey_id` int(11) NOT NULL default '0',
`process_batch` tinyint(4) default '0',
`added_dt` datetime default NULL,
PRIMARY KEY (`org_survey_id`),
UNIQUE KEY `org_survey_code` (`org_survey_code`),
KEY `org_id` (`org_id`),
KEY `survey_id` (`survey_id`),
KEY `org_survey_code_2` (`org_survey_code`,`total_taken`),
KEY `org_manager_id` (`org_manager_id`),
KEY `sector_id` (`sector_id`)
) ENGINE=MyISAM AUTO_INCREMENT=9268 DEFAULT CHARSET=latin1
CREATE TABLE `answer_group` (
`answer_group_id` tinyint(3) unsigned NOT NULL auto_increment,
`answer_group_name` varchar(50) NOT NULL default '',
`answer_group_type` varchar(20) NOT NULL default '',
`answer_group_stat` varchar(20) NOT NULL default 'demographic',
PRIMARY KEY (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=53 DEFAULT CHARSET=latin1
I know there are small things I can probably do to improve the efficiency of the database, such as reducing the size of integers where it's unnecessary. However, those are fairly trivial considering the ridiculous time it takes just to produce a result here. How can I properly index these tables, based on what explain has shown me? It seems that I have tried a large variety of combinations to no avail. Also, is there anything else that anyone can see that will optimize the table and reduce the query? I need it to be computed in less than a second. Thanks in advance!
1. If you want the index on r.added_dt to be used, instead of:
DATEDIFF(NOW(), r.added_dt) < 1000000
use:
CURDATE() - INTERVAL 1000000 DAY < r.added_dt
Anyway, the above condition checks whether added_dt is less than a million days old. Do you really store dates that old? If not, you can simply remove this condition.
If you do want this condition, an index on added_dt would help a lot. Your query as it stands checks this condition for every row, calling DATEDIFF() once per row of the response table.
2. Since r.response_answer cannot be NULL, instead of:
SELECT COUNT( r.response_answer )
use:
SELECT COUNT( * )
COUNT(*) is faster than COUNT(field).
3. Two of the three fields that you use for joining tables have different datatypes:
ON question . answer_group_id
= answer_group . answer_group_id
CREATE TABLE question (
...
answer_group_id mediumint(8) ..., <--- mediumint
CREATE TABLE answer_group (
answer_group_id tinyint(3) ..., <--- tinyint
-------------------------------
ON org_survey . org_survey_code
= response . org_survey_code
CREATE TABLE response (
...
org_survey_code varchar(7) NOT NULL, <--- 7
CREATE TABLE org_survey (
...
org_survey_code varchar(10) NOT NULL default '', <--- 10
Datatype mediumint is not the same as tinyint, and the same goes for varchar(7) and varchar(10). When they are used in a join, MySQL has to spend time converting from one type to the other. Convert one of them so they have identical datatypes. This is not the main issue with the query, but this change will also help all other queries that use these joins.
After making this change, run ANALYZE TABLE on the table. It will help MySQL produce better execution plans.
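The datatype alignment could be sketched as follows, using the column definitions shown in the question. Which side to change is a judgment call: narrowing question.answer_group_id to tinyint assumes every existing value fits in 0-255, which looks plausible with about 50 answer groups but should be checked first. Both ALTER statements rebuild the table, so they may take a while on response.

```sql
-- Match question.answer_group_id to answer_group.answer_group_id
-- (tinyint unsigned). Assumes no stored value exceeds 255.
ALTER TABLE question
    MODIFY answer_group_id TINYINT(3) UNSIGNED NOT NULL DEFAULT '0';

-- Match response.org_survey_code to org_survey.org_survey_code
-- (varchar(10)); widening is safe for existing data.
ALTER TABLE response
    MODIFY org_survey_code VARCHAR(10) NOT NULL;

-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE question, response;
```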
You have a response_answer = 5 condition, where response_answer is text. It's not an error, but it's better to use response_answer = '5' (the conversion of 5 to '5' will be done by MySQL anyway, if you don't do that).
The real issue is that you don't have a compound index on the 3 fields that are used in the WHERE conditions. Try adding this one:
ALTER TABLE response
ADD INDEX ind_u1_ra1_aa
(uuid(1), response_answer(1), added_dt) ;
(this may take a while as your table is not small)
Can you try the following query? I've removed the sub-query from your original one. This may let the optimiser produce a better execution plan.
SELECT COUNT(r.response_answer)
FROM response r
INNER JOIN question q ON r.question_id = q.question_id
INNER JOIN answer_group ag ON q.answer_group_id = ag.answer_group_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE
ag.answer_group_stat = 'statistic'
AND os.survey_id = 42
AND r.response_answer = 5
AND DATEDIFF(NOW(), r.added_dt) < 1000000
AND r.uuid IS NOT NULL