MySQL Indexes for extremely slow queries - mysql

The following query, regardless of environment, takes more than 30 seconds to compute.
SELECT COUNT( r.response_answer )
FROM response r
INNER JOIN (
SELECT G.question_id
FROM question G
INNER JOIN answer_group AG ON G.answer_group_id = AG.answer_group_id
WHERE AG.answer_group_stat = 'statistic'
) AS q ON r.question_id = q.question_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE os.survey_id =42
AND r.response_answer = 5
AND DATEDIFF( NOW( ) , r.added_dt ) <1000000
AND r.uuid IS NOT NULL
When I explain the query,
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1087
1 PRIMARY r ref question_id,org_survey_code,code_question,uuid,uor question_id 4 q.question_id 1545 Using where
1 PRIMARY os eq_ref org_survey_code,survey_id,org_survey_code_2 org_survey_code 12 survey_2.r.org_survey_code 1 Using where
2 DERIVED G ALL agid NULL NULL NULL 1680
2 DERIVED AG eq_ref PRIMARY PRIMARY 1 survey_2.G.answer_group_id 1 Using where
I have a very basic knowledge of indexing, but I have tried nearly every combination I can think of and cannot seem to improve the speed of this query. The responses table is right around 2 million rows, question is about 1500 rows, answer_group is about 50, and org_survey is about 8,000.
Here is the basic structure for each:
CREATE TABLE `response` (
`response_id` int(10) unsigned NOT NULL auto_increment,
`response_answer` text NOT NULL,
`question_id` int(10) unsigned NOT NULL default '0',
`org_survey_code` varchar(7) NOT NULL,
`uuid` varchar(40) default NULL,
`added_dt` datetime default NULL,
PRIMARY KEY (`response_id`),
KEY `question_id` (`question_id`),
KEY `org_survey_code` (`org_survey_code`),
KEY `code_question` (`org_survey_code`,`question_id`),
KEY `IDX_ADDED_DT` (`added_dt`),
KEY `uuid` (`uuid`),
KEY `response_answer` (`response_answer`(1)),
KEY `response_question` (`response_answer`(1),`question_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2298109 DEFAULT CHARSET=latin1
CREATE TABLE `question` (
`question_id` int(10) unsigned NOT NULL auto_increment,
`question_text` varchar(250) NOT NULL default '',
`question_group` varchar(250) default NULL,
`question_position` tinyint(3) unsigned NOT NULL default '0',
`survey_id` tinyint(3) unsigned NOT NULL default '0',
`answer_group_id` mediumint(8) unsigned NOT NULL default '0',
`seq_id` int(11) NOT NULL default '0',
PRIMARY KEY (`question_id`),
KEY `question_group` (`question_group`(10)),
KEY `survey_id` (`survey_id`),
KEY `agid` (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1860 DEFAULT CHARSET=latin1
CREATE TABLE `org_survey` (
`org_survey_id` int(11) NOT NULL auto_increment,
`org_survey_code` varchar(10) NOT NULL default '',
`org_id` int(11) NOT NULL default '0',
`org_manager_id` int(11) NOT NULL default '0',
`org_url_id` int(11) default '0',
`division_id` int(11) default '0',
`sector_id` int(11) default NULL,
`survey_id` int(11) NOT NULL default '0',
`process_batch` tinyint(4) default '0',
`added_dt` datetime default NULL,
PRIMARY KEY (`org_survey_id`),
UNIQUE KEY `org_survey_code` (`org_survey_code`),
KEY `org_id` (`org_id`),
KEY `survey_id` (`survey_id`),
KEY `org_survey_code_2` (`org_survey_code`,`total_taken`),
KEY `org_manager_id` (`org_manager_id`),
KEY `sector_id` (`sector_id`)
) ENGINE=MyISAM AUTO_INCREMENT=9268 DEFAULT CHARSET=latin1
CREATE TABLE `answer_group` (
`answer_group_id` tinyint(3) unsigned NOT NULL auto_increment,
`answer_group_name` varchar(50) NOT NULL default '',
`answer_group_type` varchar(20) NOT NULL default '',
`answer_group_stat` varchar(20) NOT NULL default 'demographic',
PRIMARY KEY (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=53 DEFAULT CHARSET=latin1
I know there are small things I can probably do to improve the efficiency of the database, such as reducing the size of integers where it's unnecessary. However, those are fairly trivial considering the ridiculous time it takes just to produce a result here. How can I properly index these tables, based on what explain has shown me? It seems that I have tried a large variety of combinations to no avail. Also, is there anything else that anyone can see that will optimize the table and reduce the query? I need it to be computed in less than a second. Thanks in advance!

1. If you want the index on r.added_dt to be used, instead of:
DATEDIFF(NOW(), r.added_dt) < 1000000
use:
CURDATE() - INTERVAL 1000000 DAY < r.added_dt
Anyway, the above condition checks whether added_dt is less than a million days old. Do you really store dates that old? If not, you can simply remove this condition.
If you do want this condition, an index on added_dt would help a lot. As written, your query checks every row for this condition, calling the DATEDIFF() function once for each row of the response table.
2. Since r.response_answer cannot be NULL, instead of:
SELECT COUNT( r.response_answer )
use:
SELECT COUNT( * )
COUNT(*) only counts rows, whereas COUNT(field) also has to check each value for NULL, so COUNT(*) is usually at least as fast.
3. Two of the three fields that you use for joining tables have different datatypes:
ON question . answer_group_id
= answer_group . answer_group_id
CREATE TABLE question (
...
answer_group_id mediumint(8) ..., <--- mediumint
CREATE TABLE answer_group (
answer_group_id tinyint(3) ..., <--- tinyint
-------------------------------
ON org_survey . org_survey_code
= response . org_survey_code
CREATE TABLE response (
...
org_survey_code varchar(7) NOT NULL, <--- 7
CREATE TABLE org_survey (
...
org_survey_code varchar(10) NOT NULL default '', <--- 10
The mediumint datatype is not the same as tinyint, and the same goes for varchar(7) and varchar(10). When such columns are used in a join, MySQL may lose time converting one type to the other. Convert one column of each pair so that the datatypes are identical. This is not the main issue with the query, but the change will also help every other query that uses these joins.
After making this change, run ANALYZE TABLE on each altered table. It helps MySQL choose better execution plans.
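A minimal sketch of the datatype alignment from point 3. It assumes that widening the narrower column of each pair is acceptable (widening avoids truncating existing values; the answer does not say which side to change). The column attributes are copied from the CREATE TABLE statements in the question:

```sql
-- Assumption: widen the narrower column in each pair so no existing value is truncated.
-- answer_group.answer_group_id: tinyint -> mediumint, to match question.answer_group_id.
ALTER TABLE answer_group
  MODIFY answer_group_id mediumint(8) unsigned NOT NULL auto_increment;

-- response.org_survey_code: varchar(7) -> varchar(10), to match org_survey.org_survey_code.
-- This rebuilds the roughly 2 million row table and its indexes, so it may take a while.
ALTER TABLE response
  MODIFY org_survey_code varchar(10) NOT NULL;

-- Refresh the index statistics on the altered tables.
ANALYZE TABLE answer_group, response;
```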
You have a response_answer = 5 condition, where response_answer is text. It's not an error, but it's better to use response_answer = '5'. When a string column is compared with a number, MySQL compares both sides as floating-point numbers and cannot use an index on the column for the lookup.
The real issue is that you don't have a compound index on the three fields that are used in the WHERE conditions. Try adding this one:
ALTER TABLE response
ADD INDEX ind_u1_ra1_aa
(uuid(1), response_answer(1), added_dt) ;
(this may take a while as your table is not small)

Can you try the following query? I've removed the sub-query from your original one. This may let the optimiser produce a better execution plan.
SELECT COUNT(r.response_answer)
FROM response r
INNER JOIN question q ON r.question_id = q.question_id
INNER JOIN answer_group ag ON q.answer_group_id = ag.answer_group_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE
ag.answer_group_stat = 'statistic'
AND os.survey_id = 42
AND r.response_answer = 5
AND DATEDIFF(NOW(), r.added_dt) < 1000000
AND r.uuid IS NOT NULL
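Combining the suggestions from both answers gives something like the sketch below. It assumes the DATEDIFF() condition is only a sanity check that every non-NULL date passes, so it is reduced to a NULL test (DATEDIFF() returns NULL for a NULL added_dt, so the original condition filtered those rows out). That assumption is not confirmed by the question.

```sql
SELECT COUNT(*)                          -- response_answer is NOT NULL, so the count is the same
FROM response r
INNER JOIN question q      ON q.question_id = r.question_id
INNER JOIN answer_group ag ON ag.answer_group_id = q.answer_group_id
INNER JOIN org_survey os   ON os.org_survey_code = r.org_survey_code
WHERE ag.answer_group_stat = 'statistic'
  AND os.survey_id = 42
  AND r.response_answer = '5'            -- string literal for a text column
  AND r.added_dt IS NOT NULL             -- assumption: stands in for the DATEDIFF() check
  AND r.uuid IS NOT NULL;
```

Note that = '5' and = 5 are not strictly equivalent: the numeric comparison also matches stored values such as '05' or '5.0', so this form is only a drop-in replacement if the column holds plain '5' strings. Running EXPLAIN on it shows whether the derived table has disappeared from the plan.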

Related

Efficient select on huge table of ranges

I have a mysql table which contains the fields: rangeFrom and rangeTo.
I want to request rows with a condition like: rangeFrom >= ? AND rangeTo <=? within a join.
EXPLAIN SELECT *
FROM Version
JOIN Contract FORCE INDEX FOR JOIN (versionRangeFrom)
ON Version.id >= Contract.versionRangeFrom
AND Version.id <= Contract.versionRangeTo
WHERE Version.completedAt = '2016-06-06 10:00:01';
Which mysql explains like this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Version ref PRIMARY,completedAt completedAt 6 const 1 NULL
1 SIMPLE Contract ALL versionRangeFrom NULL NULL NULL 640744 Range checked for each record (index map: 0x8)
So it has to work through 640744 rows, which takes about 1-2 seconds.
However, inserting the version id directly into the query works fine:
EXPLAIN SELECT *
FROM Contract
WHERE 5 >= Contract.versionRangeFrom AND 5 <= Contract.versionRangeTo;
This is then explained like this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Contract range versionRangeFrom versionRangeFrom 4 NULL 534 Using index condition; Using where
So in this case MySQL only goes through 534 rows, and that takes only about 30 ms.
So how do I set up such a range check correctly? It seems that MySQL is unable to use indexes in those cases. I can work around it by using two queries, but I'd rather have one.
Here are the schemas:
CREATE TABLE `Version` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`completedAt` datetime DEFAULT NULL,
`createdAt` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `completedAt` (`completedAt`)
)
CREATE TABLE `Contract` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`contractId` bigint(20) unsigned NOT NULL,
`startAt` bigint(20) NOT NULL DEFAULT '0',
`endAt` bigint(20) NOT NULL DEFAULT '0',
`tradeStartAt` bigint(20) NOT NULL DEFAULT '0',
`tradeEndAt` bigint(20) NOT NULL DEFAULT '0',
`latestAiId` bigint(20) NOT NULL DEFAULT '0',
`type` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`daPreis` int(11) NOT NULL DEFAULT '0',
`lastTradePreis` int(11) NOT NULL DEFAULT '0',
`lastTradeVol` int(11) NOT NULL DEFAULT '0',
`VWAID` double NOT NULL DEFAULT '0',
`versionRangeFrom` int(10) unsigned NOT NULL,
`versionRangeTo` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tradeStartAt` (`tradeStartAt`),
KEY `contractId` (`contractId`),
KEY `versionRangeFrom` (`versionRangeFrom`)
)
A value other than 5 would not necessarily need to look at only 534 rows.
The problem (from <= x AND x <= to) is non-trivial, and there is no simple answer.
Shrinking the table size would help a tiny bit. Don't use BIGINT (8 bytes) when some smaller datatype would suffice.
Perhaps the only real solution involves a major revamp of the schema and code. See my blog.
Edit
In certain situations, this subquery may be efficiently used to find the row in question:
( SELECT Contract.id FROM ...
WHERE Version.id >= Contract.versionRangeFrom
ORDER BY versionRangeFrom DESC
LIMIT 1 )
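A fuller sketch of that subquery, under the loud assumption that the ranges in Contract do not overlap, so that at most one row can contain a given version id (the question's data may well contain overlapping ranges, in which case this does not apply). The aliases v, c and c2 are illustrative. The subquery walks the versionRangeFrom index backwards to the closest range start at or below the id, and the outer condition then checks the upper bound:

```sql
SELECT c.*
FROM Version v
JOIN Contract c
  ON c.id = ( SELECT c2.id
              FROM Contract c2
              WHERE c2.versionRangeFrom <= v.id
              ORDER BY c2.versionRangeFrom DESC   -- closest range start at or below v.id
              LIMIT 1 )
WHERE v.completedAt = '2016-06-06 10:00:01'
  AND v.id <= c.versionRangeTo;                   -- reject ids past the end of that range
```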

Slow query with joined derived tables

I have a few queries on a "custom dashboard" of my application, and one of them is taking 10-12 seconds to execute. Using EXPLAIN I can see why it's slow, but I don't know what to do about it. Here is the query:
SELECT person.PersonID,FullName,Furigana,qualdate FROM person
INNER JOIN (
SELECT pq.PersonID,MAX(ContactDate) AS qualdate FROM person pq
INNER JOIN contact cq ON pq.PersonID=cq.PersonID
WHERE cq.ContactTypeID IN (22,26,45) GROUP BY pq.PersonID
) qual ON person.PersonID=qual.PersonID
LEFT OUTER JOIN (
SELECT pe.personID,MAX(ContactDate) AS elimdate FROM person pe
INNER JOIN contact ce ON pe.PersonID=ce.PersonID WHERE ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12)
GROUP BY pe.PersonID
) elim ON qual.PersonID=elim.PersonID
LEFT OUTER JOIN (
SELECT po.personID FROM person po
INNER JOIN percat pc ON po.PersonID=pc.PersonID WHERE pc.CategoryID=38
) overseas ON qual.PersonID=overseas.PersonID
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND overseas.PersonID IS NULL
ORDER BY qualdate
And here is the EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5447 Using where; Using temporary; Using filesort
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 5565 Using where
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 9 Using where; Not exists
1 PRIMARY person eq_ref PRIMARY PRIMARY 4 qual.PersonID 1
4 DERIVED pc ref PRIMARY,CategoryID CategoryID 4 8
4 DERIVED po eq_ref PRIMARY PRIMARY 4 kizuna_misa.pc.PersonID 1 Using index
3 DERIVED pe index PRIMARY PRIMARY 4 NULL 5964 Using index
3 DERIVED ce ref PersonID,ContactTypeID PersonID 4 kizuna_misa.pe.PersonID 1 Using where
2 DERIVED pq index PRIMARY PRIMARY 4 NULL 5964 Using index
2 DERIVED cq ref PersonID,ContactTypeID PersonID 4 kizuna_misa.pq.PersonID 1 Using where
I'm sure the first line of the EXPLAIN reveals the problem (comparing with similar queries, it appears that the second line isn't too slow), but I don't know how to fix it. I already have indexes on every column that appears in the joins, but since the tables are <derived2> etc., I guess indexes are irrelevant.
The objective (since it's probably not obvious to someone unfamiliar with my application and schema) is a followup tickler list - if one of the #22/26/45 contacts has occurred but nothing has been done in response (either one of several other contacts or designating by a category assignment that the person is overseas), then the person should appear in the list for followup after waiting a week. Subqueries are easier for me to write and understand than these messy joins, but I can't check the sequence of dates (and subqueries are often slow, also).
EDIT (in response to Rick James):
MySQL version is 5.0.95 (yeah, I know...). And here is SHOW CREATE TABLE for the three tables involved, even though most of the fields in person are irrelevant:
CREATE TABLE `contact` (
`ContactID` int(11) unsigned NOT NULL auto_increment,
`PersonID` int(11) unsigned NOT NULL default '0',
`ContactTypeID` int(11) unsigned NOT NULL default '0',
`ContactDate` date NOT NULL default '0000-00-00',
`Description` text,
PRIMARY KEY (`ContactID`),
KEY `ContactDate` (`ContactDate`),
KEY `PersonID` (`PersonID`),
KEY `ContactTypeID` (`ContactTypeID`)
) ENGINE=MyISAM AUTO_INCREMENT=16901 DEFAULT CHARSET=utf8
CREATE TABLE `percat` (
`PersonID` int(11) unsigned NOT NULL default '0',
`CategoryID` int(11) unsigned NOT NULL default '0',
PRIMARY KEY (`PersonID`,`CategoryID`),
KEY `CategoryID` (`CategoryID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `person` (
`PersonID` int(11) unsigned NOT NULL auto_increment,
`FullName` varchar(100) NOT NULL default '',
`Furigana` varchar(100) NOT NULL default '',
`Sex` enum('','M','F') character set ascii NOT NULL default '',
`HouseholdID` int(11) unsigned NOT NULL default '0',
`Relation` varchar(6) character set ascii NOT NULL default '',
`Title` varchar(6) NOT NULL default '',
`CellPhone` varchar(30) character set ascii NOT NULL default '',
`Email` varchar(70) character set ascii NOT NULL default '',
`Birthdate` date NOT NULL default '0000-00-00',
`Country` varchar(30) NOT NULL default '',
`URL` varchar(150) NOT NULL default '',
`Organization` tinyint(1) NOT NULL default '0',
`Remarks` text NOT NULL,
`Photo` tinyint(1) NOT NULL default '0',
`UpdDate` date NOT NULL default '0000-00-00',
PRIMARY KEY (`PersonID`),
KEY `Furigana` (`Furigana`),
KEY `FullName` (`FullName`),
KEY `Email` (`Email`),
KEY `Organization` (`Organization`,`Furigana`)
) ENGINE=MyISAM AUTO_INCREMENT=6063 DEFAULT CHARSET=utf8
Attempted suggestion:
I tried to implement Rick James's suggestion of putting the subselects in the field list (I didn't even know that was possible), like this:
SELECT
p.PersonID,
FullName,
Furigana,
(SELECT MAX(ContactDate) FROM contact cq
WHERE cq.PersonID=p.PersonID
AND cq.ContactTypeID IN (22,26,45))
AS qualdate,
(SELECT MAX(ContactDate) FROM contact ce
WHERE ce.PersonID=p.PersonID
AND ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12))
AS elimdate
FROM person p
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND NOT EXISTS (SELECT * FROM percat WHERE CategoryID=38 AND percat.PersonID=p.PersonID)
ORDER BY qualdate
But it complains: #1054 - Unknown column 'elimdate' in 'where clause'. According to the docs, WHERE clauses are evaluated before the field list, so this approach isn't going to work.
You have an interesting query. I am not sure what the best solution is. Here are two guesses:
Plan A
INDEX(qualdate)
may help. Please provide SHOW CREATE TABLE.
This construct optimizes poorly:
FROM ( SELECT ... )
JOIN ( SELECT ... )
In your case, overseas should probably be turned into a JOIN, not a subselect. And the other two should probably be turned into a different flavor of dependent subquery:
SELECT ...,
( SELECT MAX(...) ... ) AS qualdate,
( SELECT MAX(...) ... ) AS elimdate
FROM ...
What version of MySQL are you running?
Plan B
If practical, fold these into the subqueries so that they generate fewer rows, thereby leading to less effort at the outer query. (One per subquery)
elimdate IS NOT NULL
qualdate < CURDATE()-INTERVAL 7 DAY
overseas.PersonID IS NOT NULL
Perhaps the NULL tests apply to LEFT and this suggestion may not apply.
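One standard way around the "Unknown column" error from the attempted rewrite is to compute the aliases inside a single derived table and filter on them in the outer query. The following is a sketch only, not tested against the poster's data. The index name pid_type_date is hypothetical, added on the assumption that a composite index lets each correlated MAX() subquery be answered from the index:

```sql
-- Hypothetical composite index for the two MAX(ContactDate) subqueries.
ALTER TABLE contact
  ADD INDEX pid_type_date (PersonID, ContactTypeID, ContactDate);

SELECT t.PersonID, t.FullName, t.Furigana, t.qualdate
FROM (
  SELECT
    p.PersonID,
    p.FullName,
    p.Furigana,
    (SELECT MAX(cq.ContactDate) FROM contact cq
      WHERE cq.PersonID = p.PersonID
        AND cq.ContactTypeID IN (22,26,45)) AS qualdate,
    (SELECT MAX(ce.ContactDate) FROM contact ce
      WHERE ce.PersonID = p.PersonID
        AND ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12)) AS elimdate
  FROM person p
  -- Exclude people who have the "overseas" category (38).
  WHERE NOT EXISTS (SELECT * FROM percat pc
                     WHERE pc.CategoryID = 38 AND pc.PersonID = p.PersonID)
) AS t
-- The aliases are ordinary columns of t here, so WHERE can see them.
WHERE t.qualdate < CURDATE() - INTERVAL 7 DAY
  AND (t.elimdate IS NULL OR t.qualdate > t.elimdate)
ORDER BY t.qualdate;
```

People with no qualifying contact get a NULL qualdate and are dropped by the outer WHERE, which mirrors the INNER JOIN on qual in the original query.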

Why update mysql query run slow

I have a SQL table "STG_S_CUST" which contains a lot of rows (up to 1.5 million) and another table "S_CUST" which also contains a lot of rows.
When I execute the following UPDATE query, it is very slow and takes too much time.
UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST ON STG.SRC_NM=ST.SRC_NM
AND STG.SRC_KEY = ST.SRC_KEY
SET UPDATE_IND = 1,
STG.S_ID = ST.S_ID,
STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0
The problem is that I get a timeout exception: unable to execute SQL.
EXPLAIN UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST ON STG.SRC_NM=ST.SRC_NM
AND STG.SRC_KEY = ST.SRC_KEY
SET UPDATE_IND = 1,
STG.S_ID = ST.S_ID,
STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ST ALL NULL NULL NULL NULL 10479 NULL
1 SIMPLE STG ALL NULL NULL NULL NULL 159334 Using where; Using join buffer (Block Nested Loop)
Here's an abbreviated version of the CREATE TABLE statements:
STG_S_CUST :
CREATE TABLE `STG_S_CUST` (
`STG_ID` int(14) NOT NULL AUTO_INCREMENT,
`STG_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`SRC_KEY` varchar(100) DEFAULT NULL,
`SRC_NM` varchar(20) DEFAULT NULL,
`M_ID` int(14) DEFAULT NULL,
`S_ID` int(14) DEFAULT NULL,
`PROCESSED_IND` int(1) NOT NULL DEFAULT '0',
`THREAD_ID` int(3) DEFAULT NULL,
`UPDATE_IND` int(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`STG_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=171998 DEFAULT CHARSET=latin1
S_CUST :
CREATE TABLE `S_CUST` (
`S_ID` int(14) NOT NULL AUTO_INCREMENT,
`SRC_KEY` varchar(100) DEFAULT NULL,
`SRC_NM` varchar(20) DEFAULT NULL,
`M_ID` int(14) DEFAULT NULL,
PRIMARY KEY (`S_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=10803 DEFAULT CHARSET=latin1
Does anyone have any ideas why this would be so slow and how to speed it up ?
Could anyone help me here for optimization?
You need some indexes to make the select part of the join update faster. Start by adding the following indexes:
alter table STG_S_CUST add index PROCESSED_IND_idx(PROCESSED_IND);
alter table STG_S_CUST add index SRC_idx(SRC_NM,SRC_KEY);
alter table S_CUST add index SRC_NM_idx(SRC_NM);
Take a backup of the tables before applying the indexes.
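Since the join matches on both SRC_NM and SRC_KEY, a composite index on S_CUST may serve the lookup better than one on SRC_NM alone. This is a sketch under that assumption. The index name SRC_NM_KEY_idx is made up, and re-running EXPLAIN afterwards shows whether the optimizer actually uses it:

```sql
-- Hypothetical composite index covering both join columns of S_CUST.
ALTER TABLE S_CUST ADD INDEX SRC_NM_KEY_idx (SRC_NM, SRC_KEY);

-- Check the plan again: ideally at least one table no longer shows type ALL.
EXPLAIN UPDATE STG_S_CUST AS STG
INNER JOIN S_CUST AS ST
        ON STG.SRC_NM = ST.SRC_NM
       AND STG.SRC_KEY = ST.SRC_KEY
SET STG.UPDATE_IND = 1,
    STG.S_ID = ST.S_ID,
    STG.M_ID = ST.M_ID
WHERE STG.PROCESSED_IND = 0;
```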

Trying to optimize MySQL query with LEFT OUTER JOIN

I've this query, which works fine except it takes a long while (7 seconds, with 40k records in the jobs table, and 700k in the wq table).
I tried an EXPLAIN and it says it's looking at all the records in the jobs table, and not using any of the indexes.
I don't know how to tell MySQL that it should use the jobs.status field to filter the records before looking up the wq table.
The objective of this, is to get all the records from jobs that have a status != 331, and also any other job which has a wq status of (101, 111, 151).
Query:
SELECT jobs.*
FROM jobs
LEFT OUTER JOIN wq ON (wq.job = jobs.id AND jobs.status IN (341, 331) AND wq.status IN (101, 111, 151))
WHERE ((wq.info is not NULL) or (jobs.status != 331 and ack = 0))
EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE jobs ALL ack,status,status_ack NULL NULL NULL 38111 Using filesort
1 SIMPLE wq ref PRIMARY,job,status PRIMARY 4 cts.jobs.id 20 Using where
Table definitions:
CREATE TABLE jobs ( id int(10) NOT NULL AUTO_INCREMENT,
comment varchar(100) NOT NULL DEFAULT '',
profile varchar(60) NOT NULL DEFAULT '',
start_at int(10) NOT NULL DEFAULT '0',
data text NOT NULL,
status int(10) NOT NULL DEFAULT '0',
info varchar(200) NOT NULL DEFAULT '',
finish int(10) NOT NULL DEFAULT '0',
priority int(5) NOT NULL DEFAULT '0',
ack tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (id),
KEY start_at (start_at),
KEY status (status),
KEY status_ack (status,
ack) ) ENGINE=MyISAM AUTO_INCREMENT=2037530 DEFAULT CHARSET=latin1;
CREATE TABLE wq ( job int(10) NOT NULL DEFAULT '0',
process varchar(60) NOT NULL DEFAULT '',
step varchar(60) NOT NULL DEFAULT '',
status int(10) NOT NULL DEFAULT '0',
run_at int(10) NOT NULL DEFAULT '0',
original_run_at int(10) NOT NULL DEFAULT '0',
info varchar(200) NOT NULL DEFAULT '',
pos int(10) NOT NULL DEFAULT '0',
changed_at int(10) NOT NULL DEFAULT '0',
file varchar(60) NOT NULL DEFAULT '',
PRIMARY KEY (job,
process,
step,
file),
KEY job (job),
KEY status (status) ) ENGINE=MyISAM DEFAULT CHARSET=latin1
Unfortunately MySQL (and perhaps any DBMS) cannot optimize expressions like jobs.status != 331 and ack = 0, because a B-tree is not a structure that lets you quickly find everything that is not equal to a constant value. Thus you'll always get a full scan.
If there were a better condition such as jobs.status = 331 and ack = 0 (note that I've changed != to =), then my advice for speeding up this query would be:
split the query into two, joined by UNION ALL
replace the LEFT JOIN with an INNER JOIN in one of them (the one that implies that wq.info is not NULL)
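The two steps above can be sketched as follows for the query in the question. This illustrates the structure rather than a guaranteed speed-up, because the second branch still contains status != 331. It assumes that ack is a column of jobs and that wq.info is never NULL, both as the table definitions suggest. The NOT (...) test keeps the second branch from repeating jobs that the first branch already returns:

```sql
-- Branch 1: jobs that have a matching wq row (the "wq.info is not NULL" case).
SELECT jobs.*
FROM jobs
INNER JOIN wq ON wq.job = jobs.id
WHERE jobs.status IN (341, 331)
  AND wq.status IN (101, 111, 151)

UNION ALL

-- Branch 2: jobs without such a wq row that satisfy the status/ack condition.
SELECT jobs.*
FROM jobs
WHERE jobs.status != 331
  AND jobs.ack = 0
  AND NOT (jobs.status = 341
           AND EXISTS (SELECT 1 FROM wq
                       WHERE wq.job = jobs.id
                         AND wq.status IN (101, 111, 151)));
```

Like the original LEFT JOIN, the first branch returns a job once per matching wq row.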

Why does MySQL stop using an index for a join when I select non-indexed fields in the field list

I have the following two tables:
CREATE TABLE `temporal_expressions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dated_obj_type` varchar(255) DEFAULT NULL,
`dated_obj_id` int(11) DEFAULT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
`start_time` int(11) DEFAULT NULL,
`end_time` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`lock_version` int(11) NOT NULL DEFAULT '0',
`wday` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `te_search` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`),
KEY `te_calendar` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`,`start_time`,`end_time`),
KEY `te_search_wday` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`,`wday`),
KEY `te_calendar_wday` (`dated_obj_type`,`dated_obj_id`,`start_date`,`end_date`,`start_time`,`end_time`,`wday`),
KEY `te_index` (`wday`,`dated_obj_type`,`start_date`,`end_date`,`start_time`,`end_time`,`dated_obj_id`)
) ENGINE=InnoDB AUTO_INCREMENT=8162445 DEFAULT CHARSET=latin1
CREATE TABLE `asset_blocks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`block_type` int(11) DEFAULT '0',
`spaces_left` int(11) DEFAULT NULL,
`provider_note` varchar(255) DEFAULT NULL,
`extra_data` text,
`lock_version` int(11) DEFAULT '0',
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`service_provider_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `type` (`type`,`id`),
KEY `service_provider_id` (`service_provider_id`,`type`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=516867 DEFAULT CHARSET=latin1
If I run explain on this query (note that I am only selecting fields in the te_calendar_wday index from temporal_expressions) it uses the index for the join as expected
EXPLAIN SELECT asset_blocks.*, temporal_expressions.id,
temporal_expressions.dated_obj_type, temporal_expressions.dated_obj_id,
temporal_expressions.start_date, temporal_expressions.end_date,
temporal_expressions.start_time
FROM `asset_blocks`
LEFT OUTER JOIN `temporal_expressions`
ON `temporal_expressions`.dated_obj_id = `asset_blocks`.id
AND `temporal_expressions`.dated_obj_type = 'AssetBlock'
WHERE ( temporal_expressions.start_date <= '2010-11-25'
AND temporal_expressions.end_date >= '2010-11-01'
AND temporal_expressions.start_time < 1000 AND temporal_expressions.end_time > 1200
AND temporal_expressions.wday IN (1,2,3,4,5,6)
AND asset_blocks.id IN (1,2,3,4,5,6,7,8,9) )
1 SIMPLE temporal_expressions range te_search,te_calendar,te_search_wday,te_calendar_wday,te_index te_calendar_wday 272 NULL 9 Using where; Using index
1 SIMPLE asset_blocks eq_ref PRIMARY PRIMARY 4 lb_production.temporal_expressions.dated_obj_id 1
However, if I run this query (note that I have added a non-indexed field to the field list) it no longer uses the index (it uses a join buffer). Is this intentional or am I missing something?
EXPLAIN SELECT asset_blocks.*, temporal_expressions.id,
temporal_expressions.dated_obj_type, temporal_expressions.dated_obj_id,
temporal_expressions.start_date, temporal_expressions.end_date,
temporal_expressions.start_time, temporal_expressions.created_at
FROM `asset_blocks`
LEFT OUTER JOIN `temporal_expressions`
ON `temporal_expressions`.dated_obj_id = `asset_blocks`.id
AND `temporal_expressions`.dated_obj_type = 'AssetBlock'
WHERE ( temporal_expressions.start_date <= '2010-11-25'
AND temporal_expressions.end_date >= '2010-11-01'
AND temporal_expressions.start_time < 1000 AND temporal_expressions.end_time > 1200
AND temporal_expressions.wday IN (1,2,3,4,5,6)
AND asset_blocks.id IN (1,2,3,4,5,6,7,8,9) )
1 SIMPLE asset_blocks range PRIMARY PRIMARY 4 NULL 9 Using where
1 SIMPLE temporal_expressions range te_search,te_calendar,te_search_wday,te_calendar_wday,new_te_index te_search 272 NULL 9 Using where; Using join buffer
I cannot be sure if this is the case here, but:
If you select only indexed fields, MySQL can answer the whole query out of the index and does not even load the table data file.
If you select a field that is not indexed, it has to load the table data.
When making its execution plan, in certain cases (see comment) MySQL decides to do a full table scan although an index is present. This is because it's much quicker to read all data blindly than to look up every entry in the index and then read the data.
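If that is what is happening here, one way to test it is to extend the covering index so that it also contains created_at. This is a sketch only: the index name te_calendar_wday_created is hypothetical, and the optimizer may still prefer a different plan.

```sql
-- Hypothetical index: the columns of te_calendar_wday plus the extra selected column.
ALTER TABLE temporal_expressions
  ADD INDEX te_calendar_wday_created
    (dated_obj_type, dated_obj_id, start_date, end_date,
     start_time, end_time, wday, created_at);

-- If "Using index" reappears in Extra for temporal_expressions,
-- that table is again being read from the index alone.
EXPLAIN SELECT asset_blocks.*, temporal_expressions.id,
  temporal_expressions.dated_obj_type, temporal_expressions.dated_obj_id,
  temporal_expressions.start_date, temporal_expressions.end_date,
  temporal_expressions.start_time, temporal_expressions.created_at
FROM asset_blocks
LEFT OUTER JOIN temporal_expressions
  ON temporal_expressions.dated_obj_id = asset_blocks.id
  AND temporal_expressions.dated_obj_type = 'AssetBlock'
WHERE temporal_expressions.start_date <= '2010-11-25'
  AND temporal_expressions.end_date >= '2010-11-01'
  AND temporal_expressions.start_time < 1000
  AND temporal_expressions.end_time > 1200
  AND temporal_expressions.wday IN (1,2,3,4,5,6)
  AND asset_blocks.id IN (1,2,3,4,5,6,7,8,9);
```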