Slow query with joined derived tables - mysql

I have a few queries on a "custom dashboard" of my application, and one of them is taking 10-12 seconds to execute. Using EXPLAIN I can see why it's slow, but I don't know what to do about it. Here is the query:
SELECT person.PersonID,FullName,Furigana,qualdate FROM person
INNER JOIN (
SELECT pq.PersonID,MAX(ContactDate) AS qualdate FROM person pq
INNER JOIN contact cq ON pq.PersonID=cq.PersonID
WHERE cq.ContactTypeID IN (22,26,45) GROUP BY pq.PersonID
) qual ON person.PersonID=qual.PersonID
LEFT OUTER JOIN (
SELECT pe.personID,MAX(ContactDate) AS elimdate FROM person pe
INNER JOIN contact ce ON pe.PersonID=ce.PersonID WHERE ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12)
GROUP BY pe.PersonID
) elim ON qual.PersonID=elim.PersonID
LEFT OUTER JOIN (
SELECT po.personID FROM person po
INNER JOIN percat pc ON po.PersonID=pc.PersonID WHERE pc.CategoryID=38
) overseas ON qual.PersonID=overseas.PersonID
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND overseas.PersonID IS NULL
ORDER BY qualdate
And here is the EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5447 Using where; Using temporary; Using filesort
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 5565 Using where
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 9 Using where; Not exists
1 PRIMARY person eq_ref PRIMARY PRIMARY 4 qual.PersonID 1
4 DERIVED pc ref PRIMARY,CategoryID CategoryID 4 8
4 DERIVED po eq_ref PRIMARY PRIMARY 4 kizuna_misa.pc.PersonID 1 Using index
3 DERIVED pe index PRIMARY PRIMARY 4 NULL 5964 Using index
3 DERIVED ce ref PersonID,ContactTypeID PersonID 4 kizuna_misa.pe.PersonID 1 Using where
2 DERIVED pq index PRIMARY PRIMARY 4 NULL 5964 Using index
2 DERIVED cq ref PersonID,ContactTypeID PersonID 4 kizuna_misa.pq.PersonID 1 Using where
I'm sure the first line of the EXPLAIN reveals the problem (comparing with similar queries, it appears that the second line isn't too slow), but I don't know how to fix it. I already have indexes on every column that appears in the joins, but since the tables are <derived2> etc., I guess indexes are irrelevant.
The objective (since it's probably not obvious to someone unfamiliar with my application and schema) is a followup tickler list - if one of the #22/26/45 contacts has occurred but nothing has been done in response (either one of several other contacts or designating by a category assignment that the person is overseas), then the person should appear in the list for followup after waiting a week. Subqueries are easier for me to write and understand than these messy joins, but I can't check the sequence of dates (and subqueries are often slow, also).
EDIT (in response to Rick James):
MySQL version is 5.0.95 (yeah, I know...). And here is SHOW CREATE TABLE for the three tables involved, even though most of the fields in person are irrelevant:
CREATE TABLE `contact` (
`ContactID` int(11) unsigned NOT NULL auto_increment,
`PersonID` int(11) unsigned NOT NULL default '0',
`ContactTypeID` int(11) unsigned NOT NULL default '0',
`ContactDate` date NOT NULL default '0000-00-00',
`Description` text,
PRIMARY KEY (`ContactID`),
KEY `ContactDate` (`ContactDate`),
KEY `PersonID` (`PersonID`),
KEY `ContactTypeID` (`ContactTypeID`)
) ENGINE=MyISAM AUTO_INCREMENT=16901 DEFAULT CHARSET=utf8
CREATE TABLE `percat` (
`PersonID` int(11) unsigned NOT NULL default '0',
`CategoryID` int(11) unsigned NOT NULL default '0',
PRIMARY KEY (`PersonID`,`CategoryID`),
KEY `CategoryID` (`CategoryID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `person` (
`PersonID` int(11) unsigned NOT NULL auto_increment,
`FullName` varchar(100) NOT NULL default '',
`Furigana` varchar(100) NOT NULL default '',
`Sex` enum('','M','F') character set ascii NOT NULL default '',
`HouseholdID` int(11) unsigned NOT NULL default '0',
`Relation` varchar(6) character set ascii NOT NULL default '',
`Title` varchar(6) NOT NULL default '',
`CellPhone` varchar(30) character set ascii NOT NULL default '',
`Email` varchar(70) character set ascii NOT NULL default '',
`Birthdate` date NOT NULL default '0000-00-00',
`Country` varchar(30) NOT NULL default '',
`URL` varchar(150) NOT NULL default '',
`Organization` tinyint(1) NOT NULL default '0',
`Remarks` text NOT NULL,
`Photo` tinyint(1) NOT NULL default '0',
`UpdDate` date NOT NULL default '0000-00-00',
PRIMARY KEY (`PersonID`),
KEY `Furigana` (`Furigana`),
KEY `FullName` (`FullName`),
KEY `Email` (`Email`),
KEY `Organization` (`Organization`,`Furigana`)
) ENGINE=MyISAM AUTO_INCREMENT=6063 DEFAULT CHARSET=utf8
Attempted suggestion:
I tried to implement Rick James's suggestion of putting the subselects in the field list (I didn't even know that was possible), like this:
SELECT
p.PersonID,
FullName,
Furigana,
(SELECT MAX(ContactDate) FROM contact cq
WHERE cq.PersonID=p.PersonID
AND cq.ContactTypeID IN (22,26,45))
AS qualdate,
(SELECT MAX(ContactDate) FROM contact ce
WHERE ce.PersonID=p.PersonID
AND ce.ContactTypeID IN (25,31,30,41,23,42,2,33,35,29,12))
AS elimdate
FROM person p
WHERE (elimdate IS NULL OR qualdate > elimdate)
AND qualdate < CURDATE()-INTERVAL 7 DAY
AND NOT EXISTS (SELECT * FROM percat WHERE CategoryID=38 AND percat.PersonID=p.PersonID)
ORDER BY qualdate
But it complains: #1054 - Unknown column 'elimdate' in 'where clause' According to the docs, WHERE clauses are interpreted before field lists, so this approach isn't going to work.

You have an interesting query. I am not sure what the best solution is. Here are two guesses:
Plan A
INDEX(qualdate)
may help. Please provide SHOW CREATE TABLE.
This construct optimizes poorly:
FROM ( SELECT ... )
JOIN ( SELECT ... )
In your case, overseas should probably turned into a JOIN, not a subselect. And the other two should probably be turned into a different flavor of dependent subquery:
SELECT ...,
( SELECT MAX(...) ... ) AS qualdate,
( SELECT MAX(...) ... ) AS elimdate
FROM ...
What version of MySQL are you running?
Plan B
If practical, fold these into the subqueries so that they generate fewer rows, thereby leading to less effort at the outer query. (One per subquery)
elimdate IS NOT NULL
qualdate < CURDATE()-INTERVAL 7 DAY
overseas.PersonID IS NOT NULL
Perhaps the NULL tests apply to LEFT and this suggestion may not apply.

Related

Why is subquery join much faster than direct join

I have 2 tables (pages and comments), around 130 000 rows each.
I want to list pages without any comments (foreign key is comments.page_id)
If I execute the normal left outer join, it takes an amazing more than 750 seconds to run. (130k^2 = 17B). Whereas if I execute the same join, but using subqueries for the tables, it takes just 1 second.
Server version: 5.6.44-log - MySQL Community Server (GPL):
Query 1. Normal join, 750+ seconds
SELECT p.id
FROM `pages` AS p
LEFT JOIN `comments` AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 2. Join with first table as subquery, Too much time
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN `comments` AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 3. Join with second table as subquery, 1.6 seconds
SELECT p.id
FROM `pages` AS p
LEFT JOIN (
SELECT * FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 4. Join with 2 subqueries, 1 second
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN (
SELECT * FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 5. Join with 2 subqueries, selecting only 1 column, 0.2 seconds
SELECT p.id
FROM (
SELECT id FROM `pages`
) AS p
LEFT JOIN (
SELECT page_id FROM `comments`
) AS c
ON p.id = c.page_id
WHERE c.page_id IS NULL
GROUP BY 1
Query 6. Too much time
SELECT p.id
FROM `pages` AS p
WHERE NOT EXISTS( SELECT page_id FROM `comments`
WHERE page_id = p.id );;
Now, in MySql version 5.7, all of the above queries take "too much time" to execute.
In MySql 5.7, query 1 and 4 have same explanation:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE p NULL index PRIMARY PRIMARY 4 NULL 147626 100.00 Using index; Using temporary; Using filesort
1 SIMPLE c NULL ALL NULL NULL NULL NULL 147790 10.00 Using where; Not exists; Using join buffer (Block Nested Loop)
In MySql 5.6, unfortunately I cannot get the explanation for query 1 right now (taking too much time), but for query 4 is the below:
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 147626 Using temporary; Using filesort
1 PRIMARY <derived3> ref <auto_key0> <auto_key0> 4 p.id 10 Using where; Not exists
3 DERIVED comments ALL NULL NULL NULL NULL 147790 NULL
2 DERIVED pages index NULL PRIMARY 4 NULL 147626 Using index
Tables:
CREATE TABLE `pages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`identifier` varchar(250) NOT NULL DEFAULT '',
`reference` varchar(250) NOT NULL DEFAULT '',
`url` varchar(1000) NOT NULL DEFAULT '',
`moderate` varchar(250) NOT NULL DEFAULT 'default',
`is_form_enabled` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date_modified` datetime NOT NULL,
`date_added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147627 DEFAULT CHARSET=utf8
CREATE TABLE `comments` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
`page_id` int(10) unsigned NOT NULL DEFAULT '0',
`website` varchar(250) NOT NULL DEFAULT '',
`town` varchar(250) NOT NULL DEFAULT '',
`state_id` int(10) NOT NULL DEFAULT '0',
`country_id` int(10) NOT NULL DEFAULT '0',
`rating` tinyint(1) unsigned NOT NULL DEFAULT '0',
`reply_to` int(10) unsigned NOT NULL DEFAULT '0',
`comment` text NOT NULL,
`reply` text NOT NULL,
`ip_address` varchar(250) NOT NULL DEFAULT '',
`is_approved` tinyint(1) unsigned NOT NULL DEFAULT '1',
`notes` text NOT NULL,
`is_admin` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_sent` tinyint(1) unsigned NOT NULL DEFAULT '0',
`sent_to` int(10) unsigned NOT NULL DEFAULT '0',
`likes` int(10) unsigned NOT NULL DEFAULT '0',
`dislikes` int(10) unsigned NOT NULL DEFAULT '0',
`reports` int(10) unsigned NOT NULL DEFAULT '0',
`is_sticky` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_verified` tinyint(1) unsigned NOT NULL DEFAULT '0',
`date_modified` datetime NOT NULL,
`date_added` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147879 DEFAULT CHARSET=utf8
Questions
Why is this happening? What does MySql do under the hood?
Does this happen only in MySql, or any other Sql as well?
How can I write a fast query to get what I need? (In both v 5.6, 5.7)
The problem with your long-running queries, is that you lack an index on the page_id column of the comments table. Hence, for each row from the pages table, you need to check all rows of the comments table. Since you are using LEFT JOIN, this is the only possible join order. What happens in 5.6, is that when you use a subquery in the FROM clause (aka derived table), MySQL will create an index on the temporary table used for the result of the derived table (auto_key0 in the EXPLAIN output). The reason it is faster when you only select one column, is that the temporary table will be smaller.
In MySQL 5.7, such derived tables will be automatically merge into the main query, if possible. This is done to avoid the extra temporary tables. However, this means that you no longer have an index to use for the join. (See this blog post for details.)
You have two options to improve the query time in 5.7:
You can create an index on comments(page_id)
You can prevent the subquery from being merged by rewriting it to a query that can not be merged. Subqueries with aggregation, LIMIT, or UNION will not be merged (see the blog post for details). One way to do this is to add a LIMIT clause to the subquery. In order not to remove any rows from the result, the limit must be larger than the number of rows in the table.
In MySQL 8.0, you can also use an optimizer hint to avoid the merging. In your case, that would be something like
SELECT /*+ NO_MERGE(c) */ ... FROM
See slides 34-37 of this presentation for examples of how to use such hints.
Query 1 has the "explode-implode" syndrome. First it does a JOIN; this explodes the number of rows. Then it does a GROUP BY to shrink back.
Also
The number of comments per page, etc, will have an effect on your query.
SELECT * fetches all the columns, when it needs only to know if the LEFT JOIN succeeded. (You observed that.) Furthermore, you keep none of the columns since you are looking for missing rows.
Query 2 should not be as fast as what you found -- It needs to build two temp tables (the "derived" tables), index one of them, then perform the outer query. (Possibly a new enough version of MySQL can short-circuit some of that effort; old version were notorious at doing an inefficient job.)
Query 3:
Try
SELECT p.id
FROM `pages` AS p
WHERE NOT EXISTS( SELECT 1 FROM `comments`
WHERE page_id = p.id );
ALSO:
Use InnoDB, not MyISAM.
comments needs INDEX(page_id)

Strange index behavior mysql

I usually pride myself to be a database pro but I can't really wrap my head around this behavior. I hope someone can explain how this is working.
I have two mysql tables orders:
CREATE TABLE `orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint(4) NOT NULL,
`total` decimal(7,2) NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`voucher_code` varchar(127) DEFAULT NULL,
`voucher_id` int(11) unsigned DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
`billing_address_id` int(11) unsigned NOT NULL,
`shipping_address_id` int(11) unsigned NOT NULL,
`reference_id` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reference_id` (`reference_id`),
KEY `address_id` (`billing_address_id`)
) ENGINE=InnoDB AUTO_INCREMENT=168067 DEFAULT CHARSET=latin1;
and addresses:
CREATE TABLE `addresses` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` tinyint(4) DEFAULT NULL,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
`street` varchar(255) NOT NULL,
`street2` varchar(255) DEFAULT NULL,
`company_name` varchar(255) DEFAULT NULL,
`city` varchar(45) NOT NULL,
`postcode` varchar(45) DEFAULT NULL,
`region` varchar(45) DEFAULT NULL,
`country` varchar(45) NOT NULL,
`phone` varchar(45) DEFAULT NULL,
`user_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_addresses_users1_idx` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=95277 DEFAULT CHARSET=latin1;
Now as you can see I have created an index inside the orders table for the billing_address_id called address_id that should match with the address id.
This is the query I am trying to run:
SELECT
o.id, a.first_name, a.last_name, o.total, o.date_created
FROM
orders o USE INDEX FOR JOIN (PRIMARY) JOIN
addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
If I run the query without any index specification it will pickup and use the address_id index which I would expect be the fastest way to match the two tables.
Strangely enough with the 'address_id' index the query runs in 2 seconds.
If i use the normal 'PRIMARY' index which works on the order id it takes 0.000 seconds.
This is bugging me out. I thought I was supposed to create indexes to expedite the joining process between tables.
If I run EXPLAIN on the two queries I get:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE a ALL PRIMARY 95234 100.00 Using temporary; Using filesort
1 SIMPLE o ref address_id address_id 4 my_basket.a.id 1 100.00
With the index:
EXPLAIN EXTENDED
SELECT o.id, a.first_name, a.last_name, o.total, o.date_created
FROM orders o USE INDEX FOR
JOIN (PRIMARY)
JOIN addresses a ON a.id = o.billing_address_id
ORDER BY id DESC
LIMIT 0, 50
1 SIMPLE o index PRIMARY 4 50 332632.00
1 SIMPLE a eq_ref PRIMARY PRIMARY 4 my_basket.o.billing_address_id 1 100.00
Thank you for finding the time to answer this question.
For ORDER BY ... LIMIT queries it will often be beneficial to use a query execution plan that avoids sorting. This is not necessarily because the sorting is expensive, but because it makes it possible to stop the query execution once the number of requested rows (here 50) are found.
In your case, if one starts with table a, the full join result will have to be generated before selecting the "top" 50 rows. If you start with scanning table o using the PRIMARY index, the join result will be sorted on o.id, and the join execution can stop once 50 rows have been found.
The cost model used to select between the two approaches has been improved since MySQL 5.6. I suggest you try out MySQL 5.7 to see if the MySQL optimizer is now able to select the most optimal plan.
I'm surprised that the two queries even compile -- ORDER BY id is ambiguous since each table has a different id.
When doing a JOIN, always qualify all columns.
Meanwhile, remove the USE INDEX.

MySQL Query Using LEFT JOIN Hangs

Can somebody please advise where I could be going wrong with the following query which is causing my page to hang:
SELECT m.member_id, m.member_name, SUM(CONCAT(f.pounds, '.', f.pence)) AS totalfigure
FROM members m
LEFT JOIN figures f ON f.member_id = m.member_id AND f.period_id = 45
WHERE m.added BETWEEN '1980-01-01' AND '2014-10-31'
AND (DATE_SUB('2014-08-01', INTERVAL 60 DAY)) < m.removed
GROUP BY m.member_id
ORDER BY m.member_name;
The same query runs fine if I use JOIN rather than LEFT JOIN but I need to also view members that do not have an entry in the figures table.
This is the EXPLAIN result I get with the LEFT JOIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE m ALL NULL NULL NULL NULL 3128 Using temporary; Using filesort
1 SIMPLE f ALL NULL NULL NULL NULL 169214
and with the JOIN the result is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE f ALL NULL NULL NULL NULL 169214 Using where; Using temporary; Using filesort
1 SIMPLE m eq_ref PRIMARY PRIMARY 3 db.f.member_id 1
The structure for each of these tables is:
CREATE TABLE IF NOT EXISTS `members` (
`member_id` mediumint(5) unsigned NOT NULL AUTO_INCREMENT,
`member_name` varchar(100) NOT NULL DEFAULT '',
PRIMARY KEY (`member_id`),
FULLTEXT KEY `member_name` (`member_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3462 ;
CREATE TABLE IF NOT EXISTS `figures` (
`supplier_id` int(10) unsigned NOT NULL DEFAULT '0',
`contract_id` mediumint(5) unsigned NOT NULL DEFAULT '1',
`member_id` mediumint(5) unsigned NOT NULL,
`period_id` mediumint(5) unsigned NOT NULL DEFAULT '0',
`pounds` varchar(8) DEFAULT NULL,
`pence` char(3) DEFAULT '00',
PRIMARY KEY (`supplier_id`,`member_id`,`period_id`,`contract_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Thanks

MySQL Indexes for extremely slow queries

The following query, regardless of environment, takes more than 30 seconds to compute.
SELECT COUNT( r.response_answer )
FROM response r
INNER JOIN (
SELECT G.question_id
FROM question G
INNER JOIN answer_group AG ON G.answer_group_id = AG.answer_group_id
WHERE AG.answer_group_stat = 'statistic'
) AS q ON r.question_id = q.question_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE os.survey_id =42
AND r.response_answer = 5
AND DATEDIFF( NOW( ) , r.added_dt ) <1000000
AND r.uuid IS NOT NULL
When I explain the query,
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1087
1 PRIMARY r ref question_id,org_survey_code,code_question,uuid,uor question_id 4 q.question_id 1545 Using where
1 PRIMARY os eq_ref org_survey_code,survey_id,org_survey_code_2 org_survey_code 12 survey_2.r.org_survey_code 1 Using where
2 DERIVED G ALL agid NULL NULL NULL 1680
2 DERIVED AG eq_ref PRIMARY PRIMARY 1 survey_2.G.answer_group_id 1 Using where
I have a very basic knowledge of indexing, but I have tried nearly every combination I can think of and cannot seem to improve the speed of this query. The responses table is right around 2 million rows, question is about 1500 rows, answer_group is about 50, and org_survey is about 8,000.
Here is the basic structure for each:
CREATE TABLE `response` (
`response_id` int(10) unsigned NOT NULL auto_increment,
`response_answer` text NOT NULL,
`question_id` int(10) unsigned NOT NULL default '0',
`org_survey_code` varchar(7) NOT NULL,
`uuid` varchar(40) default NULL,
`added_dt` datetime default NULL,
PRIMARY KEY (`response_id`),
KEY `question_id` (`question_id`),
KEY `org_survey_code` (`org_survey_code`),
KEY `code_question` (`org_survey_code`,`question_id`),
KEY `IDX_ADDED_DT` (`added_dt`),
KEY `uuid` (`uuid`),
KEY `response_answer` (`response_answer`(1)),
KEY `response_question` (`response_answer`(1),`question_id`),
) ENGINE=MyISAM AUTO_INCREMENT=2298109 DEFAULT CHARSET=latin1
CREATE TABLE `question` (
`question_id` int(10) unsigned NOT NULL auto_increment,
`question_text` varchar(250) NOT NULL default '',
`question_group` varchar(250) default NULL,
`question_position` tinyint(3) unsigned NOT NULL default '0',
`survey_id` tinyint(3) unsigned NOT NULL default '0',
`answer_group_id` mediumint(8) unsigned NOT NULL default '0',
`seq_id` int(11) NOT NULL default '0',
PRIMARY KEY (`question_id`),
KEY `question_group` (`question_group`(10)),
KEY `survey_id` (`survey_id`),
KEY `agid` (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1860 DEFAULT CHARSET=latin1
CREATE TABLE `org_survey` (
`org_survey_id` int(11) NOT NULL auto_increment,
`org_survey_code` varchar(10) NOT NULL default '',
`org_id` int(11) NOT NULL default '0',
`org_manager_id` int(11) NOT NULL default '0',
`org_url_id` int(11) default '0',
`division_id` int(11) default '0',
`sector_id` int(11) default NULL,
`survey_id` int(11) NOT NULL default '0',
`process_batch` tinyint(4) default '0',
`added_dt` datetime default NULL,
PRIMARY KEY (`org_survey_id`),
UNIQUE KEY `org_survey_code` (`org_survey_code`),
KEY `org_id` (`org_id`),
KEY `survey_id` (`survey_id`),
KEY `org_survey_code_2` (`org_survey_code`,`total_taken`),
KEY `org_manager_id` (`org_manager_id`),
KEY `sector_id` (`sector_id`)
) ENGINE=MyISAM AUTO_INCREMENT=9268 DEFAULT CHARSET=latin1
CREATE TABLE `answer_group` (
`answer_group_id` tinyint(3) unsigned NOT NULL auto_increment,
`answer_group_name` varchar(50) NOT NULL default '',
`answer_group_type` varchar(20) NOT NULL default '',
`answer_group_stat` varchar(20) NOT NULL default 'demographic',
PRIMARY KEY (`answer_group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=53 DEFAULT CHARSET=latin1
I know there are small things I can probably do to improve the efficiency of the database, such as reducing the size of integers where it's unnecessary. However, those are fairly trivial considering the ridiculous time it takes just to produce a result here. How can I properly index these tables, based on what explain has shown me? It seems that I have tried a large variety of combinations to no avail. Also, is there anything else that anyone can see that will optimize the table and reduce the query? I need it to be computed in less than a second. Thanks in advance!
1.If you want the index of r.added_dt to be used, instead of:
DATEDIFF(NOW(), r.added_dt) < 1000000
use:
CURDATE() - INTERVAL 1000000 DAY < r.added_dt
Anyway, the above condition is checking if added_at is a million days old or not. Do you really store so old dates? If not, you can simply remove this condition.
If you want this condition, an index on added_at would help a lot. Your query as it is now, checks all rows for this condition, calling the DATEDIFF() function as many times as the rows of the response table.
2.Since r.response_answer cannot be NULL, instead of:
SELECT COUNT( r.response_answer )
use:
SELECT COUNT( * )
COUNT(*) is faster than COUNT(field).
3.Two of the three fields that you use for joining tables have different datatypes:
ON question . answer_group_id
= answer_group . answer_group_id
CREATE TABLE question (
...
answer_group_id mediumint(8) ..., <--- mediumint
CREATE TABLE answer_group (
answer_group_id` tinyint(3) ..., <--- tinyint
-------------------------------
ON org_survey . org_survey_code
= response . org_survey_code
CREATE TABLE response (
...
org_survey_code varchar(7) NOT NULL, <--- 7
CREATE TABLE org_survey (
...
org_survey_code varchar(10) NOT NULL default '', <--- 10
Datatype mediumint is not the same as tinyint and the same goes for varchar(7) and varchar(10). When they are used for join, MySQL has to lose time doing conversion from one type to another. Convert one of them so they have identical datatypes. This is not the main issue of the query but this change will also help all other queries that use these joins.
And after making this change do a 'Analyze Table ' for the table. It will help mysql making better execution plans.
You have a response_answer = 5 condition, where response_answer is text. It's not an error, but it's better to use response_answer = '5' (the conversion of 5 to '5' will be done by MySQL anyway, if you don't do that).
Real issue is that you don't have a compound index on the 3 fields that are used in the WHERE conditions. Try adding this one:
ALTER TABLE response
ADD INDEX ind_u1_ra1_aa
(uuid(1), response_answer(1), added_at) ;
(this may take a while as your table is not small)
Can you try the following query? I've removed the sub-query from your original one. This may let the optimiser produce a better execution plan.
SELECT COUNT(r.response_answer)
FROM response r
INNER JOIN question q ON r.question_id = q.question_id
INNER JOIN answer_group ag ON q.answer_group_id = ag.answer_group_id
INNER JOIN org_survey os ON os.org_survey_code = r.org_survey_code
WHERE
ag.answer_group_stat = 'statistic'
AND os.survey_id = 42
AND r.response_answer = 5
AND DATEDIFF(NOW(), r.added_dt) < 1000000
AND r.uuid IS NOT NULL

Trying to optimize MySQL query with LEFT OUTER JOIN

I've this query, which works fine except it takes a long while (7 seconds, with 40k records in the jobs table, and 700k in the wq table).
I tried an EXPLAIN and it says its looking at all the records in the job table, and not using any of the indexes.
I don't know how to tell MySQL that it should use the jobs.status field to filter the the records before looking up the wq table.
The objective of this, is to get all the records from jobs that have a status != 331, and also any other job which has a wq status of (101, 111, 151).
Query:
SELECT jobs.*
FROM jobs
LEFT OUTER JOIN wq ON (wq.job = jobs.id AND jobs.status IN (341, 331) AND wq.status IN (101, 111, 151))
WHERE ((wq.info is not NULL) or (jobs.status != 331 and ack = 0))
EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE jobs ALL ack,status,status_ack NULL NULL NULL 38111 Using filesort
1 SIMPLE wq ref PRIMARY,job,status PRIMARY 4 cts.jobs.id 20 Using where
Table definitions:
CREATE TABLE jobs ( id int(10) NOT NULL AUTO_INCREMENT,
comment varchar(100) NOT NULL DEFAULT '',
profile varchar(60) NOT NULL DEFAULT '',
start_at int(10) NOT NULL DEFAULT '0',
data text NOT NULL,
status int(10) NOT NULL DEFAULT '0',
info varchar(200) NOT NULL DEFAULT '',
finish int(10) NOT NULL DEFAULT '0',
priority int(5) NOT NULL DEFAULT '0',
ack tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (id),
KEY start_at (start_at),
KEY status (status),
KEY status_ack (status,
ack) ) ENGINE=MyISAM AUTO_INCREMENT=2037530 DEFAULT CHARSET=latin1;
CREATE TABLE wq ( job int(10) NOT NULL DEFAULT '0',
process varchar(60) NOT NULL DEFAULT '',
step varchar(60) NOT NULL DEFAULT '',
status int(10) NOT NULL DEFAULT '0',
run_at int(10) NOT NULL DEFAULT '0',
original_run_at int(10) NOT NULL DEFAULT '0',
info varchar(200) NOT NULL DEFAULT '',
pos int(10) NOT NULL DEFAULT '0',
changed_at int(10) NOT NULL DEFAULT '0',
file varchar(60) NOT NULL DEFAULT '',
PRIMARY KEY (job,
process,
step,
file),
KEY job (job),
KEY status (status) ) ENGINE=MyISAM DEFAULT CHARSET=latin1
Unfortunately mysql (and perhaps any dbms) cannot optimize expressions like jobs.status != 331 and ack = 0 because B-Tree is not a structure that allows to find fast anything that is-not-equal-to-a-constant-value. Thus you'll always get a fullscan.
If there were some better condition like jobs.status = 331 and ack = 0 (note on the fact that i've changed != to =) then it would be an advice to speed up this query:
split the query into 2, joined by UNION ALL
replace in one query LEFT JOIN to INNER JOIN (in the one that implies that wq.info is not NULL)