mysql left join binary group by - mysql

I have three tables. tag is a column present in two of them (in the second table the same tag appears on multiple rows); both are defined as VARCHAR(70) and indexed as FULLTEXT. the_id is a column in the first table (it's not the primary key, but it has a regular index).
So my question is: why would this query take between 4 and 8 seconds to respond?
SELECT *
FROM table_1
LEFT JOIN table_3 ON (table_1.the_id = table_3.id)
LEFT JOIN table_2 ON table_1.tag = table_2.tag
WHERE table_2.bad = 0
GROUP BY table_1.the_id
(I use this query to make sure that there is at least one row in the second table that is not flagged as bad, and I group by the_id so I don't show duplicate entries that share similar tags but the same the_id, which I also use to join the third table on its primary id.)
Explained Query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | simple | table_1 | ALL | tag | NULL | NULL | NULL | 2809 | Using temporary; Using filesort
1 | simple | table_2 | ALL | bad,tag | NULL | NULL | NULL | 3689 | Using where
1 | simple | table_3 | eq_ref | PRIMARY | PRIMARY | 8 | the_id | 1 |
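For what it's worth, the "at least one non-bad row in table_2" requirement could also be expressed with EXISTS instead of the join plus GROUP BY; this is just a sketch, assuming the_id is unique in table_1 and using the column names above:
SELECT table_1.*, table_3.*
FROM table_1
LEFT JOIN table_3 ON (table_1.the_id = table_3.id)
WHERE EXISTS (
    SELECT 1
    FROM table_2
    WHERE table_2.tag = table_1.tag
      AND table_2.bad = 0
)
With EXISTS the rows of table_1 are never multiplied by matching table_2 rows, so no GROUP BY is needed to deduplicate.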

Related

MySQL Subquery making query super slow

I've been struggling when it comes to optimizing the following query (Example 1):
SELECT `service`.*
FROM
(
SELECT `storeUser`.`storeId`
FROM `storeUser`
WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
) AS `storeUser`
INNER JOIN `service` ON `storeUser`.`storeId` = `service`.`storeId`
LIMIT 10;
The subquery should return something like "1", "2", "3", "4".
Anyway, it's super slow and takes about 48 seconds to respond, even though the subquery by itself, run in a separate console, takes about 0.0020 ms to return results.
The same applies if I place the subquery inside an IN instead (Example 2):
SELECT `service`.*
FROM `service`
WHERE 1
AND `service`.`storeId` IN (
SELECT `storeUser`.`storeId` FROM `storeUser` WHERE `storeUser`.`userId` = 1
UNION
SELECT `store`.`storeId` FROM `companyUser`
INNER JOIN `store` ON `companyUser`.`companyId` = `store`.`companyId`
WHERE `companyUser`.`userId` = 1
UNION
SELECT `store`.`storeId`
FROM `accountUser`
INNER JOIN `company` ON `company`.`accountId` = `accountUser`.`accountId`
INNER JOIN `store` ON `company`.`companyId` = `store`.`companyId`
WHERE `accountUser`.`userId` = 1
)
LIMIT 10;
However, if I simply put the values returned by that query in manually, it responds almost instantly:
SELECT
`service`.*
FROM
`service`
WHERE 1
AND `service`.`storeId` IN (
"1", "2", "3", "4", "5"
)
LIMIT 10;
It's important to mention that I've reviewed the indexes used in the joins and everything seems to be in place, and EXPLAIN [query] returns a filtered score of 100 for basically everything.
Edit:
Sorry for not providing enough information before, hope this can be more helpful:
MySQL 5.7,
Storage engine: InnoDB
EXPLAINs
1.) StoreUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | storeUser | NULL | ref | PRIMARY, storeUserUser | PRIMARY | 4 | const | 1 |100.00 | Using index
2.) CompanyUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | companyUser | NULL | ref | PRIMARY,companyUserCompany,companyUserUser | companyUserUser | 4 | const | 30 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.companyUser.companyId | 5 | 100.00 | Using index
3.) AccountUser
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | accountUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
1 | SIMPLE | company | NULL | ref | PRIMARY,companyAccount | companyAccount | 4 | Table.accountUser.accountId | 305 | 100.00 | Using index
1 | SIMPLE | store | NULL | ref | storeCompany | storeCompany | 4 | Table.company.companyId | 5 | 100.00 | Using index
4.) Whole query (Example 2)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | service | NULL | ALL | NULL | NULL | NULL | NULL | 2836046 | 100.00 | Using where
2 | DEPENDENT SUBQUERY | storeUser | NULL | eq_ref | PRIMARY,storeUserStore,storeUserUser | PRIMARY | 8 | const,func | 1 | 100.00 | Using index
3 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
3 | DEPENDENT UNION | companyUser | NULL | eq_ref | PRIMARY,companyUserCompany,companyUserUser | PRIMARY | 8 | const,Table.store.companyId | 1 | 100.00 | Using index
4 | DEPENDENT UNION | companyUser | NULL | ref | PRIMARY,accountUserUser | accountUserUser | 4 | const | 1 | 100.00 | Using index
4 | DEPENDENT UNION | store | NULL | eq_ref | PRIMARY,storeCompany | PRIMARY | 4 | func | 1 | 100.00 | NULL
4 | DEPENDENT UNION | company | NULL | eq_ref | PRIMARY,companyAccount | PRIMARY | 4 | Table.store.companyId | 1 | 100.00 | Using where
NULL | UNION RESULT | <union2,3,4>| NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary
You didn't show us your indexes or EXPLAIN output, so all this is guesswork.
Clearly it's the subquery in your second example that's not optimized. That subquery is a UNION with three branches. The way you address performance trouble? Analyze and optimize each branch of the UNION separately.
You certainly need some better indexes, unless your database server is too small or misconfigured. That's very rare, so let's work on indexes.
The first branch is
SELECT storeUser.storeId
FROM storeUser
WHERE storeUser.userId = 1
This compound index covers that query. Try adding it. If you have a separate index on just userId, drop it when you add this one.
ALTER TABLE storeUser ADD INDEX userId_storeId (userId, storeId);
The second branch is
SELECT store.storeId
FROM companyUser
INNER JOIN store ON companyUser.companyId = store.companyId
WHERE companyUser.userId = 1
Subqueries with JOIN operations are a little trickier to optimize without access to EXPLAIN output, so this is guesswork. I guess these indexes will help, though. (Assuming you use InnoDB and the PK on store is storeId.)
ALTER TABLE companyUser ADD INDEX userId_companyId (userId, companyId);
ALTER TABLE store ADD INDEX companyId (companyId);
Similar analysis applies to the third branch of the UNION.
And, add this index. Your EXPLAIN points to it being missing, which forces a full table scan of that large service table.
ALTER TABLE service ADD INDEX storeId (storeId);
Again, helping you would be far easier if you showed us your table definitions with indexes. SHOW CREATE TABLE service;, for example, would show us what we need for your service table. Pro tip: when troubleshooting this kind of performance problem, always double-check your indexes. Ask me how I know that when you have a couple of hours to spare.
Pro tip: Be obsessive about formatting your queries so they're readable. You yourself a year from now, and your co-workers yet unborn, need to read and reason about them. To my way of thinking, that means skipping those silly backticks.
Perhaps you need to rethink the schema. It seems like you need a table for "user" instead of, or in addition to, the 3 tables for different types of "users".
Meanwhile, these composite indexes are likely to help performance in either formulation:
storeUser: INDEX(storeId, userId)
storeUser: INDEX(userId, storeId)
service: INDEX(storeId)
store: INDEX(companyId, storeId)
companyUser: INDEX(userId, companyId)
company: INDEX(accountId, companyId)
accountUser: INDEX(userId, accountId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
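As a hedged sketch, using placeholder names t, a and b (MySQL lets you do both steps in one statement):
-- t, a and b are placeholders; the single-column index named a becomes redundant
ALTER TABLE t
    ADD INDEX a_b (a, b),
    DROP INDEX a;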
In particular, storeUser smells like a many-to-many mapping table. If so, see Many:many mapping for more discussion.
In general IN( SELECT ... ) does not optimize well, but you might find otherwise for your query.
Sorry for not giving more details about the schemas, but I wasn't allowed to share them here. Anyway, the problem happened to be elsewhere:
The service table was receiving a huge number of requests, and some actions were even locking it up, resulting in slow response times whenever we accessed that table. We have fixed the other process and it's working great now. Hugely appreciate your time and effort, thanks.

Cannot make mysql use indexes built for a table via a join query

I have the following complex query that is taking a while to run:
SELECT
`User`.`id`,
`User`.`username`,
`User`.`password`,
`User`.`role`,
`User`.`created`,
`User`.`modified`,
`User`.`email`,
`User`.`other_user_id`,
`User`.`first_name`,
`User`.`last_name`,
`User`.`school_id`,
`Resume`.`id`,
`Resume`.`user_id`,
`Resume`.`other_resume_id`,
`Resume`.`other_user_id`,
`Resume`.`file_extension`,
`Resume`.`created`,
`Resume`.`modified`,
`Resume`.`is_deleted`,
`Resume`.`has_file`,
`Resume`.`is_stamped`,
`Resume`.`is_active`
FROM
`dataplace`.`users` AS `User`
LEFT JOIN `dataplace`.`attempts` AS `Attempt`
ON (`Attempt`.`user_id` = `User`.`id` AND `Attempt`.`test_id` != 5)
LEFT JOIN `dataplace`.`resumes` AS `Resume`
ON (`Resume`.`user_id` = `User`.`id`)
WHERE
`Resume`.`has_file` = 1
GROUP BY `User`.`id`
ORDER BY `Attempt`.`score` DESC;
This query generates the following explain:
+----+-------------+---------+--------+---------------+---------------+---------+------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------------+---------+------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | Resume | ALL | user_id_index | NULL | NULL | NULL | 18818 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | User | eq_ref | PRIMARY | PRIMARY | 4 | dataplace.Resume.user_id | 1 | Using where |
| 1 | SIMPLE | Attempt | ref | user_id_index | user_id_index | 5 | dataplace.User.id | 1 | |
+----+-------------+---------+--------+---------------+---------------+---------+------------------------------+-------+----------------------------------------------+
The resume table has 4 separate indexes that are as followed:
PRIMARY id
user_id_index
other_resume_id_index
other_user_id_index
Based on this, I would expect the user_id index from the resumes table to be used with the query in question, but it is not. Could this be an issue with ordering? Is there some other reason why this index is not in use? Would a different index serve me better? I am not sure. Thank you to anyone that can help.
You've basically got no useful WHERE clause, because the condition you have there applies to the last table joined and could be moved into the join condition of the last join.
The Users table is the first table accessed (it is named first in the FROM clause) and there are no conditions for users, so all rows must be accessed - no index could help there.
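To illustrate that point (a sketch only, not a guaranteed speed-up, and mirroring the original query's grouping): with the filter in WHERE, the resumes LEFT JOIN behaves like an INNER JOIN; moving the condition into the ON clause of that last join, as described above, would instead keep users that have no qualifying resume:
SELECT `User`.`id`, `User`.`username`, `Resume`.`id`
FROM `dataplace`.`users` AS `User`
LEFT JOIN `dataplace`.`attempts` AS `Attempt`
    ON (`Attempt`.`user_id` = `User`.`id` AND `Attempt`.`test_id` != 5)
LEFT JOIN `dataplace`.`resumes` AS `Resume`
    ON (`Resume`.`user_id` = `User`.`id` AND `Resume`.`has_file` = 1)
GROUP BY `User`.`id`
ORDER BY `Attempt`.`score` DESC;
Which of the two behaviours is wanted depends on whether users without a stored resume should appear in the result.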

Query takes too much time with JOIN

I need a little help improving the following query performance
SELECT *
FROM dle_pause
LEFT JOIN dle_post_plus
ON ( dle_pause.pause_postid = dle_post_plus.puuid )
LEFT JOIN dle_post
ON ( dle_post_plus.news_id = dle_post.id )
LEFT JOIN dle_playerfiles
ON ( dle_post.id = dle_playerfiles.post_id )
WHERE pause_user = '2';
It returns 3 rows in set (0.35 sec). The problem is with the third join: one of the rows doesn't have a matching dle_post.id = dle_playerfiles.post_id, so it scans the whole table.
It looks like I have all the needed indexes:
+----+-------------+-----------------+--------+----------------------------------+---------+---------+-----------------------------------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+----------------------------------+---------+---------+-----------------------------------+--------+------------------------------------------------+
| 1 | SIMPLE | dle_pause | ALL | pause_user | NULL | NULL | NULL | 3 | Using where |
| 1 | SIMPLE | dle_post_plus | ref | puuid | puuid | 36 | func | 1 | Using where |
| 1 | SIMPLE | dle_post | eq_ref | PRIMARY | PRIMARY | 4 | online_test.dle_post_plus.news_id | 1 | NULL |
| 1 | SIMPLE | dle_playerFiles | ALL | ix_dle_playerFiles__post_id_type | NULL | NULL | NULL | 131454 | Range checked for each record (index map: 0x2) |
+----+-------------+-----------------+--------+----------------------------------+---------+---------+-----------------------------------+--------+------------------------------------------------+
If you have not put an index on dle_playerfiles' post_id, then add one.
If you have already put an index on it, then add a 'use index' hint to the last join in your query, like this:
SELECT *
FROM
dle_pause
LEFT JOIN dle_post_plus
ON ( dle_pause.pause_postid = dle_post_plus.puuid )
LEFT JOIN dle_post
ON ( dle_post_plus.news_id = dle_post.id )
LEFT JOIN dle_playerfiles USE INDEX (ix_dle_playerFiles__post_id_type)
ON ( dle_post.id = dle_playerfiles.post_id )
WHERE
pause_user = '2';
This will make it use the index for the fourth table as well. Right now your EXPLAIN shows that it is not using any index on the fourth table, and hence scans 131454 rows.
I can suggest two alternatives for solving this.
First alternative:
Create temporary tables that contain only non-NULL values for the keys you compare in the LEFT JOINs.
Something like this:
CREATE TEMPORARY TABLE tmp_dle_post_plus AS
SELECT *
FROM dle_post_plus
WHERE puuid IS NOT NULL;
Do it for all three tables.
Then use your original query on the temporary tables which does not include NULL values.
Second alternative:
Create an index for each key you are comparing in the left join, in this way the index will do the job for you.
Of course, you can always combine the two methods I suggested.

Optimizing a large keyword table?

I have a huge table like
CREATE TABLE IF NOT EXISTS `object_search` (
`keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
with around 39 million rows (using over 1 GB of space) containing the indexed data for 1 million records in the object table (which object_id points to).
Now searching through this with a query like
SELECT object_id, COUNT(object_id) AS hits
FROM object_search
WHERE keyword = 'woman' OR keyword = 'house'
GROUP BY object_id
HAVING hits = 2
is already significantly faster than doing a LIKE search on the composed keywords field in the object table but still takes up to 1 minute.
Its EXPLAIN looks like:
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | search | ref | PRIMARY | PRIMARY | 42 | const | 345180 | 100.00 | Using where; Using index |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
The full EXPLAIN, with the object, object_color and object_locale tables joined in and the above query run as a subquery to avoid overhead, looks like:
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 182544 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | search.object_id | 1 | 100.00 | |
| 2 | DERIVED | search | ref | PRIMARY | PRIMARY | 42 | | 345180 | 100.00 | Using where; Using index |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
My top goal would be to be able to scan through this within 1 or 2 seconds.
So, are there further techniques to improve search speed for keywords?
Update 2013-08-06:
Applying most of Neville K's suggestion I now have the following setup:
CREATE TABLE `object_search_keyword` (
`keyword_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`keyword` varchar(64) COLLATE latin1_german1_ci NOT NULL,
PRIMARY KEY (`keyword_id`),
FULLTEXT KEY `keyword_ft` (`keyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
CREATE TABLE `object_search` (
`keyword_id` int(10) unsigned NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword_id`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The new query's explain looks like this:
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 2 | DERIVED | <derived4> | system | NULL | NULL | NULL | NULL | 1 | 100.00 | |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | |
| 4 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | DERIVED | object_keyword | fulltext | PRIMARY,keyword_ft | keyword_ft | 0 | | 1 | 100.00 | Using where; Using temporary; Using filesort |
| 3 | DERIVED | object_search | ref | PRIMARY | PRIMARY | 4 | object_keyword.keyword_id | 2190225 | 100.00 | Using index |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
The many DERIVED rows come from the keyword-matching subquery being nested inside another subquery, which does nothing but count the number of rows returned:
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(woman) +(house)')
GROUP BY search.object_id HAVING hits = 2
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS turbo
INNER JOIN object AS object ON (turbo.object_id = object.object_id)
LEFT JOIN object_color AS object_color ON (turbo.object_id = object_color.object_id)
LEFT JOIN object_locale AS locale ON (turbo.object_id = locale.object_id)
ORDER BY timestamp_upload DESC
The above query actually runs within ~6 seconds, since it searches for two keywords. The more keywords I search for, the faster it gets.
Any way to further optimize this?
Update 2013-08-07
The blocking thing seems almost certainly to be the appended ORDER BY statement. Without it, the query executes in less than a second.
So, is there any way to sort the result faster? Any suggestions welcome, even hackish ones that would require post processing somewhere else.
Update 2013-08-07 later that day
Alright ladies and gentlemen, nesting the WHERE and ORDER BY statements in another layer of subquery, so they don't touch tables they don't need, roughly doubled its performance again:
SELECT wowrapper.*, locale.title
FROM (
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(frau)')
GROUP BY search.object_id HAVING hits = 1
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS search
INNER JOIN object AS object ON (search.object_id = object.object_id)
LEFT JOIN object_color AS color ON (search.object_id = color.object_id)
WHERE 1
ORDER BY object.object_id DESC
) AS wowrapper
LEFT JOIN object_locale AS locale ON (wowrapper.object_id = locale.object_id)
LIMIT 0,48
Searches that took 12 seconds (single keyword, ~200K results) now take 6, and a search for two keywords that took 6 seconds (60K results) now takes around 3.5 secs.
Now this is already a massive improvement, but is there any chance to push this further?
Update 2013-08-08 early that day
Undid that last nested variation of the query, since it actually slowed down other variations of it...
I'm now trying some other things with different table layouts and FULLTEXT indexes using MyISAM for a dedicated search table with a combined keyword field (comma separated in a TEXT field).
Update 2013-08-08
Alright, a plain fulltext index doesn't really help.
Back to the previous setup, the only thing blocking is the ORDER BY (which resorts to using a temporary table and filesort). Without it, a search completes in less than a second!
So basically what's left of all this is:
How do I optimize the ORDER BY statement to run faster, likely by eliminating the use of the temporary table?
Full text search will be much faster than using the standard SQL string comparison features.
Secondly, if you have a high degree of redundancy in the keywords, you could consider a "many to many" implementation:
Keywords
--------
keyword_id
keyword
keyword_object
-------------
keyword_id
object_id
objects
-------
object_id
......
If this reduces the string comparison from 39 million rows to 100K rows (roughly the size of the English dictionary), you may also see a distinct improvement, as the query would only have to perform 100K string comparisons, and joining on an integer keyword_id and object_id field should be much, much faster than doing 39M string comparisons.
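A minimal DDL sketch of that layout (names and sizes are illustrative), together with the two-keyword lookup it enables:
CREATE TABLE keywords (
    keyword_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    keyword VARCHAR(40) NOT NULL,
    PRIMARY KEY (keyword_id),
    UNIQUE KEY keyword (keyword)
) ENGINE=InnoDB;

CREATE TABLE keyword_object (
    keyword_id INT UNSIGNED NOT NULL,
    object_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (keyword_id, object_id),
    KEY object_id (object_id)
) ENGINE=InnoDB;

-- "objects matching both keywords": strings are only compared against the small keywords table
SELECT ko.object_id, COUNT(*) AS hits
FROM keywords AS k
INNER JOIN keyword_object AS ko ON ko.keyword_id = k.keyword_id
WHERE k.keyword IN ('woman', 'house')
GROUP BY ko.object_id
HAVING hits = 2;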
The best solution for this will be a FULLTEXT search, but you will probably need a MyISAM table for that. You can set up a mirror table and keep it updated with some events and triggers, or if you have a slave replicating from your server you can change its table to MyISAM and use it for searches.
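A rough sketch of the mirror-table idea (the table and trigger names are made up, and deletes/updates would need matching triggers):
CREATE TABLE object_search_ft LIKE object_search;
ALTER TABLE object_search_ft
    ENGINE = MyISAM,
    ADD FULLTEXT KEY keyword_ft (keyword);

DELIMITER //
CREATE TRIGGER object_search_ai AFTER INSERT ON object_search
FOR EACH ROW
BEGIN
    -- keep the MyISAM copy in sync for FULLTEXT searches
    INSERT INTO object_search_ft (keyword, object_id)
    VALUES (NEW.keyword, NEW.object_id);
END//
DELIMITER ;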
For this query the only thing I can come up with is to rewrite it as:
SELECT s1.object_id
FROM object_search s1
JOIN object_search s2 ON s2.object_id = s1.object_id AND s2.keyword = 'word2'
JOIN object_search s3 ON s3.object_id = s1.object_id AND s3.keyword = 'word3'
....
WHERE s1.keyword = 'word1'
and I'm not sure it will be faster this way.
Also, you will need an index on object_id (assuming your PK is (keyword, object_id)).
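The corresponding DDL would be something like (assuming the PRIMARY KEY really is (keyword, object_id)):
ALTER TABLE object_search ADD INDEX object_id (object_id);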
If you have seldom INSERTs and frequent SELECTs, you could optimize your data for reads, i.e. precalculate the number of object_ids per keyword and store it directly in the database. The SELECTs would then be very fast; the INSERTs would take some seconds, though.
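A sketch of that read-optimized variant (the summary table name is hypothetical); it trades slower writes for instant per-keyword counts:
CREATE TABLE keyword_counts (
    keyword VARCHAR(40) NOT NULL,
    object_count INT UNSIGNED NOT NULL,
    PRIMARY KEY (keyword)
) ENGINE=InnoDB;

-- rebuild after (batched) inserts into object_search
REPLACE INTO keyword_counts (keyword, object_count)
SELECT keyword, COUNT(*)
FROM object_search
GROUP BY keyword;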

Calculating a user's age on the fly, optimization (MySQL)

I have the following (strange) query:
SELECT DISTINCT c.id
FROM z1 INNER JOIN c c ON (z1.id=c.id)
INNER JOIN i ON (c.member_id=i.member_id)
WHERE DATE_FORMAT(CONCAT(i.birthyear,"-",i.birthmonth,"-",i.birthday),"%Y%m%d000000") BETWEEN '19820605000000' AND '19930604235959' AND c.id NOT IN (658887)
GROUP BY c.id
The user's birthday is kept in the DB in three different columns, but the task is to find users whose ages fall in a specific range.
The worst thing is that MySQL will calculate the age for each selected record and compare it with the condition, which is not good :( Is there any way to make it faster?
This is the plan:
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | z1 | index | PRIMARY | PRIMARY | 4 | NULL | 176659 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | c | eq_ref | PRIMARY,member_id | PRIMARY | 4 | z1.id | 1 | 100.00 | |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 4 | c.member_id | 1 | 100.00 | Using where |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
As usual, the right answer is to fix your schema. i.e. data should be normalized, use native keys wherever practical and use the right data types.
Looking at your post, at least you've provided an EXPLAIN plan - but the table structures would help too.
Why is the table z1 in the query? You don't explicitly filter using it, and you don't use the result anywhere.
Why do you do both a DISTINCT and a GROUP BY? You're asking the DBMS to do the same work twice.
Why do you use 'c' as an alias for 'c'?
Why are you using NOT IN to exclude a single value?
Why do you compare your date values as strings?
It's possible that the optimizer is getting confused about the best way to resolve the query - but you've not provided any information to support this. What proportion of the data is filtered by the age rule? You may get better results using the birthday / i table to drive the query:
SELECT DISTINCT c.id
FROM c
INNER JOIN i ON (c.member_id=i.member_id)
WHERE STR_TO_DATE(
CONCAT(i.birthyear,'-', i.birthmonth,'-',i.birthday)
,"%Y-%m-%d")
BETWEEN '1982-06-05' AND '1993-06-04'
AND c.id <> 658887
AND i.birthyear BETWEEN 1982 AND 1993
Alter i table and add a TIMESTAMP or DATETIME column named date_of_birth with a INDEX on it :
ALTER TABLE i ADD date_of_birth DATETIME NOT NULL, ADD INDEX date_of_birth (date_of_birth);
UPDATE i SET date_of_birth = CONCAT(i.birthyear,"-",i.birthmonth,"-",i.birthday);
And use this query which should be faster:
SELECT
c.id
FROM
i
INNER JOIN c
ON c.member_id=i.member_id
WHERE
i.date_of_birth BETWEEN '1982-06-05 00:00:00' AND '1993-06-04 23:59:59'
AND c.id NOT IN (658887)
GROUP BY
c.id
ORDER BY
NULL
You've asked me to explain what I mean. Unfortunately there are two problems with that.
The first is that I don't think that this can be adequately explained in a simple comments box.
The second is that I don't really know what I'm talking about, but I'll have a go...
Consider the following example - a simple utility table containing dates up to 2038 (when the whole UNIX_TIMESTAMP thing stops working anyway)...
CREATE TABLE calendar (
dt date NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
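Filling such a table is a one-off job; here is a sketch assuming MySQL 8.0+ (older versions would need a numbers table or a stored-procedure loop):
-- the range below spans roughly 14000 days, so raise the CTE recursion limit first
SET SESSION cte_max_recursion_depth = 20000;

INSERT INTO calendar (dt)
WITH RECURSIVE d (dt) AS (
    SELECT CAST('2000-01-01' AS DATE)
    UNION ALL
    SELECT dt + INTERVAL 1 DAY FROM d WHERE dt < '2038-01-19'
)
SELECT dt FROM d;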
Now, the following queries are logically identical...
SELECT * FROM calendar WHERE UNIX_TIMESTAMP(dt) BETWEEN 1370521405 AND 1370732400;
+------------+
| dt |
+------------+
| 2013-06-07 |
| 2013-06-08 |
| 2013-06-09 |
+------------+
SELECT * FROM calendar WHERE dt BETWEEN FROM_UNIXTIME(1370521405) AND FROM_UNIXTIME(1370732400);
+------------+
| dt |
+------------+
| 2013-06-07 |
| 2013-06-08 |
| 2013-06-09 |
+------------+
...and MySQL is clever enough to utilise the (PK) index to resolve both queries (rather than reading the table itself - yuk).
But while the first requires a full scan over the entire index (good but not great), the second is able to access the table with a key over one (or more) value ranges (terrific)...
EXPLAIN EXTENDED
SELECT * FROM calendar WHERE UNIX_TIMESTAMP(dt) BETWEEN 1370521405 AND 1370732400;
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
| 1 | SIMPLE | calendar | index | NULL | PRIMARY | 3 | NULL | 10957 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
EXPLAIN EXTENDED
SELECT * FROM calendar WHERE dt BETWEEN FROM_UNIXTIME(1370521405) AND FROM_UNIXTIME(1370732400);
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | calendar | range | PRIMARY | PRIMARY | 3 | NULL | 3 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+