Subquery without additional column takes longer than with column - mysql

I'm trying to get a running total using a Subquery. (I'm using Metabase, which doesn't seem to accept/process variables in queries)
My Query:
SELECT date_format(t.`session_stop`, '%d') AS `session_stop`,
sum(t.`energy_used` / 1000) AS `csum`,
(
SELECT (SUM(a.`energy_used`) / 1000)
FROM `sessions` a
WHERE date_format(a.`session_stop`, '%Y-%m-%d') <= date_format(t.`session_stop`, '%Y-%m-%d')
AND str_to_date(concat(date_format(a.`session_stop`, '%Y-%m'), '-01'), '%Y-%m-%d') = str_to_date(concat(date_format(now(), '%Y-%m'), '-01'), '%Y-%m-%d')
ORDER BY str_to_date(date_format(a.`session_stop`, '%e'), '%d') ASC
) AS `sum`
FROM `sessions` t
WHERE str_to_date(concat(date_format(t.`session_stop`, '%Y-%m'), '-01'), '%Y-%m-%d') = str_to_date(concat(date_format(now(), '%Y-%m'), '-01'), '%Y-%m-%d')
GROUP BY date_format(t.`session_stop`, '%e')
ORDER BY str_to_date(date_format(t.`session_stop`, '%d'), '%d') ASC;
This takes about 1.29 secs to run (43K rows in total, returns 14).
If I remove the sum(t.`energy_used` / 1000) AS `csum` line, the query takes 8 mins and 40 secs.
Why is this? I'd rather not have that line, but I also can't wait 8 mins for a query to process.
(I know I can create a cumulative column, but I'm especially interested in why this additional sum() speeds the whole query up.)
PS: I tested this on both the MySQL console and the Metabase interface.
EXPLAIN query:
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                        |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
|  1 | PRIMARY            | t     | ALL  | NULL          | NULL | NULL    | NULL | 42055 | Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | a     | ALL  | NULL          | NULL | NULL    | NULL | 42055 | Using where                                  |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Without the extra sum():
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| 1 | PRIMARY | t | ALL | NULL | NULL | NULL | NULL | 44976 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | a | ALL | NULL | NULL | NULL | NULL | 44976 | Using where |
+----+--------------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Schema is not much more than a table with:
session_id (INT, auto incr., prim.key) | session_stop (datetime) | energy_used (INT) |
1 | 1-1-2016 10:00:00 | 123456 |
2 | 1-1-2016 10:05:00 | 123456 |
3 | 1-2-2016 10:10:00 | 123456 |
4 | 1-2-2016 12:00:00 | 123456 |
5 | 3-3-2016 14:05:00 | 123456 |
Some examples on the internet show using the ID in the WHERE clause, but I had poor results with this.

Your queries are not similar at all. In fact, they are poles apart.
If I remove the sum(t.energy_used / 1000) AS csum, line, the query
takes up 8 mins and 40 secs.
When you use SUM, it's an aggregation. sum(t.energy_used / 1000) will produce an entirely different result from just selecting t.energy_used, which is why there is such a huge difference in the query timings.
It is also very unclear why you are comparing dates in this manner:
WHERE date_format(a.`session_stop`, '%Y-%m-%d') <= date_format(t.`session_stop`, '%Y-%m-%d')
Why are you converting them both with date_format before comparison? Since both columns apparently contain the same data type, you should be able to do a.session_stop <= t.session_stop directly; this will be much faster in both cases.
Since it's an inequality comparison, it's not a good candidate for indexes, but you can still try creating an index on that column to see if it has any effect.
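For example, a minimal sketch of both suggestions against the table from the question (the index name is made up, and the rewritten current-month filter is an assumption that goes beyond the direct-comparison advice above):
CREATE INDEX idx_session_stop ON `sessions` (`session_stop`);

-- inside the correlated subquery, compare the DATETIME values directly:
SELECT SUM(a.`energy_used`) / 1000
FROM `sessions` a
WHERE a.`session_stop` <= t.`session_stop`
AND a.`session_stop` >= DATE_FORMAT(NOW(), '%Y-%m-01'); -- sargable current-month filter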
So to recap, the performance difference is because you are not merely adding/removing an extra column but adding/removing an aggregation.

Related

Why is MySQL indexing taking too much time for the < operator?

This is my MySQL table demo, which has more than 7 million rows:
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | varchar(42) | YES | MUL | NULL | |
| date | datetime | YES | MUL | NULL | |
| text | varchar(100) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
I read that indexes work sequentially.
Case 1:
select * from demo where id="43984a7e-edcf-11ea-92c7-509a4cb89342" order by date limit 30;
I created an (id, date) index and it works fine; the query executes very fast.
But hold on and look at the cases below.
Case 2:
Below is my SQL query.
select * from demo where id>"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
To make the above query execute faster, I created an index on (id, date), but it takes more than 10 sec.
Then I made another index on (date). This took less than 1 sec. Why is the composite (id, date) index so much slower than the (date) index in this case?
Case 3:
select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
For this query, even the (date) index takes more than 1.8 sec. Why is the < operator not optimized by either index, whether it is (date) or (id, date)?
And even though this query only goes through around 300 rows, it still takes more than 1.8 sec. Why?
mysql> explain select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | demo | NULL | index | demoindex1,demoindex2 | demoindex3 | 6 | NULL | 323 | 36.30 | Using where |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
Any suggestions for how to create an index in Case 3 to optimize it?
In your first query, the index can be used for both the WHERE clause and the ordering, so it will be very fast.
For the second query, the index can only be used for the WHERE clause. Because of the inequality, the date information is no longer in order, so the engine needs to sort explicitly.
In addition, I imagine that the second query returns much more data than the first -- a fair amount of data if it takes 10 seconds to sort it.
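As a rough sketch, the two indexes discussed above look like this (index names are illustrative):
ALTER TABLE demo ADD INDEX idx_id_date (id, date); -- Case 1: equality on id, rows already ordered by date
ALTER TABLE demo ADD INDEX idx_date (date); -- Cases 2 and 3: walk date descending and stop after 30 matches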

MySQL - nested select queries running many times slower than sequential queries (on a large table)

I have a MySQL query that I am having performance problems with and that I do not understand. When I try to debug and run the overall query as a sequence of separate subqueries, they seem to perform reasonably well given the volume of data. When I combine them into a single nested query, I get much, much longer execution times.
The main ratings table mentioned below is approx 30 million rows (4GB of disk space), with a couple of foreign keys (it's a many-to-many table linking users and items with a small amount of supplementary user-specific item information - approx 13 fields and 30 bytes).
Query 1 - approx 23s
SELECT COUNT(1) FROM (SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1) AS t1;
Query 1 saved to table - approx 65s if I save the results to a temporary table
CREATE TABLE temp_table SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1;
Query 2 - approx 3s
SELECT COUNT(1) FROM temp_table WHERE id IN (SELECT id from item_stats WHERE
ratings_count > 1000);
Based on this I would expect a combined query to take approx 30s or so, and not more than approx 70s.
Combined query (Query 1 + Query 2) - indeterminate time (10s of minutes before I give up and cancel)
SELECT COUNT(1) from (SELECT * FROM (SELECT fields FROM ratings WHERE (id >= 0
AND id < 10000) AND item_type = 1) AS t1 WHERE t1.id IN (SELECT id FROM
item_stats WHERE ratings_count > 1000)) as t2;
Can anyone help explain this difference and guide me in creating a query that works? If I need to I can rely on the sequential queries (which would take approx 70s), but that is cumbersome and does not seem the right way to go.
I have tried using INNER JOIN instead of IN but this did not seem to make much difference. The ID count from the item_stats table is about 2700 IDs.
It's using MySQL 8.0 on a laptop (16GB RAM, SSD).
Response to suggestions / questions:
Query 1
EXPLAIN select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1;
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | collections | NULL | ALL | user_id | NULL | NULL | NULL | 32898400 | 1.31 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Query 2
EXPLAIN select * from temp_coll where game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
| 1 | SIMPLE | temp_coll | NULL | ALL | NULL | NULL | NULL | NULL | 1674386 | 10.00 | Using where; Using join buffer (hash join) |
| 2 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
Combined query
EXPLAIN select * from (select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1) as t1 where t1.game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | <subquery3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | collections | NULL | ref | user_id,game_id | game_id | 5 | <subquery3>.game_id | 199 | 1.31 | Using where |
| 3 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
Your query appears to be functionally identical to the following (rather implausible) query:
SELECT COUNT(*) total
FROM ratings r
JOIN item_stats s
ON s.id = r.id
WHERE r.id >= 0
AND r.id < 10000
AND r.item_type = 1
AND s.ratings_count > 1000
r.id is, presumably, the PRIMARY KEY, so it's automatically included in every secondary InnoDB index, which leaves just item_type and ratings_count requiring indexes.
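A minimal sketch of those two indexes, using the table names from the simplified query above (index names are made up):
ALTER TABLE ratings ADD INDEX idx_item_type (item_type);
ALTER TABLE item_stats ADD INDEX idx_ratings_count (ratings_count);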
You would benefit a lot from an online tutorial on how to read an EXPLAIN plan. The EXPLAINs you shared clearly show missing indexes.
As a general rule, queries should not take 23 seconds or 65 seconds, even with millions of rows. Proper indexes + partitioning should resolve the slowness.
Query 1: The user_id index on that table is not helping performance, as 99% of users fall within the range in the WHERE clause. You can add an index on item_type_id:
ALTER TABLE collections ADD KEY (item_type_id)
Query 2: The temp_coll table is missing a game_id index. Also, I'm not sure whether the underlying code for games_ratings_stats has an index on ratings_count and whether that would help; I don't have experience with MySQL materialized tables.
ALTER TABLE temp_coll ADD KEY (game_id)
Query 3: Would benefit from the indexes above.
Increasing the InnoDB Buffer Pool Size (now set to 8GB) seems to have made a significant improvement. If anyone has any further setup or tuning advice on MySQL then that would be appreciated!

Speeding up large MySQL query

The following query hangs in the "Sending data" phase for an incredibly long time. It is a large query, but I'm hoping to get some assistance with my indexes and possibly learn a bit more about how MySQL actually chooses which index it's going to use.
Below is the query as well as a DESCRIBE statement output.
mysql> DESCRIBE SELECT e.employee_number, s.current_status_start_date, e.company_code, e.location_code, s.last_suffix_first_mi, s.job_title, SUBSTRING(e.job_code,1,1) tt_jobCode,
-> SUM(e.current_amount) tt_grossWages,
-> IFNULL((SUM(e.current_amount) - IF(tt1.tt_reduction = '','0',tt1.tt_reduction)),SUM(e.current_amount)) tt_taxableWages,
-> t.new_code, STR_TO_DATE(s.last_hire_date, '%Y-%m-%d') tt_hireDate,
-> IF(s.current_status_code = 'T',STR_TO_DATE(s.current_status_start_date, '%Y-%m-%d'),'') tt_terminationDate,
-> IFNULL(tt_totalHours,'0') tt_totalHours
-> FROM check_earnings e
-> LEFT JOIN (
-> SELECT * FROM summary
-> GROUP BY employee_no
-> ORDER BY current_status_start_date DESC
-> ) s
-> ON e.employee_number = s.employee_no
-> LEFT JOIN (
-> SELECT employee_no, SUM(current_amount__employee) tt_reduction
-> FROM check_deductions
-> WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND (
-> deduction_code IN ('DECMP','FSAM','FSAC','DCMAK','DCMAT','401KD')
-> OR deduction_code LIKE 'IM%'
-> OR deduction_code LIKE 'ID%'
-> OR deduction_code LIKE 'IV%'
-> )
-> GROUP BY employee_no
-> ORDER BY employee_no ASC
-> ) tt1
-> ON e.employee_number = tt1.employee_no
-> LEFT JOIN translation t
-> ON e.location_code = t.old_code
-> LEFT JOIN (
-> SELECT employee_number, SUM(current_hours) tt_totalHours
-> FROM check_earnings
-> WHERE STR_TO_DATE(pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND earnings_code IN ('REG1','REG2','REG3','REG4')
-> GROUP BY employee_number
-> ) tt2
-> ON e.employee_number = tt2.employee_number
-> WHERE STR_TO_DATE(e.pay_date, '%Y-%m-%d') >= STR_TO_DATE('2012-06-01', '%Y-%m-%d')
-> AND STR_TO_DATE(e.pay_date, '%Y-%m-%d') <= STR_TO_DATE('2013-06-01', '%Y-%m-%d')
-> AND SUBSTRING(e.job_code,1,1) != 'E'
-> AND e.location_code != '639'
-> AND t.field = 'location_state'
-> GROUP BY e.employee_number
-> ORDER BY s.current_status_start_date DESC, e.location_code ASC, s.last_suffix_first_mi ASC;
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | e | ALL | location_code | NULL | NULL | NULL | 3498603 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | t | ref | field,old_code | old_code | 303 | historical.e.location_code | 1 | Using where |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16741 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 2530 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 2919 | |
| 4 | DERIVED | check_earnings | index | NULL | employee_number | 303 | NULL | 3498603 | Using where |
| 3 | DERIVED | check_deductions | index | deduction_code | employee_no | 303 | NULL | 6387048 | Using where |
| 2 | DERIVED | summary | index | NULL | employee_no | 303 | NULL | 17608 | Using temporary; Using filesort |
+----+-------------+------------------+-------+----------------+-----------------+---------+----------------------------+---------+----------------------------------------------+
8 rows in set, 65535 warnings (32.77 sec)
EDIT: After playing with some indexes, it now spends the most time in the "Copying to tmp table" state.
There's no way you can avoid use of a temp table in that query. One reason is that you are grouping by different columns than you are sorting by.
Another reason is the use of derived tables (subqueries in the FROM/JOIN clauses).
One way you could speed this up is to create summary tables to store the result of those subqueries so you don't have to do them during every query.
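For instance, a hedged sketch using one of the derived tables from the query above (the summary table name is made up; it also assumes pay_date is stored as 'YYYY-MM-DD' strings, where string order matches date order, so the STR_TO_DATE() wrappers can be dropped):
CREATE TABLE total_hours_summary AS
SELECT employee_number, SUM(current_hours) AS tt_totalHours
FROM check_earnings
WHERE pay_date >= '2012-06-01' AND pay_date <= '2013-06-01'
AND earnings_code IN ('REG1','REG2','REG3','REG4')
GROUP BY employee_number;
ALTER TABLE total_hours_summary ADD INDEX (employee_number);
The main query can then LEFT JOIN total_hours_summary instead of recomputing that derived table on every run.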
You are also forcing table-scans by searching on the result of functions like STR_TO_DATE() and SUBSTR(). These cannot be optimized with an index.
Re your comment:
I can make an SQL query against a far smaller table run for 72 hours if it's poorly optimized.
Note for example in the output of your DESCRIBE, it shows "ALL" for several of the tables involved in the join. This means it has to do a table-scan of all the rows (shown in the 'rows' column).
A rule of thumb: how many row comparisons does it take to resolve the join? Multiply the 'rows' of all the tables joined together with the same 'id'.
+----+-------------+------------------+-------+---------+
| id | select_type | table | type | rows |
+----+-------------+------------------+-------+---------+
| 1 | PRIMARY | e | ALL | 3498603 |
| 1 | PRIMARY | t | ref | 1 |
| 1 | PRIMARY | <derived2> | ALL | 16741 |
| 1 | PRIMARY | <derived3> | ALL | 2530 |
| 1 | PRIMARY | <derived4> | ALL | 2919 |
So it may be evaluating the join conditions 432,544,383,105,752,610 times (assume those numbers are approximate, so it may not really be as bad as that). It's actually a miracle it takes only 5 hours!
What you need to do is use indexes to help the query reduce the number of rows it needs to examine.
For example, why are you using STR_TO_DATE() given that the date you are parsing is the native date format for MySQL? Why don't you store those columns as a DATE data type? Then the search could use an index.
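A minimal sketch of that change, assuming the stored strings are already in 'YYYY-MM-DD' form (index name is made up):
ALTER TABLE check_earnings MODIFY pay_date DATE NOT NULL;
ALTER TABLE check_earnings ADD INDEX idx_pay_date (pay_date);
The range filter then becomes sargable and can use the index:
WHERE e.pay_date >= '2012-06-01' AND e.pay_date <= '2013-06-01'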
You don't need to "play with indexes." It's not like indexing is a mystery or has random effects. See my presentation How to Design Indexes, Really for some introduction.

Query optimization, select calling function

Consider a large table, which of the below would be faster?
Both queries will select rows where time is greater than the current time.
Calling NOW() within the WHERE clause:
SELECT *
FROM myTable
WHERE time > NOW()
Wrapping the call to NOW() in a sub query:
SELECT *
FROM myTable
LEFT JOIN (
SELECT NOW() AS currentTime
) AS currentTimeTable ON TRUE
WHERE time > currentTime
"Both queries will select rows where time is no older than 10 days."
Sorry, but your query is incorrect. I tested it in MySQL and got this:
mysql> SELECT DATE(NOW() - 10);
+------------------+
| DATE(NOW() - 10) |
+------------------+
| 2013-11-21 |
+------------------+
2013-11-21 is the current date, not NOW() minus 10 days: NOW() - 10 subtracts 10 from the numeric form of the datetime (YYYYMMDDHHMMSS), effectively shaving off seconds, so the date part is unchanged.
You should use the DATE_SUB function:
mysql> SELECT DATE_SUB( NOW(), INTERVAL 10 DAY );
+-------------------------------------+
| DATE_SUB( NOW() , INTERVAL 10 DAY ) |
+-------------------------------------+
| 2013-11-11 19:40:38 |
+-------------------------------------+
Something like this:
SELECT *
FROM `myTable`
WHERE `time` > DATE_SUB(NOW(), INTERVAL 10 DAY );
Here is the analysis of both of your query types:
EXPLAIN
SELECT
*
FROM
users
WHERE
created > now();
Test data
I tried the users table from my Drupal installation.
mysql> EXPLAIN SELECT *
FROM users
WHERE created < NOW();
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | users | range | created | created | 4 | NULL | 1 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN
SELECT
*
FROM
users
LEFT JOIN (
SELECT NOW() AS currentTime
) AS tbl ON TRUE
WHERE created < tbl.currentTime;
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------------+
| 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | users | range | created | created | 4 | NULL | 1 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------------+
3 rows in set (0.02 sec)
Conclusion
Obviously, the first query is better, as it doesn't require creating any temporary tables. Even with my little sample data, it took 0 seconds to execute the first query and 0.02 secs for the second.
1) When you talk about date optimisation, using timestamps is the way to go.
2) Instead of calling a function to do the math, I suggest creating another column for each row with its timestamp (see the sketch below).
3) As PeeHaa said on this thread: « When talking about which piece of code is faster (in your OP) you're talking about micro-optimization and is really something you shouldn't have to worry about. ☺
The real question is which piece of code is: better maintainable, readable, understandable. »
I understand what very large database management involves, but once you've optimised indexes and datatypes you will most likely be good enough.
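A hedged sketch of suggestion 2 (the new column and index names are made up):
ALTER TABLE myTable ADD COLUMN time_ts TIMESTAMP NULL, ADD INDEX idx_time_ts (time_ts);
UPDATE myTable SET time_ts = `time`; -- assumes `time` holds DATETIME-compatible values
Queries can then filter on the indexed, precomputed column:
SELECT * FROM myTable WHERE time_ts > NOW();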

Calculating a user's age on the fly, optimization - mysql

I have the following (strange) query:
SELECT DISTINCT c.id
FROM z1 INNER JOIN c c ON (z1.id=c.id)
INNER JOIN i ON (c.member_id=i.member_id)
WHERE DATE_FORMAT(CONCAT(i.birthyear,"-",i.birthmonth,"-",i.birthday),"%Y%m%d000000") BETWEEN '19820605000000' AND '19930604235959' AND c.id NOT IN (658887)
GROUP BY c.id
The user's birthday is kept in the db in three different columns, but the task is to find users whose ages fall within a specific range.
The worst thing is that MySQL will calculate the age for each selected record and compare it with the condition, which is not good :( Is there any way to make it faster?
This is the plan:
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | z1 | index | PRIMARY | PRIMARY | 4 | NULL | 176659 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | c | eq_ref | PRIMARY,member_id | PRIMARY | 4 | z1.id | 1 | 100.00 | |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 4 | c.member_id | 1 | 100.00 | Using where |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+-----------------------------------------------------------+
As usual, the right answer is to fix your schema. i.e. data should be normalized, use native keys wherever practical and use the right data types.
Looking at your post, at least you've provided an EXPLAIN plan - but the table structures would help too.
Why is the table z1 in the query? You don't explicitly filter using it, and you don't use the result anywhere.
Why do you do both a DISTINCT and a GROUP BY - you're asking the DBMS to do the same work twice.
Why do you use 'c' as an alias for 'c'?
Why are you using NOT IN to exclude a single value?
Why do you compare your date values as strings?
It's possible that the optimizer is getting confused about the best way to resolve the query - but you've not provided any information to support this - what proportion of the data is filtered by the age rule? You may get better results using the birthday / i table to drive the query:
SELECT DISTINCT c.id
FROM c
INNER JOIN i ON (c.member_id=i.member_id)
WHERE STR_TO_DATE(
CONCAT(i.birthyear,'-', i.birthmonth,'-',i.birthday)
,"%Y-%m-%d")
BETWEEN '1982-06-05' AND '1993-06-04'
AND c.id <> 658887
AND i.birthyear BETWEEN 1982 AND 1993
Alter the i table and add a TIMESTAMP or DATETIME column named date_of_birth with an INDEX on it:
ALTER TABLE i ADD date_of_birth DATETIME NOT NULL, ADD INDEX (date_of_birth);
UPDATE i SET date_of_birth = CONCAT(i.birthyear,"-",i.birthmonth,"-",i.birthday);
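If your MySQL version supports generated columns (5.7+), a hedged alternative to the manual UPDATE above is to let the server maintain the column itself (the column name here is illustrative):
ALTER TABLE i
ADD date_of_birth_gen DATE
GENERATED ALWAYS AS (STR_TO_DATE(CONCAT(birthyear,'-',birthmonth,'-',birthday), '%Y-%m-%d')) STORED,
ADD INDEX (date_of_birth_gen);
-- date_of_birth_gen then stays in sync automatically as birthyear/birthmonth/birthday change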
And use this query which should be faster:
SELECT
c.id
FROM
i
INNER JOIN c
ON c.member_id=i.member_id
WHERE
i.date_of_birth BETWEEN '1982-06-05 00:00:00' AND '1993-06-04 23:59:59'
AND c.id NOT IN (658887)
GROUP BY
c.id
ORDER BY
NULL
You've asked me to explain what I mean. Unfortunately there are two problems with that.
The first is that I don't think that this can be adequately explained in a simple comments box.
The second is that I don't really know what I'm talking about, but I'll have a go...
Consider the following example - a simple utility table containing dates up to 2038 (when the whole UNIX_TIMESTAMP thing stops working anyway)...
CREATE TABLE calendar (
dt date NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now, the following queries are logically identical...
SELECT * FROM calendar WHERE UNIX_TIMESTAMP(dt) BETWEEN 1370521405 AND 1370732400;
+------------+
| dt |
+------------+
| 2013-06-07 |
| 2013-06-08 |
| 2013-06-09 |
+------------+
SELECT * FROM calendar WHERE dt BETWEEN FROM_UNIXTIME(1370521405) AND FROM_UNIXTIME(1370732400);
+------------+
| dt |
+------------+
| 2013-06-07 |
| 2013-06-08 |
| 2013-06-09 |
+------------+
...and MySQL is clever enough to utilise the (PK) index to resolve both queries (rather than reading the table itself - yuk).
But while the first requires a full scan over the entire index (good but not great), the second is able to access the table with a key over one (or more) value ranges (terrific)...
EXPLAIN EXTENDED
SELECT * FROM calendar WHERE UNIX_TIMESTAMP(dt) BETWEEN 1370521405 AND 1370732400;
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
| 1 | SIMPLE | calendar | index | NULL | PRIMARY | 3 | NULL | 10957 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+-------+--------------------------+
EXPLAIN EXTENDED
SELECT * FROM calendar WHERE dt BETWEEN FROM_UNIXTIME(1370521405) AND FROM_UNIXTIME(1370732400);
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | calendar | range | PRIMARY | PRIMARY | 3 | NULL | 3 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+------+--------------------------+