MySql: Optimizing Subquery with index_merge - mysql

I want to lists all users and the status of the newsletter-subscribtion. Since someone doesn't need to be a user if subscribed to the newsletter, I'm doing something like:
SELECT user.id, user.email,
(SELECT newsletter.status FROM newsletter
WHERE newsletter.email=user.email OR newsletter.user = user.id) AS status
FROM user
WHERE...
Indexes are user.id, user.email, newsletter.email, newsletter.user.
The OR makes the query incredible slow. I found here Union as sub query MySQL that you can do an "index merge" which will speed-up the query. But I'm not sure how to force MySQL to do an index merge in my case. Any ideas?
Added: 2nd row of explain doesn't use a key.
DROP TABLE if exists user;
DROP TABLE if exists newsletter;
create table user (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`email` varchar(255) DEFAULT NULL, INDEX email(email)
) ENGINE=InnoDB;
create table newsletter (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`status` enum('subscribed','unsubscribed') DEFAULT NULL,
`email` varchar(255) DEFAULT NULL, INDEX email(email),
`user` int(10) unsigned DEFAULT NULL, INDEX user(user)
) ENGINE=InnoDB;
EXPLAIN SELECT user.id, user.email,
(SELECT newsletter.status FROM newsletter
WHERE newsletter.email=user.email OR newsletter.user = user.id) AS status
FROM user;
+----+--------------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| 1 | PRIMARY | user | NULL | index | NULL | email | 768 | NULL | 1 | 100.00 | Using index |
| 2 | DEPENDENT SUBQUERY | newsletter | NULL | ALL | email,user | NULL | NULL | NULL | 1 | 100.00 | Range checked for each record (index map: 0x6) |
+----+--------------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
EXPLAIN SELECT user.id, user.email, newsletter.status
FROM user
LEFT JOIN newsletter ON (newsletter.email=user.email OR newsletter.user = user.id);
+----+-------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| 1 | SIMPLE | user | NULL | index | NULL | email | 768 | NULL | 1 | 100.00 | Using index |
| 1 | SIMPLE | newsletter | NULL | ALL | email,user | NULL | NULL | NULL | 1 | 100.00 | Range checked for each record (index map: 0x6) |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+

Unless you have switched off index_merge, MySQL will use it if it thinks it will be of benefit. But I find it kicks in less often than one might expect, and even when it does, it's not very helpful — the query is still a lot slower than using an index in the conventional way.
The typical solution for these types of queries is to do a UNION of two simpler queries.
explain select u.id, u.email, n.status
from user u left join newsletter n on u.email=n.email
union
select u.id, u.email, n.status
from user u left join newsletter n on u.id=n.user;
+----+--------------+------------+-------+---------------+-------+---------+--------------+------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+-------+---------+--------------+------+-----------------+
| 1 | PRIMARY | u | index | NULL | email | 403 | NULL | 1 | Using index |
| 1 | PRIMARY | n | ref | email | email | 403 | test.u.email | 1 | NULL |
| 2 | UNION | u | index | NULL | email | 403 | NULL | 1 | Using index |
| 2 | UNION | n | ref | user | user | 5 | test.u.id | 1 | NULL |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+-------+---------------+-------+---------+--------------+------+-----------------+
You'd be better off fixing your data so you only have to join on the user id, not on the email.

Related

Optimizing join on derived table - EXPLAIN different on local and server

I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_number,
p.street_name,
p.site_address_city_state,
p.number_of_units,
p.number_of_stories,
p.bedrooms,
p.bathrooms,
p.lot_area_sqft,
p.cost_per_sq_ft,
p.year_built,
p.sales_date,
p.sales_price,
p.id
FROM (
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
JOIN property AS p
ON p.parcel_number = open_cases.APN
LIMIT 0, 1000;
mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query | 21 | Copying to tmp table | SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query | 0 | NULL | show processlist |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Explains are different on my local machine and on the server, and I'm assuming the only reason my query runs at all on my local machine, is because of the key that is automatically created on the derived table:
Explain (local):
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | p | NULL | ALL | NULL | NULL | NULL | NULL | 40319 | 100.00 | Using temporary |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | lead_housing.p.parcel_number | 40 | 100.00 | NULL |
| 2 | DERIVED | pi | NULL | ALL | NULL | NULL | NULL | NULL | 1623978 | 100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
Explain (server):
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 41369 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 122948 | Using where; Distinct; Using join buffer |
| 2 | DERIVED | pi | ALL | NULL | NULL | NULL | NULL | 1718586 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
Schemas:
mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lblCaseNo | int(11) | NO | MUL | NULL | |
| APN | bigint(10) | NO | MUL | NULL | |
| date | varchar(50) | NO | | NULL | |
| status | varchar(500) | NO | | NULL | |
| property_case_detail_id | int(11) | YES | MUL | NULL | |
| case_type_id | int(11) | YES | MUL | NULL | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status | tinyint(1) | YES | | 1 | |
| created_date | datetime | NO | | NULL | |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)
mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parcel_number | bigint(10) | NO | | 0 | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date | datetime | NO | | NULL | |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
Variables that might be relevant:
tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592
Any ideas on how to optimize this, and any idea why the explains are so different?
Since this is where the Optimizers are quite different, let's try to optimize
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
Give this a try:
SELECT ...
FROM property AS p
WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
WHERE status = 'Resolved Date'
AND p.parcel_number = APN )
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
or...
SELECT ...
FROM property AS p
LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
WHERE pi.status = 'Resolved Date'
AND pi.APN IS NULL
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
Index:
property_inspection: INDEX(status, parcel_number) -- in either order
MySQL 5.5 and 5.7 are quite different and the later has better optimizer so there is no surprise that explain plans are different.
You'd better provide SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection; outputs as it will show indexes that are on your tables.
Your sub-query is the issue.
- Server tries to process 1.6M rows with no index and grouping everything.
- Having is quite expensive operation so you'd better avoid it, expecially in sub-queries.
- Grouping in this case is bad idea. You do not need the aggregation/counting. You need to check if the 'Resolved Date' status is just exists
Based on the information provided I'd recommend:
- Alter table property_inspection to reduce length of status column.
- Add index on the column. Use covering index (APN, property_case_detail_id, status) if possible (in this columns order).
- Change query to something like this:
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
...
p.id
FROM
property_inspection AS `pi1`
INNER JOIN property AS p ON (
p.parcel_number = `pi1`.APN
)
LEFT JOIN (
SELECT
`pi2`.property_case_detail_id
, `pi2`. APN
FROM
property_inspection AS `pi2`
WHERE
`status` = 'Resolved Date'
) AS exclude ON (
exclude.APN = `pi1`.APN
AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
)
WHERE
exclude.APN IS NULL
LIMIT
0, 1000;

Debugging Slow mySQL query with Explain

Have found an inefficient query in our system. content holds versions of slides, and this is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That reduces the amount of rows down to one correctly, but doesnt really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use a TOP LIMIT 1 insetead of Max ?
m
MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as first column, I just guessed that's fk_content_slides_idx, if not, take another one.
The part FORCE INDEX FOR JOIN (fk_content_slides_idx) is just to enforce it, you should try if mysql takes it by itself without forcing (it should).
You might get even a slightly better result with an index (slide_id, version), it depends on the amount of data (e.g. the number of versions per id) if you see a difference (but you should not spam indexes, and you already have a lot on this table, but you can try it for fun.)
Just a suggestion i think you should avoid the group by slide_id because you are filter by one slide_id only (16901)
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';

More efficient sql to find records not in join table?

I have:
_sms_users_
id
_join_smsuser_campaigns_
sms_user_id
I'd like to pull out the sms_users that don't have a record in the join_smsuser_campaigns table (via the join_smsuser_campaigns.sms_user_id = sms_users.id relationships)
I have the sql:
select * from sms_users where id not in (select sms_user_id from join_smsuser_campaigns);
edit:
here's the explain select results:
mysql> explain select u.* from sms_users u left join join_smsuser_campaigns c on u.id = c.sms_user_id where c.sms_user_id is null;
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| 1 | SIMPLE | u | ALL | NULL | NULL | NULL | NULL | 42303 | |
| 1 | SIMPLE | c | index | NULL | index_join_smsuser_campaigns_on_campaign_id_and_sms_user_id | 8 | NULL | 30722 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
mysql> describe sms_users;
+------------------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
mysql> describe join_smsuser_campaigns;
+---------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+---------+-------+
| sms_user_id | int(11) | NO | | NULL | |
Looks like the problm is that I don't have an index on sms_user_id on the join?
This takes about 5 minutes to run and I see that there is one record. Is there a more efficient way to do this via a join? My sql skills are pretty basic.
Queries can get slow when having many items in the IN clause. Use a left join instead
select u.*
from sms_users u
left join join_smsuser_campaigns c on u.id = c.sms_user_id
where c.sms_user_id is null
See this great join explanation

why inner join on condition does not use key

I am creating a table as
create table temp_test2 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
create table temp_test1 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id) limit 3;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | t | ALL | date_id | NULL | NULL | NULL | 4 | NULL |
| 1 | SIMPLE | t2 | ALL | date_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
why the code_id key is not used in both the table, but when I use code_id=something in on condition it's using the key,
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id and t.date_id =1) limit 3;
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | t | const | PRIMARY,date_id,date_id_2,date_id_3 | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | t2 | ref | date_id,date_id_2,date_id_3 | date_id | 4 | const | 1 | NULL |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
I tried (unique,composite primary,composite) key also but it is not working.
Can anyone explain why this so?
Because your tables contain a very small number of records, the optimiser decides that it is not worth using the index. A table scan will do just as good.
Also, you selected all fields (SELECT *), if it used the index for executing the JOIN a row scan would still be required to get the full contents.
The query would be more likely to use the index if:
you selected only the date_id field
there were more than 4 rows in temp_test

Optimizing query in the MySQL slow-query log

Our database is set up so that we have a credentials table that hold multiple different types of credentials (logins and the like). There's also a credential_pairs table that associates some of these types together (for instance, a user may have a password and security token).
In an attempt to see if a pair match, there is the following query:
SELECT DISTINCT cp.credential_id FROM credential_pairs AS cp
INNER JOIN credentials AS c1 ON (cp.primary_credential_id = c1.credential_id)
INNER JOIN credentials AS c2 ON (cp.secondary_credential_id = c2.credential_id)
WHERE c1.data = AES_ENCRYPT('Some Value 1', 'encryption key')
AND c2.data = AES_ENCRYPT('Some Value 2', 'encryption key');
This query works fine and gives us exactly what we need. HOWEVER, it is constantly showing in the slow query log (possibly due to lack of indexes?). When I ask MySQL to "explain" the query it gives me:
+----+-------------+-------+------+--------------------------------------------------------+---------------------+---------+-------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------------------------------+---------------------+---------+-------+-------+--------------------------------+
| 1 | SIMPLE | c1 | ref | credential_id_UNIQUE,credential_id,ix_credentials_data | ix_credentials_data | 22 | const | 1 | Using where; Using temporary |
| 1 | SIMPLE | c2 | ref | credential_id_UNIQUE,credential_id,ix_credentials_data | ix_credentials_data | 22 | const | 1 | Using where |
| 1 | SIMPLE | cp | ALL | NULL | NULL | NULL | NULL | 69197 | Using where; Using join buffer |
+----+-------------+-------+------+--------------------------------------------------------+---------------------+---------+-------+-------+--------------------------------+
I have a feeling that last entry (where it shows 69197 rows) is probably the problem, but I am FAR from a DBA... help?
credentials table:
CREATE TABLE `credentials` (
`hidden_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`credential_id` varchar(255) NOT NULL,
`data` blob NOT NULL,
`credential_status` varchar(100) NOT NULL,
`insert_date` datetime NOT NULL,
`insert_user` int(10) unsigned NOT NULL,
`update_date` datetime DEFAULT NULL,
`update_user` int(10) unsigned DEFAULT NULL,
`delete_date` datetime DEFAULT NULL,
`delete_user` int(10) unsigned DEFAULT NULL,
`is_deleted` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`hidden_id`,`credential_id`),
UNIQUE KEY `credential_id_UNIQUE` (`credential_id`),
KEY `credential_id` (`credential_id`),
KEY `data` (`data`(10)),
KEY `credential_status` (`credential_status`(10))
) ENGINE=InnoDB AUTO_INCREMENT=1572 DEFAULT CHARSET=utf8;
credential_pairs Table:
CREATE TABLE `credential_pairs` (
`hidden_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`credential_id` varchar(255) NOT NULL,
`primary_credential_id` varchar(255) NOT NULL,
`secondary_credential_id` varchar(255) NOT NULL,
`is_deleted` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`hidden_id`,`credential_id`),
KEY `primary_credential_id` (`primary_credential_id`(10)),
KEY `secondary_credential_id` (`secondary_credential_id`(10))
) ENGINE=InnoDB AUTO_INCREMENT=500 DEFAULT CHARSET=latin1;
credentials Indexes:
+-------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| credentials | 0 | PRIMARY | 1 | hidden_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 0 | PRIMARY | 2 | credential_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 0 | credential_id_UNIQUE | 1 | credential_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 1 | credential_id | 1 | credential_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 1 | ix_credentials_data | 1 | data | A | 186235 | 20 | NULL | | BTREE | |
+-------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
credential_pair Indexes:
+------------------+------------+---------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+---------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
| credential_pairs | 0 | PRIMARY | 1 | hidden_id | A | 69224 | NULL | NULL | | BTREE | |
| credential_pairs | 0 | PRIMARY | 2 | credential_id | A | 69224 | NULL | NULL | | BTREE | |
| credential_pairs | 1 | ix_credential_pairs_credential_id | 1 | credential_id | A | 69224 | 36 | NULL | | BTREE | |
| credential_pairs | 1 | ix_credential_pairs_primary_credential_id | 1 | primary_credential_id | A | 69224 | 36 | NULL | | BTREE | |
| credential_pairs | 1 | ix_credential_pairs_secondary_credential_id | 1 | secondary_credential_id | A | 69224 | 36 | NULL | | BTREE | |
+------------------+------------+---------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
UPDATE NOTES:
AFAICT: The DISTINCT was superfluous... nothing really needed it, so I dropped it. In an attempt to follow Fabrizio's advice to get a where on the credential_pairs lookup I then altered the statement to read as:
SELECT credential_id
FROM credential_pairs cp
WHERE cp.primary_credential_id = (SELECT credential_id FROM credentials WHERE data = AES_ENCRYPT('value 1','enc_key')) AND
cp.secondary_credential_id = (SELECT credential_id FROM credentials WHERE data = AES_ENCRYPT('value 2','enc_key'))
And.... nothing. The statement takes just as long and the explain looks pretty much the same. So, I added an index to the primary and secondary columns with:
ALTER TABLE credential_pairs ADD INDEX `idx_credential_pairs__primary_and_secondary`(`primary_credential_id`, `secondary_credential_id`);
And... nothing.
+----+-------------+-------------+-------+---------------------+---------------------------------------------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------------+---------------------------------------------+---------+------+-------+--------------------------+
| 1 | PRIMARY | cp | index | NULL | idx_credential_pairs__primary_and_secondary | 514 | NULL | 69217 | Using where; Using index |
| 3 | SUBQUERY | credentials | ref | ix_credentials_data | ix_credentials_data | 22 | | 1 | Using where |
| 2 | SUBQUERY | credentials | ref | ix_credentials_data | ix_credentials_data | 22 | | 1 | Using where |
+----+-------------+-------------+-------+---------------------+---------------------------------------------+---------+------+-------+--------------------------+
It says it's using the index, but it still looks like it's table scanning. So, I added a joint key (as per a'r's comment below) with:
ALTER TABLE credential_pairs ADD KEY (primary_credential_id, secondary_credential_id);
And... same result as with the index (are these functionally the same?).
The DISTINCT is what is generating the "Use temporary", you usually want to avoid those when possible
Plus you are scanning the whole credential_pair table as you do not have any conditions against it so no indexes are used and the whole table is returned before applying the WHERE
hope it makes sense
EDIT/ADD
Try by starting from a different table, if I understand correctly, you have Table A, a Table B and a Table AB and you are starting the select from AB, try to start it from A
I haven't tested this, but you could try:
SELECT cp.credential_id
FROM credentials AS c1
LEFT JOIN credential_pairs AS cp ON (c1.credential_id = cp.primary_credential_id)
LEFT JOIN credentials AS c2 ON (cp.secondary_credential_id = c2.credential_id)
WHERE
c1.data = AES_ENCRYPT('Some Value 1', 'encryption key')
AND c2.data = AES_ENCRYPT('Some Value 2', 'encryption key');
I have had luck in the past by moving select tables around