Why does MySQL not use indexes for this select query?

Why does this query not use indexes here? The table uses the InnoDB engine.
explain SELECT null as id,
    up_time,
    reg_date,
    refer,
    MAX(IFNULL(visits_count,0)) as visits_count,
    MAX(IFNULL(register_count,0)) as register_count,
    MAX(IFNULL(players_count,0)) as players_count,
    MAX(IFNULL(activity_count,0)) as activity_count,
    MAX(IFNULL(payment_users_count,0)) as payment_users_count,
    MAX(IFNULL(payment_count,0)) as payment_count,
    MAX(IFNULL(payment_sum,0)) as payment_sum
FROM stats_refers
WHERE stats_refers.reg_date < 1435006800
  AND stats_refers.up_time < 1435006800
GROUP BY stats_refers.refer, stats_refers.reg_date;
And the explain:
+----+-------------+--------------+------+----------------------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | stats_refers | ALL | reg_date,stat,up_reg_index | NULL | NULL | NULL | 2983126 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------------------+------+---------+------+---------+----------------------------------------------+
And the keys that can be used:
+--------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| stats_refers | 0 | PRIMARY | 1 | id | A | 2983126 | NULL | NULL | | BTREE | | |
| stats_refers | 0 | reg_date | 1 | reg_date | A | 13317 | NULL | NULL | | BTREE | | |
| stats_refers | 0 | reg_date | 2 | up_time | A | 1491563 | NULL | NULL | | BTREE | | |
| stats_refers | 0 | reg_date | 3 | refer | A | 2983126 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | stat | 1 | reg_date | A | 15142 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | stat | 2 | refer | A | 28683 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | refer_uptime | 1 | refer | A | 2307 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | refer_uptime | 2 | up_time | A | 1491563 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | up_reg_index | 1 | reg_date | A | 2314 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | up_reg_index | 2 | up_time | A | 1491563 | NULL | NULL | | BTREE | | |
+--------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
And here is the table description:
CREATE TABLE `stats_refers` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`reg_date` int(10) unsigned NOT NULL,
`up_time` int(10) unsigned NOT NULL,
`refer` varchar(16) NOT NULL DEFAULT '',
`visits_count` int(10) unsigned NOT NULL DEFAULT '0',
`register_count` int(10) unsigned NOT NULL DEFAULT '0',
`players_count` int(10) unsigned NOT NULL DEFAULT '0',
`activity_count` int(10) unsigned NOT NULL DEFAULT '0',
`payment_users_count` int(10) unsigned NOT NULL DEFAULT '0',
`payment_count` int(10) unsigned NOT NULL DEFAULT '0',
`payment_sum` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `reg_date` (`reg_date`,`up_time`,`refer`),
KEY `stat` (`reg_date`,`refer`),
KEY `refer_uptime` (`refer`,`up_time`),
KEY `up_reg_index` (`reg_date`,`up_time`)
) ENGINE=InnoDB AUTO_INCREMENT=4136504 DEFAULT CHARSET=utf8

The compound index stat, which I assume was created for this grouping operation, has its columns in the reverse order of the grouping you are trying to perform (the query groups by refer, reg_date while the index is on reg_date, refer). To have an index available for this grouping operation, you would need to do one of the following:
reverse the column order on the stat index
add an individual index on refer
add another compound index on refer, reg_date
Which option you choose would ultimately need to take the other query operations on the table into account.
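For illustration, the DDL for each option might look like this (a sketch only; the idx_refer and idx_refer_regdate names are placeholders):
ALTER TABLE stats_refers DROP INDEX stat, ADD INDEX stat (refer, reg_date);   -- option 1: reverse the stat index
ALTER TABLE stats_refers ADD INDEX idx_refer (refer);                         -- option 2: individual index on refer
ALTER TABLE stats_refers ADD INDEX idx_refer_regdate (refer, reg_date);       -- option 3: new compound index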
You may also want to think more broadly about your index usage here. Compound indexes can improve performance compared to individual indexes on each field used across the range of queries on the table, but in your case you are replicating indexes on the same fields in different combinations within your table definition. Without knowing all of your query use cases it is hard to give an overall indexing recommendation, but it is worth reviewing.
For example, there is no apparent need at all for your up_reg_index, as that indexing is already covered by the leading columns of the reg_date unique index. You might be best served with a set of indexes like this:
PRIMARY KEY (`id`),
UNIQUE KEY `regdate_uptime_refer` (`reg_date`,`up_time`,`refer`),
KEY `reg_date` (`reg_date`),
KEY `refer` (`refer`),
KEY `up_time` (`up_time`),
This would certainly require less space for your indexes than what you currently have and would allow for more flexibility in how to filter/join/group on these columns, but don't take this as a firm recommendation. Test the performance of different indexing scenarios against your different query scenarios (particularly insert performance if that is a use case).

INDEX(refer, reg_date)
in that order might get the optimizer to use that for the GROUP BY.
INDEX(up_time)
might be useful for the part of the WHERE that references up_time.
reg_date < 1435006800
AND up_time < 1435006800
The optimizer will think about using INDEX(reg_date, ...) or INDEX(up_time, ...), but only if the 'range' is less than, say, 20% of the table. It is very unlikely to try to use two separate indexes and do an "index merge".
Because of the "ranges" in the WHERE, it is not possible to have an index that handles both the WHERE and the GROUP BY.
No other indexes are useful for that query.
This 20% is imprecise. It exists because if too much of the table needs to be looked at, it is actually faster to do a table scan than to use the index.
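If you want to see where that threshold bites for your data, one way (a sketch; the earlier cutoff value is purely illustrative) is to compare the EXPLAIN for a narrow range against the original one:
-- a much more selective cutoff: the optimizer is far more likely to pick a range scan
EXPLAIN SELECT refer, reg_date, MAX(IFNULL(payment_sum,0)) AS payment_sum
FROM stats_refers
WHERE reg_date < 1420000000 AND up_time < 1420000000
GROUP BY refer, reg_date;
-- the original cutoff covers most of the table, so expect type ALL (table scan)
EXPLAIN SELECT refer, reg_date, MAX(IFNULL(payment_sum,0)) AS payment_sum
FROM stats_refers
WHERE reg_date < 1435006800 AND up_time < 1435006800
GROUP BY refer, reg_date;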
You are selecting up_time, but not grouping by it. That's naughty. The query can deliver whatever up_time it feels like.
Why have an id, since you have a UNIQUE key that could be the PRIMARY KEY? I ask because the "clustering" of the PK could lead to this being the best option:
PRIMARY KEY(refer, reg_date, up_time)
(Of course, changing just that might mess up other queries.)
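If you did go that route despite the caveat, a minimal sketch of the change (assuming nothing in the application depends on id; note that changing the primary key rebuilds the InnoDB table):
ALTER TABLE stats_refers MODIFY id INT UNSIGNED NOT NULL;                               -- drop AUTO_INCREMENT first
ALTER TABLE stats_refers DROP PRIMARY KEY, ADD PRIMARY KEY (refer, reg_date, up_time);  -- promote the natural key
ALTER TABLE stats_refers DROP COLUMN id;                                                -- once nothing references it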

Related

Does the foreign key slow down the join query?

I have two databases, test and test2. Both have the same tables (employees & salaries) and both have the same records. The test2 database uses a foreign key and the test database doesn't.
test structure
test.employees
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| emp_id | int(11) | NO | PRI | NULL | |
| name | varchar(30) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
test.salaries
+--------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| salary | int(11) | YES | | NULL | |
| emp_id | int(11) | NO | | NULL | |
+--------+---------+------+-----+---------+----------------+
test2 structure
test2.employees
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| emp_id | int(11) | NO | PRI | NULL | |
| name | varchar(30) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
test2.salaries
+--------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| salary | int(11) | YES | | NULL | |
| emp_id | int(11) | NO | MUL | NULL | |
+--------+---------+------+-----+---------+----------------+
I run the same join query on both databases
select * from employees inner join salaries on employees.emp_id=salaries.emp_id;
This is the output I get from the test database, which doesn't contain a foreign key:
2844047 rows in set (3.25 sec)
This is the output I get from the test2 database, which contains a foreign key:
2844047 rows in set (17.21 sec)
So does the foreign key slow down the join query?
Your empirical evidence suggests that in at least one case it does. So, if we believe your numbers, the answer is clearly "yes" -- and I assume you have ruled out other potential causes such as locks on the table or resource competition (actually the difference is pretty big). I presume that you want to know why.
In most databases, declaring a foreign key is about relational integrity. It would have no effect on the optimization of queries. The join conditions in the query would redundantly cover the same information.
However, MySQL does a bit more when a foreign key is declared. A foreign key declaration automatically creates an index on the columns being used. This is not standard behavior -- I'm not even sure if any other database does this.
Normally, an index would benefit performance. In this case, the optimizer has more choices on how to approach the query. For whatever reason, it is using a substandard execution plan.
You should be able to look at the explain plans and see a difference. The issue is that the optimizer has chosen the wrong plan. I would say that this is uncommon and should not dissuade you from using proper foreign key declarations in your databases.
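One way to check this hypothesis (a sketch; 'emp_id' below is an assumed index name, use whatever SHOW INDEX actually reports for the FK column) is to make the optimizer ignore the FK-backed index and compare the plans and timings:
-- run against the test2 database
SHOW INDEX FROM salaries;                      -- find the name MySQL gave the index backing the FK on emp_id
EXPLAIN SELECT * FROM employees
INNER JOIN salaries IGNORE INDEX (emp_id)      -- assumed index name; substitute the real one
        ON employees.emp_id = salaries.emp_id;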

MySQL index usage query optimization

I have the following MySQL (MyISAM) table with about 3 Million rows.
CREATE TABLE `tasks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`node` smallint(6) NOT NULL,
`pid` int(11) NOT NULL,
`job` int(11) NOT NULL,
`a_id` int(11) DEFAULT NULL,
`user_id` int(11) NOT NULL,
`state` int(11) NOT NULL,
`start_time` int(11) NOT NULL,
`end_time` int(11) NOT NULL,
`stop_time` int(11) NOT NULL,
`end_stream` int(11) NOT NULL,
`message` varchar(255) DEFAULT NULL,
`rate` float NOT NULL,
`exiting` int(11) NOT NULL DEFAULT '0',
`bytes` int(11) NOT NULL,
`motion` tinyint(4) NOT NULL,
PRIMARY KEY (`id`),
KEY `a_id` (`a_id`),
KEY `job` (`job`),
KEY `state` (`state`),
KEY `end_time` (`end_time`),
KEY `start_time` (`start_time`)
) ENGINE=MyISAM AUTO_INCREMENT=100 DEFAULT CHARSET=utf8;
Now when I run the following query, MySQL is only using the a_id index and needs to scan a few thousand rows.
SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' )
AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
When I add an additional index KEY newkey (a_id,state,start_time), MySQL still uses only a_id and not newkey. Only when I use an index hint / FORCE INDEX in the query is it used. Changing the order of the fields in the query does not help.
Any ideas? I don't necessarily want hints in my statements. The fact that MySQL is not doing this automatically indicates to me that there is an issue with my table, keys or query somewhere. Any help is highly appreciated.
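For reference, the hinted version mentioned above looks something like this (a sketch of what forcing newkey means):
SELECT COUNT(id) AS tries
FROM tasks FORCE INDEX (newkey)
WHERE (job = 1 OR job = 3)
  AND a_id = 614
  AND state > 80 AND state < 100
  AND start_time >= 1386538013;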
Additional info:
mysql> show index in tasks;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasks | 0 | PRIMARY | 1 | id | A | 3130554 | NULL | NULL | | BTREE | | |
| tasks | 1 | a_id | 1 | a_id | A | 2992 | NULL | NULL | YES | BTREE | | |
| tasks | 1 | job | 1 | job | A | 5 | NULL | NULL | | BTREE | | |
| tasks | 1 | state | 1 | state | A | 9 | NULL | NULL | | BTREE | | |
| tasks | 1 | end_time | 1 | end_time | A | 1565277 | NULL | NULL | | BTREE | | |
| tasks | 1 | newkey | 1 | a_id | A | 2992 | NULL | NULL | YES | BTREE | | |
| tasks | 1 | newkey | 2 | state | A | 8506 | NULL | NULL | | BTREE | | |
| tasks | 1 | newkey | 3 | start_time | A | 3130554 | NULL | NULL | | BTREE | | |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EXPLAIN with and without quotes:
mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' ) AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tasks | ref | a_id,job,state,newkey | a_id | 5 | const | 740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.10 sec)
mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job=1 OR job=3 ) AND a_id = 614 AND state > 80 AND state < 100 AND start_time >= 1386538013;
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tasks | ref | a_id,job,state,newkey | a_id | 5 | const | 740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.01 sec)
A few things... I would have a SINGLE compound index on
( a_id, job, state, start_time )
This helps optimize the query on all the criteria, in what I believe is the best-tuned sequence: a single a_id, then two jobs, then a small state range, then the time filter. Next, notice there are no quotes in the rewritten query below... you were comparing numeric columns against strings; leave them as numbers, which compare faster than strings.
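A sketch of that index (the name is just a placeholder):
ALTER TABLE tasks ADD INDEX idx_aid_job_state_start (a_id, job, state, start_time);  -- single compound index on all filter columns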
Also, by having all of those columns in the index, it becomes a COVERING index, meaning MySQL does NOT have to go to the raw page data to get the other values when testing whether a record qualifies.
SELECT
count(*) AS tries
FROM
tasks
WHERE
a_id = 614
AND job IN ( 1, 3 )
AND state > 80 AND state < 100
AND start_time >= 1386538013;
Now, as to why this index order... consider the following scenario. You have two rooms full of boxes. In the first room, each box is an a_id; within that, the jobs are in order; within each job are the state ranges, and finally the start times.
In the other room, your boxes are sorted by start time, within that by a_id, and finally by state.
In which room would it be easier to find what you need? That is how you should think about the indexes. I would rather go to one box for a_id = 614, then jump to job 1 and another to job 3. Within each of job 1 and job 3, grab states 80-100, then apply the time filter. You, however, know your data and the volume matched by each criterion better, and may adjust.
Finally, count(id) vs count(*): all I care about is that a record qualified. I don't need to know the actual id, since the filtering criteria already decided whether to include the row, so why look up (in this case) the actual id?
Probably MySQL thinks that using the a_id key will use less IO.
Probably the cardinality of the a_id key is good enough.
What do the EXPLAINs of the hinted and hint-less queries say?
If most rows with a_id=614 have state > 80 and < 100, that could happen. Have you tried one of the indexes below?
INDEX(a_id, start_time, state)
INDEX(start_time, a_id, state)
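If you want to try them, a sketch of the DDL plus a quick comparison (index names are placeholders):
ALTER TABLE tasks
    ADD INDEX idx_aid_start_state (a_id, start_time, state),
    ADD INDEX idx_start_aid_state (start_time, a_id, state);
EXPLAIN SELECT COUNT(*) FROM tasks
WHERE (job = 1 OR job = 3) AND a_id = 614
  AND state > 80 AND state < 100
  AND start_time >= 1386538013;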

How to optimize a query on a big table

I have a table with 18,310,298 records right now.
And the following query:
SELECT COUNT(obj_id) AS cnt
FROM
`common`.`logs`
WHERE
`event` = '11' AND
`obj_type` = '2' AND
`region` = 'us' AND
DATE(`date`) = DATE('20120213010502');
With the following structure:
CREATE TABLE `logs` (
`log_id` int(11) NOT NULL AUTO_INCREMENT,
`event` tinyint(4) NOT NULL,
`obj_type` tinyint(1) NOT NULL DEFAULT '0',
`obj_id` int(11) unsigned NOT NULL DEFAULT '0',
`region` varchar(3) NOT NULL DEFAULT '',
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`log_id`),
KEY `event` (`event`),
KEY `obj_type` (`obj_type`),
KEY `region` (`region`),
KEY `for_stat` (`event`,`obj_type`,`obj_id`,`region`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=83126347 DEFAULT CHARSET=utf8 COMMENT='Logs table'
And MySQL EXPLAIN shows the following:
+----+-------------+-------+------+--------------------------------+----------+---------+-------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+--------------------------------+----------+---------+-------------+--------+----------+--------------------------+
| 1 | SIMPLE | logs | ref | event,obj_type,region,for_stat | for_stat | 2 | const,const | 837216 | 100.00 | Using where; Using index |
+----+-------------+-------+------+--------------------------------+----------+---------+-------------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
Running such a query at daily peak usage time takes about 5 seconds.
What can I do to make it faster?
UPDATE: Following all the comments, I modified the index and removed the DATE() function from the WHERE clause:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| logs | 0 | PRIMARY | 1 | log_id | A | 15379109 | NULL | NULL | | BTREE | |
| logs | 1 | event | 1 | event | A | 14 | NULL | NULL | | BTREE | |
| logs | 1 | obj_type | 1 | obj_type | A | 14 | NULL | NULL | | BTREE | |
| logs | 1 | region | 1 | region | A | 14 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 1 | event | A | 157 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 2 | obj_type | A | 157 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 3 | region | A | 157 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 4 | date | A | 157 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain extended SELECT COUNT(obj_id) as cnt
-> FROM `common`.`logs`
-> WHERE `event`= '11' AND
-> `obj_type` = '2' AND
-> `region`= 'est' AND
-> date between '2012-11-25 00:00:00' and '2012-11-25 23:59:59';
+----+-------------+-------+-------+--------------------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+--------------------------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | logs | range | event,obj_type,region,for_stat | for_stat | 21 | NULL | 9674 | 75.01 | Using where |
+----+-------------+-------+-------+--------------------------------+----------+---------+------+------+----------+-------------+
It seems it's running faster. Thanks everyone.
The EXPLAIN output shows that the query is using only the first two columns of the for_stat index.
This is because the query doesn't use obj_id in the WHERE clause. If you create a new key without obj_id (or modify the existing key to reorder the columns), more of the key can be used and you may see better performance:
KEY `for_stat2` (`event`,`obj_type`,`region`,`date`)
If it's still too slow, changing the last condition, where you use DATE(), as said by Salman and Sashi, might improve things.
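A sketch of adding that for_stat2 key (drop it again if it does not help, since every extra index costs space and slows inserts):
ALTER TABLE `common`.`logs` ADD KEY `for_stat2` (`event`, `obj_type`, `region`, `date`);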
@Joni already explained what is wrong with your index. For the query, I assume that your example selects all records for 2012-02-13 regardless of time. You can change the WHERE clause to use >= and < instead of the DATE() cast:
SELECT COUNT(obj_id) AS cnt
FROM
`common`.`logs`
WHERE
`event` = 11 AND
`obj_type` = 2 AND
`region` = 'us' AND
`date` >= DATE('20120213010502') AND
`date` < DATE('20120213010502') + INTERVAL 1 DAY
The DATE() function on the date column is causing the full table scan.
Try this:
SELECT COUNT(obj_id) as cnt
FROM
`common`.`logs`
WHERE
`event` = 11
AND
`obj_type` = 2
AND
`region` = 'us'
AND
`date` = DATE('20120213010502')
As logging (inserts) needs to be fast too, use as few indexes as possible.
Evaluation queries may take longer; that is admin work and does not necessarily need indexes.
CREATE TABLE `logs` (
`log_id` int(11) NOT NULL AUTO_INCREMENT,
`event` tinyint(4) NOT NULL,
`obj_type` tinyint(1) NOT NULL DEFAULT '0',
`obj_id` int(11) unsigned NOT NULL DEFAULT '0',
`region` varchar(3) NOT NULL DEFAULT '',
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`log_id`),
KEY `for_stat` (`event`,`obj_type`,`region`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=83126347 DEFAULT CHARSET=utf8 COMMENT='Logs table'
And about the date search, @SashiKant and @SalmanA have already answered.
In MySQL you should place index columns by cardinality: columns with fewer possible values in the table should be placed closer to the left.
You can also try changing the region column to ENUM() and searching the date with a BETWEEN clause.
MySQL is not using the third column in the index because using it takes more effort than just filtering (a common thing in MySQL).
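A sketch of both suggestions together (the ENUM value list is illustrative; use the full set of regions you actually store):
ALTER TABLE `common`.`logs` MODIFY `region` ENUM('us','est') NOT NULL;   -- illustrative value list
SELECT COUNT(obj_id) AS cnt
FROM `common`.`logs`
WHERE `event` = 11
  AND `obj_type` = 2
  AND `region` = 'us'
  AND `date` BETWEEN '2012-02-13 00:00:00' AND '2012-02-13 23:59:59';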

Joining tables on foreign key: MySQL performance

I have two tables that have a foreign key constraint between them
Table event
mysql> describe event;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| sid | int(10) unsigned | NO | PRI | NULL | |
| cid | int(10) unsigned | NO | PRI | NULL | |
| signature | int(10) unsigned | NO | MUL | NULL | |
| timestamp | datetime | NO | MUL | NULL | |
| is_deleted | tinyint(1) | NO | MUL | 0 | |
+------------+------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Table signature
mysql> describe signature;
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| sig_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| sig_name | varchar(255) | NO | MUL | NULL | |
| sig_class_id | int(10) unsigned | NO | MUL | NULL | |
| sig_priority | int(10) unsigned | YES | | NULL | |
| sig_rev | int(10) unsigned | YES | | NULL | |
| sig_sid | int(10) unsigned | YES | | NULL | |
| sig_gid | int(10) unsigned | YES | | NULL | |
+--------------+------------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
event.signature is a foreign key that references signature.sig_id. Both columns are indexed as well.
Table event is large (say 1M records) while table signature is comparatively small (a few thousand rows at most).
Joined queries that access any signature attribute take a very long time to execute. A look at EXPLAIN:
mysql> explain select event.sid,event.cid,signature.sig_name from event join signature on signature.sig_id=event.signature;
+----+-------------+-----------+------+--------------------------------+-----------------------+---------+-------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+--------------------------------+-----------------------+---------+-------------------------+------+--------------------------+
| 1 | SIMPLE | signature | ALL | PRIMARY,index_signature_sig_id | NULL | NULL | NULL | 127 | |
| 1 | SIMPLE | event | ref | index_event_signature | index_event_signature | 5 | snorby.signature.sig_id | 68 | Using where; Using index |
+----+-------------+-----------+------+--------------------------------+-----------------------+---------+-------------------------+------+--------------------------+
2 rows in set (0.00 sec)
Whereas if no signature attribute is accessed:
mysql> explain select event.sid,event.cid from event join signature on signature.sig_id=event.signature;
+----+-------------+-----------+-------+--------------------------------+------------------------+---------+-------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------------------+------------------------+---------+-------------------------+------+--------------------------+
| 1 | SIMPLE | signature | index | PRIMARY,index_signature_sig_id | index_signature_sig_id | 4 | NULL | 127 | Using index |
| 1 | SIMPLE | event | ref | index_event_signature | index_event_signature | 5 | snorby.signature.sig_id | 68 | Using where; Using index |
+----+-------------+-----------+-------+--------------------------------+------------------------+---------+-------------------------+------+--------------------------+
2 rows in set (0.00 sec)
As can be seen, if a signature attribute is queried, it does a full scan with the ALL join type.
Is it possible to rewrite the query to be faster? I ask because this is part of a multiple-table join, and joining event with signature is the bottleneck that is slowing down the query tremendously.
I am using MySQL 5.1.52 and SQLAlchemy 0.7.8 as the ORM.
Your query does require a full scan, by definition.
That is, you give no filtering condition. No ... WHERE sig_rev = 17, for example.
Therefore, there is not much to improve here. MySQL picks a table to start with, does a full scan, and for each row fetches the matching rows from the second table.
So the scan is essential. But you may turn it into an index scan instead of a table scan. I am assuming you have an index on the signature column only, and on the sig_id column only.
What you may do is create an additional index on sig_id, sig_name, like this:
ALTER TABLE signature ADD UNIQUE INDEX(sig_id, sig_name);
The index is unique by definition, since it is a superset of the PRIMARY KEY, but that is beside the point.
What you may gain now is an execution plan similar to the second example you have posted: an index scan on signature, followed by an index lookup on event.
Make sure to compare and verify that you do get performance boost on this particular query. Check that the new index does not hurt INSERT performance etc.
Good luck.

Optimizing query in the MySQL slow-query log

Our database is set up so that we have a credentials table that holds multiple different types of credentials (logins and the like). There's also a credential_pairs table that associates some of these types together (for instance, a user may have a password and a security token).
To check whether a pair matches, there is the following query:
SELECT DISTINCT cp.credential_id FROM credential_pairs AS cp
INNER JOIN credentials AS c1 ON (cp.primary_credential_id = c1.credential_id)
INNER JOIN credentials AS c2 ON (cp.secondary_credential_id = c2.credential_id)
WHERE c1.data = AES_ENCRYPT('Some Value 1', 'encryption key')
AND c2.data = AES_ENCRYPT('Some Value 2', 'encryption key');
This query works fine and gives us exactly what we need. HOWEVER, it constantly shows up in the slow query log (possibly due to a lack of indexes?). When I ask MySQL to "explain" the query, it gives me:
+----+-------------+-------+------+--------------------------------------------------------+---------------------+---------+-------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------------------------------+---------------------+---------+-------+-------+--------------------------------+
| 1 | SIMPLE | c1 | ref | credential_id_UNIQUE,credential_id,ix_credentials_data | ix_credentials_data | 22 | const | 1 | Using where; Using temporary |
| 1 | SIMPLE | c2 | ref | credential_id_UNIQUE,credential_id,ix_credentials_data | ix_credentials_data | 22 | const | 1 | Using where |
| 1 | SIMPLE | cp | ALL | NULL | NULL | NULL | NULL | 69197 | Using where; Using join buffer |
+----+-------------+-------+------+--------------------------------------------------------+---------------------+---------+-------+-------+--------------------------------+
I have a feeling that last entry (where it shows 69197 rows) is probably the problem, but I am FAR from a DBA... help?
credentials table:
CREATE TABLE `credentials` (
`hidden_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`credential_id` varchar(255) NOT NULL,
`data` blob NOT NULL,
`credential_status` varchar(100) NOT NULL,
`insert_date` datetime NOT NULL,
`insert_user` int(10) unsigned NOT NULL,
`update_date` datetime DEFAULT NULL,
`update_user` int(10) unsigned DEFAULT NULL,
`delete_date` datetime DEFAULT NULL,
`delete_user` int(10) unsigned DEFAULT NULL,
`is_deleted` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`hidden_id`,`credential_id`),
UNIQUE KEY `credential_id_UNIQUE` (`credential_id`),
KEY `credential_id` (`credential_id`),
KEY `data` (`data`(10)),
KEY `credential_status` (`credential_status`(10))
) ENGINE=InnoDB AUTO_INCREMENT=1572 DEFAULT CHARSET=utf8;
credential_pairs Table:
CREATE TABLE `credential_pairs` (
`hidden_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`credential_id` varchar(255) NOT NULL,
`primary_credential_id` varchar(255) NOT NULL,
`secondary_credential_id` varchar(255) NOT NULL,
`is_deleted` tinyint(1) DEFAULT NULL,
PRIMARY KEY (`hidden_id`,`credential_id`),
KEY `primary_credential_id` (`primary_credential_id`(10)),
KEY `secondary_credential_id` (`secondary_credential_id`(10))
) ENGINE=InnoDB AUTO_INCREMENT=500 DEFAULT CHARSET=latin1;
credentials Indexes:
+-------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| credentials | 0 | PRIMARY | 1 | hidden_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 0 | PRIMARY | 2 | credential_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 0 | credential_id_UNIQUE | 1 | credential_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 1 | credential_id | 1 | credential_id | A | 186235 | NULL | NULL | | BTREE | |
| credentials | 1 | ix_credentials_data | 1 | data | A | 186235 | 20 | NULL | | BTREE | |
+-------------+------------+----------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
credential_pair Indexes:
+------------------+------------+---------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+---------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
| credential_pairs | 0 | PRIMARY | 1 | hidden_id | A | 69224 | NULL | NULL | | BTREE | |
| credential_pairs | 0 | PRIMARY | 2 | credential_id | A | 69224 | NULL | NULL | | BTREE | |
| credential_pairs | 1 | ix_credential_pairs_credential_id | 1 | credential_id | A | 69224 | 36 | NULL | | BTREE | |
| credential_pairs | 1 | ix_credential_pairs_primary_credential_id | 1 | primary_credential_id | A | 69224 | 36 | NULL | | BTREE | |
| credential_pairs | 1 | ix_credential_pairs_secondary_credential_id | 1 | secondary_credential_id | A | 69224 | 36 | NULL | | BTREE | |
+------------------+------------+---------------------------------------------+--------------+-------------------------+-----------+-------------+----------+--------+------+------------+---------+
UPDATE NOTES:
AFAICT the DISTINCT was superfluous... nothing really needed it, so I dropped it. In an attempt to follow Fabrizio's advice and get a WHERE on the credential_pairs lookup, I then altered the statement to read:
SELECT credential_id
FROM credential_pairs cp
WHERE cp.primary_credential_id = (SELECT credential_id FROM credentials WHERE data = AES_ENCRYPT('value 1','enc_key')) AND
cp.secondary_credential_id = (SELECT credential_id FROM credentials WHERE data = AES_ENCRYPT('value 2','enc_key'))
And.... nothing. The statement takes just as long and the explain looks pretty much the same. So, I added an index to the primary and secondary columns with:
ALTER TABLE credential_pairs ADD INDEX `idx_credential_pairs__primary_and_secondary`(`primary_credential_id`, `secondary_credential_id`);
And... nothing.
+----+-------------+-------------+-------+---------------------+---------------------------------------------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------------+---------------------------------------------+---------+------+-------+--------------------------+
| 1 | PRIMARY | cp | index | NULL | idx_credential_pairs__primary_and_secondary | 514 | NULL | 69217 | Using where; Using index |
| 3 | SUBQUERY | credentials | ref | ix_credentials_data | ix_credentials_data | 22 | | 1 | Using where |
| 2 | SUBQUERY | credentials | ref | ix_credentials_data | ix_credentials_data | 22 | | 1 | Using where |
+----+-------------+-------------+-------+---------------------+---------------------------------------------+---------+------+-------+--------------------------+
It says it's using the index, but it still looks like it's table scanning. So, I added a joint key (as per a'r's comment below) with:
ALTER TABLE credential_pairs ADD KEY (primary_credential_id, secondary_credential_id);
And... same result as with the index (are these functionally the same?).
The DISTINCT is what is generating the "Using temporary"; you usually want to avoid those when possible.
Plus, you are scanning the whole credential_pairs table: since you have no conditions against it, no indexes are used and the whole table is read before the WHERE is applied.
Hope it makes sense.
EDIT/ADD
Try starting from a different table. If I understand correctly, you have a table A, a table B and a table AB, and you are starting the select from AB; try starting it from A.
I haven't tested this, but you could try:
SELECT cp.credential_id
FROM credentials AS c1
LEFT JOIN credential_pairs AS cp ON (c1.credential_id = cp.primary_credential_id)
LEFT JOIN credentials AS c2 ON (cp.secondary_credential_id = c2.credential_id)
WHERE
c1.data = AES_ENCRYPT('Some Value 1', 'encryption key')
AND c2.data = AES_ENCRYPT('Some Value 2', 'encryption key');
I have had luck in the past by moving the tables in the SELECT around.