I have a table of over 9 million rows. I have a SELECT query that I'm using an index for. Here is the query:
SELECT `username`,`id`
FROM `04c1Tg0M`
WHERE `id` > 9259466
AND `tried` = 0
LIMIT 1;
That query executes very fast (0.00 sec). Here is the explain for that query:
+----+-------------+----------+-------+-----------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+-----------------+---------+---------+------+-------+-------------+
| 1 | SIMPLE | 04c1Tg0M | range | PRIMARY,triedex | PRIMARY | 4 | NULL | 10822 | Using where |
+----+-------------+----------+-------+-----------------+---------+---------+------+-------+-------------+
Now here is the same query except that I'm going to change the id to 6259466:
SELECT `username`,`id`
FROM `04c1Tg0M`
WHERE `id` > 5986551
AND `tried` = 0
LIMIT 1;
That query took 4.78 seconds to complete. This is the problem. Here is the explain for that query:
+----+-------------+----------+------+-----------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+-----------------+---------+---------+-------+---------+-------------+
| 1 | SIMPLE | 04c1Tg0M | ref | PRIMARY,triedex | triedex | 2 | const | 9275107 | Using where |
+----+-------------+----------+------+-----------------+---------+---------+-------+---------+-------------+
What is happening here and how can I fix it? Here are my indexes:
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| 04c1Tg0M | 0 | PRIMARY | 1 | id | A | 9275093 | NULL | NULL | | BTREE | |
| 04c1Tg0M | 1 | pdex | 1 | username | A | 9275093 | NULL | NULL | | BTREE | |
| 04c1Tg0M | 1 | pdex | 2 | id | A | 9275093 | NULL | NULL | | BTREE | |
| 04c1Tg0M | 1 | pdex | 3 | tried | A | 9275093 | NULL | NULL | YES | BTREE | |
| 04c1Tg0M | 1 | triedex | 1 | tried | A | 0 | NULL | NULL | YES | BTREE | |
| 04c1Tg0M | 1 | triedex | 2 | id | A | 9275093 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
And here is my table structure:
| 04c1Tg0M | CREATE TABLE `04c1Tg0M` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`tried` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `pdex` (`username`,`id`,`tried`),
KEY `triedex` (`tried`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9275108 DEFAULT CHARSET=utf8 |
The first SQL returns 10822 rows, while the second one returns 9275107 rows!
The use of primary key "id" index in the second query isn't so useful because you have to do a full table scan anyway.
MySQL's cost-based optimizer thinks, in the case of the 2nd query, it's better off to use the index on 'tried'.
If you have to do a full table-scan, you're better off not using an index, as index constitutes additional disk reads.
You can use "use index" or "force index" in your query to hint to the optimizer whether to use an index.
Also update the statistics by analyzing your table periodically so the cost-based optimizer is working correctly.
Related
Last week I installed some additional database monitoring and have since come to discover that a full 30% of our database load is spent on a single query on a single table (which currently has some 6 million rows):
delete FROM mdl_grade_items_history WHERE timemodified < ?
In a testing environment, I tried to make some schema changes:
Running EXPLAIN on this query indicates that every time this query is run, a full table scan is done.
EXPLAIN DELETE FROM mdl_grade_items_history WHERE timemodified < '1490528405';
+----+-------------+-------------------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | DELETE | mdl_grade_items_history | NULL | ALL | NULL | NULL | NULL | NULL | 140784 | 100.00 | Using where |
+----+-------------+-------------------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set (0.00 sec)
Checking EXPLAIN for a (very similar) SELECT query shows a similar situation.
EXPLAIN SELECT id FROM mdl_grade_items_history WHERE timemodified < '1490528405';
+----+-------------+-------------------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | mdl_grade_items_history | NULL | ALL | NULL | NULL | NULL | NULL | 140784 | 33.33 | Using where |
+----+-------------+-------------------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
Checking the table definition, there does not seem to be an index on timemodified
SHOW INDEX FROM mdl_grade_items_history;
+-------------------------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| mdl_grade_items_history | 0 | PRIMARY | 1 | id | A | 140784 | NULL | NULL | | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_act_ix | 1 | action | A | 2 | NULL | NULL | | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_old_ix | 1 | oldid | A | 17170 | NULL | NULL | | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_cou_ix | 1 | courseid | A | 1065 | NULL | NULL | YES | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_cat_ix | 1 | categoryid | A | 2300 | NULL | NULL | YES | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_sca_ix | 1 | scaleid | A | 6 | NULL | NULL | YES | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_out_ix | 1 | outcomeid | A | 1 | NULL | NULL | YES | BTREE | | |
| mdl_grade_items_history | 1 | mdl_graditemhist_log_ix | 1 | loggeduser | A | 30 | NULL | NULL | YES | BTREE | | |
+-------------------------+------------+-------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
8 rows in set (0.00 sec)
So I tried to add one (both via CREATE INDEX and ALTER TABLE .. ADD INDEX)
CREATE INDEX `mdl_gradeitemhist_tim_ix` ON `mdl_grade_items_history` (`timemodified`);
ALTER TABLE `mdl_grade_items_history` ADD INDEX `mdl_gradeitemhist_tim_ix` (`timemodified`);
In both instances, the SELECT query was affected (note the change in type)
EXPLAIN `SELECT` id FROM mdl_grade_items_history WHERE timemodified < '1490528405';
+----+-------------+-------------------------+------------+-------+--------------------------+--------------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------+------------+-------+--------------------------+--------------------------+---------+------+-------+----------+--------------------------+
| 1 | SIMPLE | mdl_grade_items_history | NULL | range | mdl_gradeitemhist_tim_ix | mdl_gradeitemhist_tim_ix | 9 | NULL | 70206 | 100.00 | Using where; Using index |
+----+-------------+-------------------------+------------+-------+--------------------------+--------------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
But not the DELETE query.
EXPLAIN DELETE FROM mdl_grade_items_history WHERE timemodified < '1490528405';
+----+-------------+-------------------------+------------+------+--------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------+------------+------+--------------------------+------+---------+------+--------+----------+-------------+
| 1 | DELETE | mdl_grade_items_history | NULL | ALL | mdl_gradeitemhist_tim_ix | NULL | NULL | NULL | 140412 | 100.00 | Using where |
+----+-------------+-------------------------+------------+------+--------------------------+------+---------+------+--------+----------+-------------+
1 row in set (0.00 sec)
What have I done wrong? What else could I try?
Low cardinality indexes (action, scaleid, outcomeid) are almost never used. Get rid of them.
Having a large number of single-column indexes is a red flag. Please learn about the power and benefit of "composite" indexes. (Not relevent for the select/delete mentioned here, but probably relevant for other queries.)
Extra indexes on a table slightly slow down INSERTs and DELETEs since the indexes need to (eventually) be updated.
Extra indexes slow down UPDATEs if an indexed column is modified.
CREATE INDEX and ALTER TABLE ADD INDEX do the same thing; you probably have a redundant index now.
The EXPLAINs are different because (1) SELECT and DELETE do different things and (2) EXPLAIN is not very sophisticated.
Deleting a large number of rows takes a lot of effort -- Keep in mind that the deleted rows are hung onto in case of a ROLLBACK. Only after COMMIT can the rows really be removed. (With autocommit=ON, there is an implicit COMMIT.)
Tips on large deletes:
Deleting in chunks
Using PARTITIONs for very efficient deletion of time series
I have two tables CUSTOMER_ORDER_PUBLIC and LINEITEM_PUBLIC which have the following indices:
+-----------------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| CUSTOMER_ORDER_PUBLIC | 1 | O_ORDERKEY | 1 | O_ORDERKEY | A | 2633457 | NULL | NULL | YES | BTREE | | |
| CUSTOMER_ORDER_PUBLIC | 1 | O_ORDERDATE | 1 | O_ORDERDATE | A | 2350 | NULL | NULL | YES | BTREE | | |
| CUSTOMER_ORDER_PUBLIC | 1 | PUB_C_CUSTKEY | 1 | PUB_C_CUSTKEY | A | 273000 | NULL | NULL | | BTREE | | |
+-----------------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
and:
+-----------------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| LINEITEM_PUBLIC | 0 | PRIMARY | 1 | PUB_L_ORDERKEY | A | 16488602 | NULL | NULL | | BTREE | | |
| LINEITEM_PUBLIC | 0 | PRIMARY | 2 | PUB_L_LINENUMBER | A | 44146904 | NULL | NULL | | BTREE | | |
| LINEITEM_PUBLIC | 1 | LINEITEM_PRIVATE_FK2 | 1 | PUB_L_PARTKEY | A | 2083757 | NULL | NULL | | BTREE | | |
| LINEITEM_PUBLIC | 1 | LINEITEM_PRIVATE_FK3 | 1 | PUB_L_SUPPKEY | A | 85599 | NULL | NULL | | BTREE | | |
+-----------------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Each time I run an Explain of a specific query I get the following:
mysql> EXPLAIN SELECT *
FROM CUSTOMER_ORDER_PUBLIC
LEFT OUTER JOIN LINEITEM_PUBLIC ON O_ORDERKEY= PUB_L_ORDERKEY;
+----+-------------+-----------------------+------------+------+---------------+---------+---------+---------------------------------------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+---------------+---------+---------+---------------------------------------+---------+----------+-------+
| 1 | SIMPLE | CUSTOMER_ORDER_PUBLIC | NULL | ALL | NULL | NULL | NULL | NULL | 2900769 | 100.00 | NULL |
| 1 | SIMPLE | LINEITEM_PUBLIC | NULL | ref | PRIMARY | PRIMARY | 4 | TPCH.CUSTOMER_ORDER_PUBLIC.O_ORDERKEY | 2 | 100.00 | NULL |
+----+-------------+-----------------------+------------+------+---------------+---------+---------+---------------------------------------+---------+----------+-------+
For some reason the query optimizer is not using the index (O_ORDERKEY) even if I use a FORCE INDEX. I know a lot of people posted similar questions but I tried everything and nothing seems to help!
Any other suggestions would be greatly appreciated!
Edit:
The query used is the following:
SELECT * FROM CUSTOMER_ORDER_PUBLIC
LEFT OUTER JOIN LINEITEM_PUBLIC ON O_ORDERKEY= PUB_L_ORDERKEY;
For this query:
SELECT *
FROM CUSTOMER_ORDER_PUBLIC cop LEFT OUTER JOIN
LINEITEM_PUBLIC lp
ON cop.O_ORDERKEY = lp.PUB_L_ORDERKEY;
For this query, you want an index on LINEITEM_PUBLIC(PUB_L_ORDERKEY). Of course, you already have this index because this is the first key in the primary key.
There is no reason to use an index on CUSTOMER_ORDER_PUBLIC, because all rows in the table are going to the result set.
The FORCE INDEX hint tells the optimizer that a full scan of the table is very expensive.
The most likely explanation for the observed behavior is that the optimizer thinks it needs to access every row in the table, and the index suggested in the hint is not a covering index for the query.
Based on the EXPLAIN output, we only see evidence of a single predicate on the JOIN operation. And it looks like the optimizer is choosing CUSTOMER_ORDER_PUBLIC as the driving table for the join, and using an index on the LINEITEM_PUBLIC table.
I'm not sure any of that answers the question you asked. (I'm not sure that there was a question asked.) Absent an actual SQL statement, we are just making guesses.
I have a question: Aside from the FORCE INDEX hint, why would we expect the optimizer to use a particular index? And why would that be a reasonable expectation?
I have a MySQL table with some 20 million rows of data in it.
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| b_id | int(11) | YES | MUL | NULL | |
| order | bigint(20) | YES | MUL | NULL | |
| date | date | YES | | NULL | |
| time | time | YES | | NULL | |
| channel | varchar(8) | YES | MUL | NULL | |
| data | varchar(60) | YES | | NULL | |
| date_system | date | YES | MUL | NULL | |
| time_system | time | YES | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
I had an non unique index on (b_id, channel, date) to speed up queries like:
select date, left(time,2) as hour, round(data,1) as data
from data_lines
where channel='1'
and b_id='300'
and date >='2013-04-19'
and date <='2013-04-26'
group by date,hour
The problem was that my inserts sometimes overlap, so I wanted to use 'ON DUPLICATE KEY UPDATE', however this needs a unique index. So I create a unique index on (b_id, channel, date, time) as these are the four main characteristics to determine if there is a double value. The inserts now work fine, however my select queries are unacceptable slow.
I'm not quite sure why my selects have become slower since the addition of the new index:
is time so unique that the index becomes very large --> and slow?
should I remove the non unique index to speed things up?
is it my bad querying?
other ideas welcome!
For the record (order, date_system and time_system) are not used at all in indexes or selects, but do contain data. The inserts are run from C and Python and the selects from PHP.
Per request the explain query:
mysql> explain select date, left(time,2) as hour, round(data,1) as data
from data_lines
where channel='1'
and b_id='300'
and date >='2013-04-19'
and date <='2013-04-26'
group by date,hour;
+----+-------------+-----------+------+--------------------------------+------------+---------+-------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+--------------------------------+------------+---------+-------------+------+----------------------------------------------+
| 1 | SIMPLE | data_lines| ref | update_index,b_id,comp_index | comp_index | 16 | const,const | 3548 | Using where; Using temporary; Using filesort |
+----+-------------+-----------+------+--------------------------------+------------+---------+-------------+------+----------------------------------------------+
The update_index is my unique index of (b_id, channel, date, time) and the comp_index is my non unique index of (b_id, channel, date).
Indexes are:
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| data_lines| 0 | PRIMARY | 1 | id | A | 17918898 | NULL | NULL | | BTREE | | |
| data_lines| 0 | id_UNIQUE | 1 | id | A | 17918898 | NULL | NULL | | BTREE | | |
| data_lines| 0 | update_index | 1 | channel | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 0 | update_index | 2 | b_id | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 0 | update_index | 3 | date | A | 44244 | NULL | NULL | YES | BTREE | | |
| data_lines| 0 | update_index | 4 | time | A | 17918898 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | box_id | 1 | b_id | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | idx | 1 | order | A | 17918898 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | comp_index | 1 | b_id | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | comp_index | 2 | channel | A | 6624 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | comp_index | 3 | date | A | 165915 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | date_system | 1 | date_system | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | mac | 1 | mac | A | 17 | NULL | NULL | YES | BTREE | | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Try explicitly specifying USE INDEX(update_index) in your query.
the optimizer is making wrong choice in selecting in selecting the index because of which the query is becoming slow.
Hope this solves your problem.. :)
Since a PRIMARY KEY is a UNIQUE KEY, get rid of the useless UNIQUE(id).
Are any of the columns we are talking about ever NULL? If not, make them NOT NULL. (This is important before upgrading the UNIQUE index.)
Unless you need it for some other query, DROP comp_index. It provides no extra benefit (toward your INSERT or SELECT) over the 4-column unique_index.
Do you use id anywhere else? If not, promote the 4-col unique index to be PRIMARY KEY. This step is likely to speed things up because now it is not bouncing back and forth between the index and the data (to get data).
That leaves 4 other indexes; see if you really need them. (I suggest this because a previous step will make secondary indexes bulkier.)
Change to InnoDB if you are using MyISAM.
When doing lots of ALTERs, do them in a single statement -- it will be a lot faster.
ALTER TABLE ...
DROP COLUMN id,
DROP PRIMARY KEY,
DROP INDEX `id_UNIQUE`,
DROP INDEX comp_index,
ADD PRIMARY KEY(channel, b_id, date, time),
ALTER COLUMN ... NOT NULL,
...
ENGINE=InnoDB;
Or, to be more cautious: CREATE the modified table, then INSERT...SELECT to populate it. Then test. Eventually do RENAME TABLE to put it into place.
It is usually a bad idea to split date and time into two columns instead of having a single datetime. But I won't push it, since it probably does not affect this Question much.
I'm working on modifying an old MySQL database which turned out to be designed improperly for the sort of data it was storing. I'm not very familiar with SQL at all, so I used SHOW CREATE TABLE to get the CREATE statement used for the old table ('interaction_old') and copied it almost exactly, with the only changes being a few of the column names and data types, to make a new table ('interaction_new'). Now some queries which were using indexes in the old table no longer use indexes in the new table, and I can't figure out why.
Here are the indexes from both tables:
mysql> SHOW KEYS FROM interaction_old;
+-----------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| interaction_old | 0 | PRIMARY | 1 | interactionid | A | 138996006 | NULL | NULL | | BTREE | |
| interaction_old | 1 | Complex_pdbid | 1 | Complex_pdbid | A | 1338 | NULL | NULL | | BTREE | |
| interaction_old | 1 | Protein_id | 1 | Protein_id | A | 13737 | NULL | NULL | | BTREE | |
| interaction_old | 1 | RNA_id | 1 | RNA_id | A | 2806 | NULL | NULL | | BTREE | |
+-----------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> SHOW KEYS FROM interaction_new;
+-----------------+------------+------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| interaction_new | 0 | PRIMARY | 1 | interactionid | A | 152311144 | NULL | NULL | | BTREE | |
| interaction_new | 1 | pdbid | 1 | pdbid | A | 2924 | NULL | NULL | | BTREE | |
| interaction_new | 1 | pchainname | 1 | pchainname | A | 472 | NULL | NULL | | BTREE | |
| interaction_new | 1 | rchainname | 1 | rchainname | A | 487 | NULL | NULL | | BTREE | |
+-----------------+------------+------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
And an example query that is behaving differently between the two:
mysql> EXPLAIN SELECT DISTINCT Complex_pdbid FROM interaction_old;
+----+-------------+-----------------+-------+---------------+---------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------------+---------+------+------+--------------------------+
| 1 | SIMPLE | interaction_old | range | NULL | Complex_pdbid | 6 | NULL | 1339 | Using index for group-by |
+----+-------------+-----------------+-------+---------------+---------------+---------+------+------+--------------------------+
mysql> EXPLAIN SELECT DISTINCT pdbid FROM interaction_new;
+----+-------------+-----------------+------+---------------+------+---------+------+-----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-----------+-----------------+
| 1 | SIMPLE | interaction_new | ALL | NULL | NULL | NULL | NULL | 152311144 | Using temporary |
+----+-------------+-----------------+------+---------------+------+---------+------+-----------+-----------------+
As you might expect, the query on interaction_old finishes in a fraction of a second, whereas I've let the query on interaction_new run for ~20 minutes before killing it. interaction_old.Complex_pdbid and interaction_new.pdbid are the same data type (and are storing almost exactly the same data). USE INDEX and/or FORCE INDEX doesn't seem to have any effect. What's causing the different behavior?
Edit: According to the documentation, the first table uses a loose index scan to increase speed -- nothing from that page makes it clear to me why this doesn't work on the second table, though.
My MySQL is not strong, so please forgive any rookie mistakes. Short version:
SELECT locId,count,avg FROM destAgg_geo is significantly slower than SELECT * from destAgg_geo
prtt.destAgg is a table keyed on dst_ip (PRIMARY)
mysql> describe prtt.destAgg;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| dst_ip | int(10) unsigned | NO | PRI | 0 | |
| total | float unsigned | YES | | NULL | |
| avg | float unsigned | YES | | NULL | |
| sqtotal | float unsigned | YES | | NULL | |
| sqavg | float unsigned | YES | | NULL | |
| count | int(10) unsigned | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
geoip.blocks is a table keyed on both startIpNum and endIpNum (PRIMARY)
mysql> describe geoip.blocks;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| startIpNum | int(10) unsigned | NO | MUL | NULL | |
| endIpNum | int(10) unsigned | NO | | NULL | |
| locId | int(10) unsigned | NO | | NULL | |
+------------+------------------+------+-----+---------+-------+
destAgg_geo is a view:
CREATE VIEW destAgg_geo AS SELECT * FROM destAgg JOIN geoip.blocks
ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
Here's the optimization plan for select *:
mysql> explain select * from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for select with specific columns:
mysql> explain select locId,count,avg from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for every column from destAgg and just the locId column from geoip.blocks:
mysql> explain select dst_ip,total,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Remove any column except dst_ip and the range check flips to blocks:
mysql> explain select dst_ip,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
which is then much slower. What's going on here?
(Yes, I could just use the * query results and process from there, but I would like to know what's happening and why)
EDIT -- EXPLAIN on the VIEW query:
mysql> explain SELECT * FROM destAgg JOIN geoip.blocks ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
MySQL can tell you if you run EXPLAIN PLAN on both queries.
The first query with the columns doesn't include any key columns, so my guess is it has to do a TABLE SCAN.
The second query with the "SELECT *" includes the primary key, so it can use the index.
The range filter is applied last, so the problem is that the query optimizer is choosing to join the larger table first in one case, and the smaller table first in another. Perhaps someone with more knowledge of the optimizer can tell us why it's joining the tables in a different order for each.
I think the real goal here should be to try to get the JOIN to use an index, so the order of the join wouldn't matter so much.
I would try putting a compisite index on locId,count,avg and see if that doesn't improve speed.