I need to find the best index for this query:
SELECT c.id id, type
FROM Content c USE INDEX (type_proc_last_cat)
LEFT JOIN Battles b ON c.id = b.id
WHERE type = 1
AND processing_status = 1
AND category IN (13, 19)
AND status = 4
ORDER BY last_change DESC
LIMIT 100";
The tables look like this:
mysql> describe Content;
+-------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| type | tinyint(3) unsigned | NO | MUL | NULL | |
| category | bigint(20) unsigned | NO | | NULL | |
| processing_status | tinyint(3) unsigned | NO | | NULL | |
| last_change | int(10) unsigned | NO | | NULL | |
+-------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 1 | type | A | 4 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 2 | processing_status | A | 20 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 3 | last_change | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 4 | category | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> describe Battles;
+---------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| status | tinyint(4) unsigned | NO | | NULL | |
| status_last_changed | int(11) unsigned | NO | | NULL | |
+---------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Battles;
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Battles | 0 | PRIMARY | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 2 | status | A | 1215 | NULL | NULL | | BTREE | |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
And I get output like this:
mysql> explain
-> SELECT c.id id, type
-> FROM Content c USE INDEX (type_proc_last_cat)
-> LEFT JOIN Battles b USE INDEX (id_status) ON c.id = b.id
-> WHERE type = 1
-> AND processing_status = 1
-> AND category IN (13, 19)
-> AND status = 4
-> ORDER BY last_change DESC
-> LIMIT 100;
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| 1 | SIMPLE | c | ref | type_proc_last_cat | type_proc_last_cat | 2 | const,const | 1352 | Using where; Using index |
| 1 | SIMPLE | b | eq_ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
The trouble is the rows count for the Content table. It appears MySQL is unable to effectively use both last_change and category in the type_proc_last_cat index. If I switch the order of last_change and category, fewer rows are selected but it results in a filesort for the ORDER BY, meaning it pulls all the matching rows from the database. That is worse, since there are 100,000+ rows in both tables.
Tables are both InnoDB, so keep in mind that the PRIMARY key is appended to every other index. So the index the index type_proc_last_cat above behaves likes it's on (type, processing_status, last_change, category, id). I am aware I could change the PRIMARY key for Battles to (id, status) and drop the id_status index (and I may just do that).
Edit: Any value for type, category, processing_status, and status is less than 20% of the total values. last_change and status_last_change are unix timestamps.
Edit: If I use a different index with category and last_change in reverse order, I get this:
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 1 | type | A | 6 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 2 | processing_status | A | 26 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 3 | category | A | 228 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 4 | last_change | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain SELECT c.id id, type FROM Content c USE INDEX (type_proc_cat_last) LEFT JOIN Battles b
USE INDEX (id_status) ON c.id = b.id WHERE type = 1 AND processing_status = 1 AND category IN (13, 19) AND status = 4 ORDER BY last_change DESC LIMIT 100;
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| 1 | SIMPLE | c | range | type_proc_cat_last | type_proc_cat_last | 10 | NULL | 165 | Using where; Using index; Using filesort |
| 1 | SIMPLE | b | ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
The filesort worries me as it tells me MySQL pulls all the matching rows first, before sorting. This will be a big problem when there are 100,000+.
rows field in EXPLAIN doesn't reflect the number of rows that actually were read. It reflects the number of rows that possible will be affected. Also it doesn't rely on LIMIT, because LIMIT is applied after the plan has been calculated.
So you don't need to worry about it.
Also I would suggest you to swap last_change and category in type_proc_last_cat so mysql can try to use the last index part (last_change) for sorting purposes.
Related
I have a query that joins two tables and orders the data on the primary key. This is resulting in the very popular problem of MySQL "Using index; Using temporary; Using filesort."
The issue is causing a severe latency problem in my production tables with about 400k records.
Here's more info:
I have two tables: Doctor and Area. The Doctor table has a foreign key pointing to Area.
Doctor:
+-----------------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| area_id | int(11) | NO | MUL | NULL | |
+-----------------------------+---------------+------+-----+---------+----------------+
Doctor indexes:
+---------------+------------+------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| doctor | 0 | PRIMARY | 1 | id | A | 5546 | NULL | NULL | | BTREE | | |
| doctor | 1 | doctor_dfd0e917 | 1 | area_id | A | 29 | NULL | NULL | | BTREE | | |
+---------------+------------+------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Area:
+------------------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
+------------------------+-------------+------+-----+---------+----------------+
And the Area indexes:
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| area | 0 | PRIMARY | 1 | id | A | 24 | NULL | NULL | | BTREE | | |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I'm trying to run the following query:
SELECT `doctor`.`id`,
`area`.`id`
FROM
`doctor`
INNER JOIN
`area` ON (`doctor`.`area_id` = `area`.`id`)
ORDER BY
`doctor`.`id` DESC LIMIT 100;
The EXPLAIN returns the following (with the problematic Using index; Using temporary; Using filesort):
+----+-------------+---------------+-------+------------------------+------------------------+---------+--------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------+------------------------+---------+--------------+------+----------------------------------------------+
| 1 | SIMPLE | area | index | PRIMARY | PRIMARY | 4 | NULL | 24 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | doctor | ref | doctor_dfd0e917 | doctor_dfd0e917 | 4 | area.id | 191 | Using index |
+----+-------------+---------------+-------+------------------------+------------------------+---------+--------------+------+----------------------------------------------+
If I remove the ORDER BY clause, I get the desired effect:
+----+-------------+---------------+-------+------------------------+------------------------+---------+--------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+------------------------+------------------------+---------+--------------+------+----------------------------------------------+
| 1 | SIMPLE | area | index | PRIMARY | PRIMARY | 4 | NULL | 24 | Using index |
| 1 | SIMPLE | doctor | ref | doctor_dfd0e917 | doctor_dfd0e917 | 4 | area.id | 191 | Using index |
+----+-------------+---------------+-------+------------------------+------------------------+---------+--------------+------+----------------------------------------------+
Why is the ORDER BY clause causing problems here even though I'm using the primary key?
Thank you in advance.
It seems that you only have one area per doctor. See how this query works:
SELECT d.id,
(SELECT a.id FROM area a ON a.id = d.area_id) as area_id
FROM doctor d
ORDER BY d.id DESC
LIMIT 100;
If you are using inner join to test for the presence of a doctor in the table, then add:
SELECT d.id,
(SELECT a.id FROM area a ON a.id = d.area_id) as area_id
FROM doctor d
WHERE EXISTS (SELECT 1 FROM area a ON a.id = d.area_id)
ORDER BY d.id DESC
LIMIT 100;
There is a good chance that both of these will scan the doctors table in order, picking up the information from area as needed.
Here is the first table 'tbl1':
+---------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+----------------+
| val | varchar(45) | YES | MUL | NULL | |
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
+---------+---------------------+------+-----+---------+----------------+
With its indexes:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tbl1 | 0 | PRIMARY | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl1 | 1 | val | 1 | val | A | 2147085 | NULL | NULL | YES | BTREE | |
| tbl1 | 1 | id_val | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl1 | 1 | id_val | 2 | val | A | 201826018 | NULL | NULL | YES | BTREE | |
| tbl1 | 1 | val_id | 1 | val | A | 2147085 | NULL | NULL | YES | BTREE | |
| tbl1 | 1 | val_id | 2 | id | A | 201826018 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
(The reason for some extra indexing is this: http://bit.ly/KWx1Xz.)
The second table is just about the same. Here are its index cardinalities though:
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tbl2 | 0 | PRIMARY | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl2 | 1 | val | 1 | val | A | 881336 | NULL | NULL | YES | BTREE | |
| tbl2 | 1 | id_val | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl2 | 1 | id_val | 2 | val | A | 201826018 | NULL | NULL | YES | BTREE | |
| tbl2 | 1 | val_id | 1 | val | A | 881336 | NULL | NULL | YES | BTREE | |
| tbl2 | 1 | val_id | 2 | id | A | 201826018 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
The task is to inner-join them on the val column and get the id's list (and to do it in 1 second).
Here is the 'join' approach:
SELECT tbl1.id FROM tbl1 JOIN tbl2 ON tbl1.val = 'iii' AND tbl2.val = 'iii' AND tbl1.id = tbl2.id;
Result: 10831 rows in set (55.15 sec)
Query explain:
+----+-------------+--------+--------+----------------------------------+---------+---------+---------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+--------+----------------------------------+---------+---------+---------------------------+------+--------------------------+
| 1 | SIMPLE | tbl1 | ref | PRIMARY,val,id_val,val_id | val_id | 138 | const | 5160 | Using where; Using index |
| 1 | SIMPLE | tbl2 | eq_ref | PRIMARY,val,id_val,val_id | PRIMARY | 8 | search_test.tbl1.id | 1 | Using where |
+----+-------------+--------+--------+----------------------------------+---------+---------+---------------------------+------+--------------------------+
And here is the 'in' approach:
SELECT id FROM tbl1 WHERE val = 'iii' and id IN (SELECT id FROM tbl2 WHERE val = 'iii');
Result: 10831 rows in set (1 min 10.15 sec)
Explain:
+----+--------------------+--------+-----------------+---------------------------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+-----------------+---------------------------------+---------+---------+-------+------+--------------------------+
| 1 | PRIMARY | tbl1 | ref | val,val_id | val_id | 138 | const | 8553 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | tbl2 | unique_subquery | PRIMARY,val,id_val,val_id | PRIMARY | 8 | func | 1 | Using where |
+----+--------------------+--------+-----------------+---------------------------------+---------+---------+-------+------+--------------------------+
So, here is the question: how to tweak this query to let MySQL accomplish it in a second?
OK I have tested this on 30,000+ records per table and it runs pretty darn quick.
As it currently stands you're performing a join on two massive tables presently but if you scan for matches on 'val' on each table first that will reduce the size of your join sets substantially.
I originally posted this answer as a set of subqueries but I didn't realize that MySQL is painfully slow at nested subqueries since it executes from the outside in. However if you define the subqueries as views it runs them from the inside out.
So, first create the views.
CREATE VIEW tbl1_iii AS (
SELECT * FROM tbl1 WHERE val='iii'
);
CREATE VIEW tbl2_iii AS (
SELECT * FROM tbl2 WHERE val='iii'
);
Then run the query.
SELECT tbl1_iii.id from tbl1_iii,tbl2_iii
WHERE tbl1_iii.id = tbl2_iii.id;
Lightning.
SELECT tbl1.id FROM tbl1 JOIN tbl2 ON tbl1.id = tbl2.id and tbl1.val = tbl2.val
where tbl1.val = 'iii';
I am going to join two tables by using a single position in one table to the range (represented by two columns) in another table.
However, the performance is too slow, which is about 20 mins.
I have tried adding the index on the table or changing the query.
But the performance is still poor.
So, I am asking for optimization of the joining speed.
The following is the query to MySQL.
mysql> SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
-> FROM `inVar`
-> LEFT JOIN `openChrom_K562`
-> ON (
-> `inVar`.chrom=`openChrom_K562`.chrom AND
-> `inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
-> );
inVar and openChrom_K562 are the tables I used.
inVar stores the single position in each row.
openChrom_K562 stores the range information indicated by chromStart and chromEnd.
inVar contains 57902 rows and openChrom_K562 has 137373 rows respectively.
Fields on the tables.
mysql> DESCRIBE inVar;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO | PRI | NULL | |
| pos | int(10) | NO | PRI | NULL | |
+-------+-------------+------+-----+---------+-------+
mysql> DESCRIBE openChrom_K562;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO | MUL | NULL | |
| chromStart | int(10) | NO | MUL | NULL | |
| chromEnd | int(10) | NO | | NULL | |
| score | int(10) | NO | | NULL | |
+------------+-------------+------+-----+---------+-------+
Index built in the tables
mysql> SHOW INDEX FROM inVar;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| inVar | 0 | PRIMARY | 1 | chrom | A | NULL | NULL | NULL | | BTREE | |
| inVar | 0 | PRIMARY | 2 | pos | A | 57902 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> SHOW INDEX FROM openChrom_K562;
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| openChrom_K562 | 1 | start_end | 1 | chromStart | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | start_end | 2 | chromEnd | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_only | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_start | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_start | 2 | chromStart | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_end | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_end | 2 | chromEnd | A | 137373 | NULL | NULL | | BTREE | |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Execution plan on MySQL
mysql> EXPLAIN SELECT `inVar`.chrom, `inVar`.pos, score FROM `inVar` LEFT JOIN `openChrom_K562` ON ( inVar.chrom=openChrom_K562.chrom AND `inVar`.pos BETWEEN chromStart AND chromEnd );
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| 1 | SIMPLE | inVar | index | NULL | PRIMARY | 37 | NULL | 57902 | Using index |
| 1 | SIMPLE | openChrom_K562 | ref | start_end,chrom_only,chrom_start,chrom_end | chrom_only | 33 | tmp.inVar.chrom | 5973 | |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
It seems it only optimizes by looking chrom in two tables. Then do the brute-force comparing in the tables.
Is there any ways to do the further optimization like indexing on the position?
(It is my first time posting the question, sorry for the poor posting quality.)
chrom_only is likely to be a bad index selection for your join as you only have chrom 22 values.
If I have interpreted this right the query should be faster if using start_end
SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
FROM `inVar`
LEFT JOIN `openChrom_K562`
USE INDEX (`start_end`)
ON (
`inVar`.chrom=`openChrom_K562`.chrom AND
`inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
)
I have this query:
SELECT
s.last_spread, s.sd, s.mean, s.id
,c.id_ticker, c.coef
,t.ticker
,p.last, p.price
FROM (SELECT * FROM spreads WHERE spreads.id_check=1 LIMIT 100,500 ) as s
INNER JOIN coef as c
ON c.id_spread = s.id
INNER JOIN tickers AS t
ON t.id = c.id_ticker
LEFT JOIN (SELECT prices.id_ticker, MAX(prices.date) as last, prices.price FROM prices GROUP BY prices.id_ticker) AS p
ON p.id_ticker = t.id
These are the schemas of the tables:
mysql> desc spreads;
+-------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| id_check | int(11) | YES | MUL | NULL | |
| sd | double | YES | | NULL | |
| mean | double | YES | | NULL | |
| last_spread | double | YES | | NULL | |
+-------------+---------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> desc coef;
+-----------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| id_spread | int(11) | YES | MUL | NULL | |
| id_ticker | int(11) | YES | | NULL | |
| coef | double | YES | | NULL | |
| side | double | YES | | NULL | |
+-----------+---------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> desc tickers;
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| ticker | varchar(45) | NO | | NULL | |
| name | varchar(150) | NO | | NULL | |
| category | varchar(150) | NO | | NULL | |
| issuer | varchar(150) | NO | | NULL | |
+----------+------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> desc prices;
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| id_ticker | int(10) unsigned | NO | MUL | NULL | |
| date | date | NO | | NULL | |
| price | double | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
These are the indexes of the above tables;
mysql> show indexes from spreads;
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| spreads | 0 | PRIMARY | 1 | id | A | 2299 | NULL | NULL | | BTREE | |
| spreads | 1 | check_idx | 1 | id_check | A | 1 | NULL | NULL | YES | BTREE | |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)
mysql> show indexes from coef;
+-------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| coef | 0 | PRIMARY | 1 | id | A | 9078 | NULL | NULL | | BTREE | |
| coef | 1 | spread_ticker_idx | 1 | id_spread | A | NULL | NULL | NULL | YES | BTREE | |
| coef | 1 | spread_ticker_idx | 2 | id_ticker | A | NULL | NULL | NULL | YES | BTREE | |
+-------+------------+-------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> show indexes from tickers;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tickers | 0 | PRIMARY | 1 | id | A | 100 | NULL | NULL | | BTREE | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)
mysql> show indexes from prices;
+--------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| prices | 0 | PRIMARY | 1 | id | A | 19962 | NULL | NULL | | BTREE | |
| prices | 1 | id_ticker | 1 | id_ticker | A | 19962 | NULL | NULL | | BTREE | |
+--------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.15 sec)
And this is the explain of the query:
+----+-------------+------------+--------+-------------------+-------------------+---------+---------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+-------------------+-------------------+---------+---------------------------+--------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 500 | |
| 1 | PRIMARY | c | ref | spread_ticker_idx | spread_ticker_idx | 5 | s.id | 90 | Using where |
| 1 | PRIMARY | t | eq_ref | PRIMARY | PRIMARY | 4 | spreadtrading.c.id_ticker | 1 | Using where |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 100 | |
| 3 | DERIVED | prices | index | NULL | id_ticker | 4 | NULL | 119774 | |
| 2 | DERIVED | spreads | ref | check_idx | check_idx | 5 | | 2298 | Using where |
+----+-------------+------------+--------+-------------------+-------------------+---------+---------------------------+--------+-------------+
6 rows in set (0.27 sec)
Could I optimize it?
Thank you!
EDIT:
I would like to know if the INDEXES and the table's structure are optimized for the query I posted above. The results that I get using this query are good, it works well, but maybe I can optimize it to increse the "speed" of the query.
I think you may gain something by dropping the spreads subquery and moving the WHERE clause to the end, as in the following code. This loses your LIMIT restriction - perhaps you could put a LIMIT clause at the end as well, depending on what you're trying to achieve in terms of limiting the size of the output.
SELECT
s.last_spread, s.sd, s.mean, s.id
,c.id_ticker, c.coef
,t.ticker
,p.last, p.price
FROM spreads as s
INNER JOIN coef as c
ON c.id_spread = s.id
INNER JOIN tickers AS t
ON t.id = c.id_ticker
LEFT JOIN (SELECT prices.id_ticker, MAX(prices.date) as last, prices.price FROM prices GROUP BY prices.id_ticker) AS p
ON p.id_ticker = t.id
WHERE s.id_check = 1
I have inherited a database schema which has some design issues
Note that there are another 9 keys on the table which I haven't listed below, the keys in question look like
+-------+------------+----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users | 0 | PRIMARY | 1 | userid | A | 604 | NULL | NULL | | BTREE | | |
| users | 1 | userid_2 | 1 | userid | A | 604 | NULL | NULL | | BTREE | | |
| users | 1 | userid_2 | 2 | age | A | 604 | NULL | NULL | YES | BTREE | | |
| users | 1 | userid_2 | 3 | image | A | 604 | 255 | NULL | YES | BTREE | | |
| users | 1 | userid_2 | 4 | gender | A | 604 | NULL | NULL | YES | BTREE | | |
| users | 1 | userid_2 | 5 | last_login | A | 604 | NULL | NULL | YES | BTREE | | |
| users | 1 | userid_2 | 6 | latitude | A | 604 | NULL | NULL | YES | BTREE | | |
| users | 1 | userid_2 | 7 | longitude | A | 604 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
In a table with the following fields.
+--------------------------------+---------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------+---------------------+------+-----+-------------------+----------------+
| userid | int(11) | NO | PRI | NULL | auto_increment |
| age | int(11) | YES | | NULL | |
| image | varchar(500) | YES | | | |
| gender | varchar(10) | YES | | NULL | |
| last_login | timestamp | YES | MUL | NULL | |
| latitude | varchar(20) | YES | MUL | NULL | |
| longitude | varchar(20) | YES | | NULL | |
+--------------------------------+---------------------+------+-----+-------------------+----------------+
Running an explain statement and forcing it to use userid_2, it uses 522 rows
describe SELECT userid, age FROM users USE INDEX(userid_2) WHERE `userid` >=100 and age >27 limit 10 ;
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
| 1 | SIMPLE | users | index | userid_2 | userid_2 | 941 | NULL | 522 | Using where; Using index |
+----+-------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.02 sec)
if I don't force it to use the index it is just using the primary key, which only consists of the userid and only uses 261 rows
mysql> describe SELECT userid, age FROM users WHERE userid >=100 and age >27 limit 10 ;
+----+-------------+-------+-------+--------------------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | users | range | PRIMARY,users_user_ids_key,userid,userid_2 | PRIMARY | 4 | NULL | 261 | Using where |
+----+-------------+-------+-------+--------------------------------------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
Questions
Why is it examining more rows when it uses the compound composite index?
Why isn't the query using the userid_2 index if its not specified in the query?
That row count is only an estimate based on indexed value distribution.
You have two options:
Execute ANALYZE TABLE mytable to recalculate distributions and then re-try the describe
Don't worry about stuff that doesn't matter... rows is just an estimate anyway