Here is the first table 'tbl1':
+---------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+----------------+
| val | varchar(45) | YES | MUL | NULL | |
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
+---------+---------------------+------+-----+---------+----------------+
With its indexes:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tbl1 | 0 | PRIMARY | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl1 | 1 | val | 1 | val | A | 2147085 | NULL | NULL | YES | BTREE | |
| tbl1 | 1 | id_val | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl1 | 1 | id_val | 2 | val | A | 201826018 | NULL | NULL | YES | BTREE | |
| tbl1 | 1 | val_id | 1 | val | A | 2147085 | NULL | NULL | YES | BTREE | |
| tbl1 | 1 | val_id | 2 | id | A | 201826018 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
(The reason for some extra indexing is this: http://bit.ly/KWx1Xz.)
The second table is just about the same. Here are its index cardinalities though:
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tbl2 | 0 | PRIMARY | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl2 | 1 | val | 1 | val | A | 881336 | NULL | NULL | YES | BTREE | |
| tbl2 | 1 | id_val | 1 | id | A | 201826018 | NULL | NULL | | BTREE | |
| tbl2 | 1 | id_val | 2 | val | A | 201826018 | NULL | NULL | YES | BTREE | |
| tbl2 | 1 | val_id | 1 | val | A | 881336 | NULL | NULL | YES | BTREE | |
| tbl2 | 1 | val_id | 2 | id | A | 201826018 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
The task is to inner-join them on the val column and get the id's list (and to do it in 1 second).
Here is the 'join' approach:
SELECT tbl1.id FROM tbl1 JOIN tbl2 ON tbl1.val = 'iii' AND tbl2.val = 'iii' AND tbl1.id = tbl2.id;
Result: 10831 rows in set (55.15 sec)
Query explain:
+----+-------------+--------+--------+----------------------------------+---------+---------+---------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+--------+----------------------------------+---------+---------+---------------------------+------+--------------------------+
| 1 | SIMPLE | tbl1 | ref | PRIMARY,val,id_val,val_id | val_id | 138 | const | 5160 | Using where; Using index |
| 1 | SIMPLE | tbl2 | eq_ref | PRIMARY,val,id_val,val_id | PRIMARY | 8 | search_test.tbl1.id | 1 | Using where |
+----+-------------+--------+--------+----------------------------------+---------+---------+---------------------------+------+--------------------------+
And here is the 'in' approach:
SELECT id FROM tbl1 WHERE val = 'iii' and id IN (SELECT id FROM tbl2 WHERE val = 'iii');
Result: 10831 rows in set (1 min 10.15 sec)
Explain:
+----+--------------------+--------+-----------------+---------------------------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+-----------------+---------------------------------+---------+---------+-------+------+--------------------------+
| 1 | PRIMARY | tbl1 | ref | val,val_id | val_id | 138 | const | 8553 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | tbl2 | unique_subquery | PRIMARY,val,id_val,val_id | PRIMARY | 8 | func | 1 | Using where |
+----+--------------------+--------+-----------------+---------------------------------+---------+---------+-------+------+--------------------------+
So, here is the question: how to tweak this query to let MySQL accomplish it in a second?
OK I have tested this on 30,000+ records per table and it runs pretty darn quick.
As it currently stands you're performing a join on two massive tables presently but if you scan for matches on 'val' on each table first that will reduce the size of your join sets substantially.
I originally posted this answer as a set of subqueries but I didn't realize that MySQL is painfully slow at nested subqueries since it executes from the outside in. However if you define the subqueries as views it runs them from the inside out.
So, first create the views.
CREATE VIEW tbl1_iii AS (
SELECT * FROM tbl1 WHERE val='iii'
);
CREATE VIEW tbl2_iii AS (
SELECT * FROM tbl2 WHERE val='iii'
);
Then run the query.
SELECT tbl1_iii.id from tbl1_iii,tbl2_iii
WHERE tbl1_iii.id = tbl2_iii.id;
Lightning.
SELECT tbl1.id FROM tbl1 JOIN tbl2 ON tbl1.id = tbl2.id and tbl1.val = tbl2.val
where tbl1.val = 'iii';
Related
I need to speed up this query. What can I do?
select i.resp_id as id from int_result i, response_set rs, cx_store_child cbu
where rs.survey_id IN(5550512,5550516,5550521,5550520,5590351,5590384,5679615,5679646,5691634,5699259,5699266,5699270)
and i.q_id IN(52603091,52251250,52250724,52251333,52919541,52920117,54409178,54409806,54625102,54738933,54739117,54739221)
and rs.t >= '2017-08-30 00:00:00' and rs.t <= '2017-09-30 00:00:00'
and i.response_set_id = rs.id and rs.cx_business_unit_id = cbu.child_bu_id
and cbu.business_unit_id = 30850
group by rs.cx_business_unit_id, i.a_id, extract(day from rs.t)
------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| int_result | 0 | PRIMARY | 1 | id | A | 240843099 | NULL | NULL | | BTREE | | |
| int_result | 1 | q_id | 1 | q_id | A | 1442174 | NULL | NULL | | BTREE | | |
| int_result | 1 | a_id | 1 | a_id | A | 20070258 | NULL | NULL | | BTREE | | |
| int_result | 1 | resp_id | 1 | resp_id | A | 120421549 | NULL | NULL | | BTREE | | |
| int_result | 1 | response_set_id | 1 | response_set_id | A | 26760344 | NULL | NULL | | BTREE | | |
| int_result | 1 | survey_id | 1 | survey_id | A | 503855 | NULL | NULL | YES | BTREE | | |
| int_result | 1 | survey_id_2 | 1 | survey_id | A | 1459655 | NULL | NULL | YES | BTREE | | |
| int_result | 1 | survey_id_2 | 2 | q_id | A | 2736853 | NULL | NULL | | BTREE | | |
+------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+------
--+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+----------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| response_set | 0 | PRIMARY | 1 | id | A | 14307454 | NULL | NULL | | BTREE | | |
| response_set | 1 | survey_id | 1 | survey_id | A | 223553 | NULL | NULL | | BTREE | | |
| response_set | 1 | id | 1 | id | A | 14307454 | NULL | NULL | | BTREE | | |
| response_set | 1 | external_id | 1 | external_id | A | 2921 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | panel_member_id | 1 | panel_member_id | A | 357686 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | email_group | 1 | email_group | A | 21259 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | survey_timestamp_idx | 1 | survey_id | A | 433559 | NULL | NULL | | BTREE | | |
| response_set | 1 | survey_timestamp_idx | 2 | t | A | 14307454 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | bu_id | 1 | cx_business_unit_id | A | 2246 | NULL | NULL | YES | BTREE | | |
----------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cx_store_child | 0 | PRIMARY | 1 | id | A | 13667 | NULL | NULL | | BTREE | | |
| cx_store_child | 0 | bu_child_ref | 1 | business_unit_id | A | 13667 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 0 | bu_child_ref | 2 | child_bu_id | A | 13667 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 1 | cx_feedback_id | 1 | cx_feedback_id | A | 506 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 1 | business_unit_id | 1 | business_unit_id | A | 13667 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 1 | child_bu_id | 1 | child_bu_id | A | 13667 | NULL | NULL | YES | BTREE | | |
+----------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+
You seem to have an index on response_set(survey_id, t).
Try creating a so-called compound covering index on
response_set(t, survey_id, cx_business_unit_id)
This may help optimize the part of your query using that table. Why? Your query calls for a range scan on t, and a column used in a range scan must be first in its compound index.
Similarly, an index on int_result (q_id, resp_id, response_set_id) may help extract the data you need from that table.
Some notes:
It's hard to tell what your query does. Maybe some explanation will help you get better results here?
and rs.t >= '2017-08-30 00:00:00' and rs.t <= '2017-09-30 00:00:00' is probably incorrect as to the end of the time range. It probably contains an off-by-one error. Do you want < in place of <= ? What you have given includes the records with timestamps precisely at midnight on 30-Sep-2017, but no records on that day after midnight.
You have one index on int_result(survey_id, q_id) and another on int_result(survey_id). The latter index is entirely redundant with the former and you can drop it.
You seem to have lots of single-column indexes. Pro tip: don't add such indexes unless you know you need them. They rarely help speed up arbitrary queries and always slow down insertions and updates. Why might you need them? If you have a query you know needs them, or you need to enforce uniqueness. Drop the indexes you don't need.
Use 21st-century JOIN syntax instead of the old-timey comma-join syntax as follows. It's easier to read.
from int_result i
join response_set rs on i.response_set_id = rs.id
join cx_store_child cbu on rs.cx_business_unit_id = cbu.child_bu_id
Read this. You're maintaining a large database and it's worth your time to learn a lot about indexing. http://use-the-index-luke.com/
There are many ways that the Optimizer may attempt to perform the query. The following indexes give it some flexibility to find the optimal order of hitting the tables:
cbu: INDEX(business_unit_id, child_bu_id)
rs: INDEX(t, cx_business_unit_id, survey_id)
rs: INDEX(survey_id, t, cx_business_unit_id)
rs: INDEX(cx_business_unit_id, survey_id, t)
i: INDEX(response_set_id, q_id)
i: INDEX(q_id, response_set_id)
I arranged for rs and cbu to have "covering" indexes in all cases; this helps some.
(Yes, you should change to JOIN...ON as O. Jones suggests. And the rest of his suggestions.)
Before further discussion, please provide SHOW CREATE TABLE; the could be datatype issues, too.
A PRIMARY KEY is a UNIQUE key is an INDEX -- so INDEX(id) in rs is redundant.
I am working on large tables of my database (mysql). Some of my queries are running more than 5 minutes. Here is an example of my query that is running slow:
select b.DESCRIPTION collateral_type, a.description brand, a.year,
a.model, a.plate_number, d.description fuel, a.chassis_number,
a.engine_number, c.description as color, insurance_name
from lms_loan_application_collateral a inner join
lms_collateral_type b
on a.COLLATERAL_TYPE_ID = b.id left join
lms_color c
on a.color_id = c.id left join
lms_fuel_type d
on a.fuel_type_id = d.id inner join
lms_loan_application e
on e.id = a.loan_application_id inner join
lms_dlr_dtl f
on e.code = f.lano inner join
lms_loanapp_dtl g
on f.lano = g.lano
where b.description in ('Motorcycle', 'Automotive', 'Heavy Equipment');
here is a mysql explain of the query
+----+-------------+-------+-------+--------------------------------------------------------+-------------------------+---------+------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------------------------------+-------------------------+---------+------------+-------+----------------------------------------------------+
| 1 | SIMPLE | g | index | lms_loanapp_dtl_lano | lms_loanapp_dtl_lano | 23 | NULL | 23432 | Using where; Using index |
| 1 | SIMPLE | f | ref | lano | lano | 23 | new.g.LANO | 1 | Using index |
| 1 | SIMPLE | b | ALL | lms_collateral_type_description,lms_collateral_type_id | NULL | NULL | NULL | 11 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | a | ref | collateral_type_id | collateral_type_id | 5 | new.b.ID | 4067 | Using index condition |
| 1 | SIMPLE | e | ref | lms_loan_application_id | lms_loan_application_id | 153 | func | 1 | Using index condition; Using where |
| 1 | SIMPLE | c | ALL | LMS_color_id | NULL | NULL | NULL | 20 | Range checked for each record (index map: 0x1) |
| 1 | SIMPLE | d | ALL | LMS_fuel_type_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+--------------------------------------------------------+-------------------------+---------+------------+-------+----------------------------------------------------+
under the key column, there are NULL values. I am not sure if mysql uses the index keys I have created.
Below is the list of indexes I created for the involved tables:
SHOW INDEX FROM lms_loan_application_collateral FROM new;
+---------------------------------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------------------------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_loan_application_collateral | 1 | color_id | 1 | color_id | A | 36 | NULL | NULL | YES | BTREE | | |
| lms_loan_application_collateral | 1 | fuel_type_id | 1 | fuel_type_id | A | 6 | NULL | NULL | YES | BTREE | | |
| lms_loan_application_collateral | 1 | loan_application_id | 1 | loan_application_id | A | 89493 | NULL | NULL | YES | BTREE | | |
| lms_loan_application_collateral | 1 | collateral_type_id | 1 | collateral_type_id | A | 22 | NULL | NULL | YES | BTREE | | |
+---------------------------------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM lms_collateral_type FROM new;
+---------------------+------------+---------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------------+------------+---------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_collateral_type | 1 | lms_collateral_type_description | 1 | DESCRIPTION | A | 11 | NULL | NULL | YES | BTREE | | |
| lms_collateral_type | 1 | lms_collateral_type_id | 1 | ID | A | 11 | NULL | NULL | YES | BTREE | | |
+---------------------+------------+---------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM lms_color FROM new;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_color | 1 | LMS_color_id | 1 | id | A | 20 | NULL | NULL | | BTREE | | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM lms_fuel_type FROM new;
+---------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_fuel_type | 1 | LMS_fuel_type_id | 1 | id | A | 4 | NULL | NULL | YES | BTREE | | |
+---------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM lms_loan_application FROM new;
+----------------------+------------+-------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+-------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_loan_application | 1 | lmspis_id | 1 | PIS_ID | A | 1878 | NULL | NULL | YES | BTREE | | |
| lms_loan_application | 1 | loan_type_id | 1 | LOAN_TYPE_ID | A | 6 | NULL | NULL | YES | BTREE | | |
| lms_loan_application | 1 | lms_loan_application_id | 1 | ID | A | 1878 | NULL | NULL | YES | BTREE | | |
+----------------------+------------+-------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM lms_dlr_dtl FROM new;
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_dlr_dtl | 0 | PRIMARY | 1 | LDDID | A | 90066 | NULL | NULL | | BTREE | | |
| lms_dlr_dtl | 1 | lano | 1 | LANO | A | 90066 | NULL | NULL | YES | BTREE | | |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SHOW INDEX FROM lms_loanapp_dtl FROM new;
+-----------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lms_loanapp_dtl | 0 | PRIMARY | 1 | LLADID | A | 23432 | NULL | NULL | | BTREE | | |
| lms_loanapp_dtl | 1 | lms_loanapp_dtl_lano | 1 | LANO | A | 23432 | NULL | NULL | YES | BTREE | | |
+-----------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The tables which indicates the type as ALL does not utilise the indexes. Create indexes on id column on tables represents in c and d and one index on both description and id columns on b.
create index ix_id on lms_color(id);
create index ix_id on lms_fuel_type(id);
create index ix_description_id on lms_collateral_type(description,id);
MySQL is using ALL for your WHERE because of low cardinality. So MySQL thinks it is cheaper to just doing table scan over 11 entries than using binary search with index.
The real problem is that you do not filter using column whose index has higher cardinality (lms_loanapp_dtl_lano and collateral_type_id).
I am not sure why this query isn't using an index on the table world_cities. The indexes are the same and in the same order. The data types are of the same type and same length. The only difference is the key_name.
So why does it not use the key?
if I run the query I get: 318824 rows in set (2 min 51.30 sec)
Here is what I have done:
explain select * from zip_codes zc
inner join world_cities wc on
zc.city = wc.city
and zc.country = wc.country
and zc.region = wc.region;
+----+-------------+-------+------+---------------------+---------------------+---------+---------------------------------------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------------+---------------------+---------+---------------------------------------------------------------+---------+-------------+
| 1 | SIMPLE | wc | ALL | city_country_region | NULL | NULL | NULL | 3173958 | |
| 1 | SIMPLE | zc | ref | country_region_city | country_region_city | 165 | largedbapi.wc.city,largedbapi.wc.country,largedbapi.wc.region | 1 | Using where |
+----+-------------+-------+------+---------------------+---------------------+---------+---------------------------------------------------------------+---------+-------------+
mysql> show indexes from world_cities where Key_name like '%city%';
+--------------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| world_cities | 1 | city_country_region | 1 | city | A | NULL | NULL | NULL | YES | BTREE | |
| world_cities | 1 | city_country_region | 2 | country | A | NULL | NULL | NULL | YES | BTREE | |
| world_cities | 1 | city_country_region | 3 | region | A | NULL | NULL | NULL | YES | BTREE | |
+--------------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> show indexes from zip_codes where Key_name like '%city%';
+-----------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| zip_codes | 1 | country_region_city | 1 | city | A | 218424 | NULL | NULL | YES | BTREE | |
| zip_codes | 1 | country_region_city | 2 | country | A | 218424 | NULL | NULL | YES | BTREE | |
| zip_codes | 1 | country_region_city | 3 | region | A | 436849 | NULL | NULL | YES | BTREE | |
+-----------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Your city_country_region key on world_cities has a NULL cardinality, while a non-NULL one is required for it to be use. Running analyze on it should fix it.
ANALYZE TABLE world_cities
Also have a look at this post for more info on NULL cardinality.
I am going to join two tables by using a single position in one table to the range (represented by two columns) in another table.
However, the performance is too slow, which is about 20 mins.
I have tried adding the index on the table or changing the query.
But the performance is still poor.
So, I am asking for optimization of the joining speed.
The following is the query to MySQL.
mysql> SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
-> FROM `inVar`
-> LEFT JOIN `openChrom_K562`
-> ON (
-> `inVar`.chrom=`openChrom_K562`.chrom AND
-> `inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
-> );
inVar and openChrom_K562 are the tables I used.
inVar stores the single position in each row.
openChrom_K562 stores the range information indicated by chromStart and chromEnd.
inVar contains 57902 rows and openChrom_K562 has 137373 rows respectively.
Fields on the tables.
mysql> DESCRIBE inVar;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO | PRI | NULL | |
| pos | int(10) | NO | PRI | NULL | |
+-------+-------------+------+-----+---------+-------+
mysql> DESCRIBE openChrom_K562;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO | MUL | NULL | |
| chromStart | int(10) | NO | MUL | NULL | |
| chromEnd | int(10) | NO | | NULL | |
| score | int(10) | NO | | NULL | |
+------------+-------------+------+-----+---------+-------+
Index built in the tables
mysql> SHOW INDEX FROM inVar;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| inVar | 0 | PRIMARY | 1 | chrom | A | NULL | NULL | NULL | | BTREE | |
| inVar | 0 | PRIMARY | 2 | pos | A | 57902 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> SHOW INDEX FROM openChrom_K562;
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| openChrom_K562 | 1 | start_end | 1 | chromStart | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | start_end | 2 | chromEnd | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_only | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_start | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_start | 2 | chromStart | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_end | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_end | 2 | chromEnd | A | 137373 | NULL | NULL | | BTREE | |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Execution plan on MySQL
mysql> EXPLAIN SELECT `inVar`.chrom, `inVar`.pos, score FROM `inVar` LEFT JOIN `openChrom_K562` ON ( inVar.chrom=openChrom_K562.chrom AND `inVar`.pos BETWEEN chromStart AND chromEnd );
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| 1 | SIMPLE | inVar | index | NULL | PRIMARY | 37 | NULL | 57902 | Using index |
| 1 | SIMPLE | openChrom_K562 | ref | start_end,chrom_only,chrom_start,chrom_end | chrom_only | 33 | tmp.inVar.chrom | 5973 | |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
It seems it only optimizes by looking chrom in two tables. Then do the brute-force comparing in the tables.
Is there any ways to do the further optimization like indexing on the position?
(It is my first time posting the question, sorry for the poor posting quality.)
chrom_only is likely to be a bad index selection for your join as you only have chrom 22 values.
If I have interpreted this right the query should be faster if using start_end
SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
FROM `inVar`
LEFT JOIN `openChrom_K562`
USE INDEX (`start_end`)
ON (
`inVar`.chrom=`openChrom_K562`.chrom AND
`inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
)
I need to find the best index for this query:
SELECT c.id id, type
FROM Content c USE INDEX (type_proc_last_cat)
LEFT JOIN Battles b ON c.id = b.id
WHERE type = 1
AND processing_status = 1
AND category IN (13, 19)
AND status = 4
ORDER BY last_change DESC
LIMIT 100";
The tables look like this:
mysql> describe Content;
+-------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| type | tinyint(3) unsigned | NO | MUL | NULL | |
| category | bigint(20) unsigned | NO | | NULL | |
| processing_status | tinyint(3) unsigned | NO | | NULL | |
| last_change | int(10) unsigned | NO | | NULL | |
+-------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 1 | type | A | 4 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 2 | processing_status | A | 20 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 3 | last_change | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 4 | category | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> describe Battles;
+---------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| status | tinyint(4) unsigned | NO | | NULL | |
| status_last_changed | int(11) unsigned | NO | | NULL | |
+---------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Battles;
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Battles | 0 | PRIMARY | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 2 | status | A | 1215 | NULL | NULL | | BTREE | |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
And I get output like this:
mysql> explain
-> SELECT c.id id, type
-> FROM Content c USE INDEX (type_proc_last_cat)
-> LEFT JOIN Battles b USE INDEX (id_status) ON c.id = b.id
-> WHERE type = 1
-> AND processing_status = 1
-> AND category IN (13, 19)
-> AND status = 4
-> ORDER BY last_change DESC
-> LIMIT 100;
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| 1 | SIMPLE | c | ref | type_proc_last_cat | type_proc_last_cat | 2 | const,const | 1352 | Using where; Using index |
| 1 | SIMPLE | b | eq_ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
The trouble is the rows count for the Content table. It appears MySQL is unable to effectively use both last_change and category in the type_proc_last_cat index. If I switch the order of last_change and category, fewer rows are selected but it results in a filesort for the ORDER BY, meaning it pulls all the matching rows from the database. That is worse, since there are 100,000+ rows in both tables.
Tables are both InnoDB, so keep in mind that the PRIMARY key is appended to every other index. So the index the index type_proc_last_cat above behaves likes it's on (type, processing_status, last_change, category, id). I am aware I could change the PRIMARY key for Battles to (id, status) and drop the id_status index (and I may just do that).
Edit: Any value for type, category, processing_status, and status is less than 20% of the total values. last_change and status_last_change are unix timestamps.
Edit: If I use a different index with category and last_change in reverse order, I get this:
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 1 | type | A | 6 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 2 | processing_status | A | 26 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 3 | category | A | 228 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 4 | last_change | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain SELECT c.id id, type FROM Content c USE INDEX (type_proc_cat_last) LEFT JOIN Battles b
USE INDEX (id_status) ON c.id = b.id WHERE type = 1 AND processing_status = 1 AND category IN (13, 19) AND status = 4 ORDER BY last_change DESC LIMIT 100;
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| 1 | SIMPLE | c | range | type_proc_cat_last | type_proc_cat_last | 10 | NULL | 165 | Using where; Using index; Using filesort |
| 1 | SIMPLE | b | ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
The filesort worries me as it tells me MySQL pulls all the matching rows first, before sorting. This will be a big problem when there are 100,000+.
rows field in EXPLAIN doesn't reflect the number of rows that actually were read. It reflects the number of rows that possible will be affected. Also it doesn't rely on LIMIT, because LIMIT is applied after the plan has been calculated.
So you don't need to worry about it.
Also I would suggest you to swap last_change and category in type_proc_last_cat so mysql can try to use the last index part (last_change) for sorting purposes.