I have a MySQL query for which I believe all the necessary indexes already exist, so I would expect it to be processed instantly. However, the query takes several seconds to finish. EXPLAIN shows that the query uses "Using index condition":
+----+-------------+-------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+-------------------+------+----------------------------------------------+
| 1 | SIMPLE | il | fulltext | PRIMARY,FK32F0579FBAF38C6F,name_ftx | name_ftx | 0 | NULL | 1 | Using where; Using temporary |
| 1 | SIMPLE | i | eq_ref | PRIMARY,FK5C6729AF5D52323,... | PRIMARY | 8 | goout.il.thing_id | 1 | Using where |
| 1 | SIMPLE | s | ref | state,event_id,parent_id | parent_id | 9 | const | 5 | Using index condition; Using where; Distinct |
+----+-------------+-------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+-------------------+------+----------------------------------------------+
Note the third row, which shows "Using index condition". Is there a way to find out what causes it? The query is:
SELECT DISTINCT i.id
FROM event i
JOIN event_locale il ON i.id = il.thing_id
LEFT JOIN event_schedule s ON i.id = s.event_id
WHERE (i.parent_id IS NULL)
AND (s.state = 2 OR s.state = 3)
AND s.parent_id IS NULL
AND (i.state = 2 OR i.state = 3)
AND (MATCH(il.name) AGAINST ("jan"))
LIMIT 10;
I am positive I have indexes on all the columns used and a fulltext index on the MATCH ... AGAINST column.
Query
SELECT SQL_NO_CACHE contacts.id,
contacts.date_modified contacts__date_modified
FROM contacts
INNER JOIN
(SELECT tst.team_set_id
FROM team_sets_teams tst
INNER JOIN team_memberships team_membershipscontacts ON (team_membershipscontacts.team_id = tst.team_id)
AND (team_membershipscontacts.user_id = '5daa2e92-c347-11e9-afc5-525400a80916')
AND (team_membershipscontacts.deleted = 0)
GROUP BY tst.team_set_id) contacts_tf ON contacts_tf.team_set_id = contacts.team_set_id
LEFT JOIN contacts_cstm contacts_cstm ON contacts_cstm.id_c = contacts.id
WHERE contacts.deleted = 0
ORDER BY contacts.date_modified DESC,
contacts.id DESC
LIMIT 21;
It takes extremely long (2 minutes on 2M records). I can't change this query, since it is system generated.
This is its EXPLAIN output:
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_contacts_tmst_id,idx_del_date_modified,idx_contacts_del_last,idx_cont_del_reports,idx_del_id_user | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 3 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 3 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
But when I use FORCE INDEX(idx_del_date_modified) (which is the same index shown in the EXPLAIN output above), the query takes just 0.01s and I get a slightly different EXPLAIN output:
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_del_date_modified | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 2 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
The first query uses a temporary table and a filesort, but the query with FORCE INDEX uses just "Using where". Shouldn't the plans be the same? Why is the query with FORCE INDEX so much faster when the index used is still the same?
According to the MySQL manual:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if
the ORDER BY or GROUP BY contains columns from tables other than the
first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory
temporary table, unless the query also contains elements (described
later) that require on-disk storage.
The difference in performance most likely comes from the MySQL query optimizer.
Even when an index exists, the optimizer may decide not to use it the way you expect.
With FORCE INDEX(...) you force MySQL to use that index instead.
Please consider a detailed example here.
I have two tables called ny_clean (3454602 entries) and pickup_0_ids_temp_table (2739268 entries). Both have an id CHAR(11) column, which is the primary key and has a BTREE index on top of it (MySQL 5.7).
The id values in pickup_0_ids_temp_table are a subset of those in ny_clean, and I want a result that is ny_clean without the id values from pickup_0_ids_temp_table.
Option 1:
EXPLAIN
SELECT *
FROM pickup_0_ids_temp_table as t
JOIN ny_clean as n
ON n.id != t.id;
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| 1 | SIMPLE | t | NULL | index | NULL | PRIMARY | 11 | NULL | 2734512 | 100.00 | Using index |
| 1 | SIMPLE | ny_clean | NULL | index | NULL | btree_pk_ny_clean | 11 | NULL | 3445904 | 90.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
Option 2:
EXPLAIN
SELECT *
FROM ny_clean as n
WHERE n.id NOT IN (
SELECT id
FROM pickup_0_ids_temp_table);
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| 1 | PRIMARY | n | NULL | ALL | NULL | NULL | NULL | NULL | 3445904 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | pickup_0_ids_temp_table | NULL | unique_subquery | PRIMARY,btree_pickup_0 | PRIMARY | 11 | func | 1 | 100.00 | Using index |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
I then use one of the options inside this larger query:
EXPLAIN
INSERT INTO y
SELECT id, pickup_longitude, pickup_latitude
FROM x
JOIN
(OPTION 1 OR 2) as z
ON z.id = x.id;
When I used Option 1 inside the larger query, it ran for two days without finishing. Option 2, on the other hand, did the job in less than 30 minutes.
My question: why is that?
Following the MySQL documentation (https://dev.mysql.com/doc/refman/5.7/en/subquery-materialization.html), I suspect it is due to materialization of the subquery, but how would I check this?
Am I interpreting the EXPLAIN output wrong? Judging from it, I would expect Option 1 to be faster, since it uses an index on both tables.
Or does it have to do with the larger query?
Thanks in advance
Your Option 1 doesn't do what you think it does.
If you have two tables
n.id t.id
1 1
2 2
3 3
and join them ON n.id != t.id, you get:
1,2
1,3
2,1
2,3
3,1
3,2
That is almost a Cartesian product. With your tables that is roughly 3.4 million x 2.7 million ≈ 9.18 trillion (9.18 x 10^12) rows.
Then you try to JOIN that result, and because the materialized table has no index, it takes a very long time.
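The row-count argument above can be checked on a toy data set. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the tables n and t mirror the small example in this answer (with one extra id in n), and the LEFT JOIN ... IS NULL query is one common way to write the anti-join, not something taken from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; n and t are toy versions of
# ny_clean and pickup_0_ids_temp_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE n (id INTEGER PRIMARY KEY);
    CREATE TABLE t (id INTEGER PRIMARY KEY);
    INSERT INTO n VALUES (1), (2), (3), (4);
    INSERT INTO t VALUES (1), (2), (3);
""")

# Option 1 style: every pair whose ids differ (4 * 3 pairs minus 3 equal pairs).
not_equal_rows = conn.execute(
    "SELECT n.id, t.id FROM n JOIN t ON n.id != t.id"
).fetchall()

# Option 2 style: rows of n whose id does not appear in t.
not_in_ids = [row[0] for row in conn.execute(
    "SELECT id FROM n WHERE id NOT IN (SELECT id FROM t) ORDER BY id"
)]

# Anti-join form: keep the rows of n that found no partner in t.
anti_join_ids = [row[0] for row in conn.execute(
    "SELECT n.id FROM n LEFT JOIN t ON t.id = n.id "
    "WHERE t.id IS NULL ORDER BY n.id"
)]

print(len(not_equal_rows), not_in_ids, anti_join_ids)
```

The != join grows with the product of the two table sizes, while the NOT IN and anti-join forms return at most as many rows as n contains.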
MySQL version 5.7.17
When I JOIN against small tables, MySQL doesn't use an index on the main table.
Compare:
EXPLAIN SELECT `ko_base_client_orders`.*
FROM `ko_base_client_orders`
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
ON (`ko_of`.`id` = `ko_base_client_orders`.`office`)
ORDER BY `order_time` DESC
LIMIT 10 OFFSET 0;
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
| 1 | SIMPLE | ko_base_client_orders | NULL | ALL | NULL | NULL | NULL | NULL | 60737 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | ko_of | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 2 | 100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
and the same query with index hint:
EXPLAIN SELECT `ko_base_client_orders`.*
FROM `ko_base_client_orders`
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
FORCE INDEX FOR JOIN (`PRIMARY`)
ON (`ko_of`.`id` = `ko_base_client_orders`.`office`)
ORDER BY `order_time` DESC
LIMIT 10 OFFSET 0;
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
| 1 | SIMPLE | ko_base_client_orders | NULL | index | NULL | order_time | 6 | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | ko_of | NULL | eq_ref | PRIMARY | PRIMARY | 4 | e3.ko_base_client_orders.office | 1 | 100.00 | Using index |
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
In the second case MySQL uses the order_time key, but in the first example it ignores the index and scans the entire table.
I cannot predict which table will be small and which will not (it differs from project to project), so I don't want to use index hints all the time. Is there another solution to this problem?
I have a query that is very slow when written with an INNER JOIN, but faster when written with a WHERE ... IN clause:
Slower Inner join:
SELECT *
FROM cases
left join
(
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0 group by tst.team_set_id
) cases_tf
ON cases_tf.team_set_id = cases.team_set_id
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
ORDER BY cases.name LIMIT 0,20;
Faster where in:
SELECT *
FROM cases
LEFT JOIN contacts_cases
ON contacts_cases.case_id = cases.id
AND contacts_cases.deleted = 0
where cases.deleted=0
and cases.team_set_id in (
select tst.team_set_id
from team_sets_teams tst
INNER JOIN team_memberships team_memberships
ON tst.team_id = team_memberships.team_id
AND team_memberships.user_id = '1'
AND team_memberships.deleted=0
group by tst.team_set_id
)
ORDER BY cases.name LIMIT 0,20;
The EXPLAIN plans for the INNER JOIN and WHERE IN versions are below:
Inner Join:
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | cases | ref | idx_cases_tmst_id | idx_cases_tmst_id | 109 | cases_tf.team_set_id | 446976 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DERIVED | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | | 2 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+-------------+------------------+------+--------------------------------------------+---------------------+---------+-----------------------------------+--------+----------------------------------------------+
Where in condition:
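+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+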
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | cases | index | NULL | idx_case_name | 768 | NULL | 20 | Using where |
| 1 | PRIMARY | contacts_cases | ref | idx_con_case_case | idx_con_case_case | 111 | sugarcrm.cases.id | 1 | |
| 2 | DEPENDENT SUBQUERY | team_memberships | ref | idx_team_membership,idx_teammemb_team_user | idx_team_membership | 109 | const | 2 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | tst | ref | idx_ud_team_id | idx_ud_team_id | 109 | sugarcrm.team_memberships.team_id | 1 | Using where |
+----+--------------------+------------------+-------+--------------------------------------------+---------------------+---------+-----------------------------------+------+----------------------------------------------+
Though there are indexes, I couldn't figure out what the problem is. Please help me out. Thanks.
(This is a query in sugarcrm)
In general, an IN condition cannot be converted to a plain INNER JOIN.
A query with an IN (subquery) condition is called a semi-join.
A semi-join returns rows from one table that would join with another table, but without performing a complete join.
Here is a simple example of a semi-join query using IN operator:
SELECT *
FROM table1
WHERE some-column IN (
SELECT some-other-column
FROM table2
WHERE some-conditions
)
The above semi-join can be converted to a semantically equivalent query
(equivalent means giving exactly the same results)
using the EXISTS operator and a dependent subquery:
SELECT *
FROM table1
WHERE EXISTS(
SELECT 1
FROM table2
WHERE some-conditions
AND table1.some-column = table2.some-other-column
)
Most leading databases use the same plan for both of the above queries, and their speed is the same;
unfortunately, this is not always true for MySQL.
Joins and semi-joins are totally different queries, with completely different execution plans, so comparing their speed is like comparing apples to onions.
You can try to convert the first query with IN into a query with EXISTS, but not into a join.
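To make the difference concrete, here is a small sketch that runs the IN form, the EXISTS form, and a plain join on toy data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the names table1, table2, some_column and some_other_column are illustrative versions of the placeholders used above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, some_column INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, some_other_column INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    -- the value 10 appears twice in table2
    INSERT INTO table2 VALUES (1, 10), (2, 10), (3, 30);
""")

# Semi-join written with IN.
in_ids = [r[0] for r in conn.execute("""
    SELECT id FROM table1
    WHERE some_column IN (SELECT some_other_column FROM table2)
    ORDER BY id
""")]

# The same semi-join written with EXISTS and a dependent subquery.
exists_ids = [r[0] for r in conn.execute("""
    SELECT id FROM table1
    WHERE EXISTS (SELECT 1 FROM table2
                  WHERE table2.some_other_column = table1.some_column)
    ORDER BY id
""")]

# A plain inner join returns one row per matching pair instead.
join_ids = [r[0] for r in conn.execute("""
    SELECT table1.id FROM table1
    JOIN table2 ON table2.some_other_column = table1.some_column
    ORDER BY table1.id
""")]

print(in_ids, exists_ids, join_ids)
```

The IN and EXISTS forms return each table1 row at most once, while the join repeats row 1 because two table2 rows match it.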
I have a query:
SELECT listings.*, listingagents.agentid
FROM listings
LEFT JOIN listingagents ON (listingagents.id = listings.listingagentid)
LEFT JOIN ignore ON (ignore.system_key = listings.listingid)
WHERE ignore.id IS NULL
ORDER BY listings.id ASC
I am trying to improve the performance of this query, since it is very slow and puts a heavy load on the MySQL server.
When I run EXPLAIN, the output shows:
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
| 1 | SIMPLE | listings | ALL | NULL | NULL | NULL | NULL | 383360 | Using filesort |
| 1 | SIMPLE | listingagents | eq_ref | PRIMARY | PRIMARY | 4 | db.listings.listingagen... | 1 | |
| 1 | SIMPLE | ignore | ref | system_key | system_key | 1 | const | 404 | Using where; Not exists |
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
I tried to do a simple query:
SELECT listings.*
FROM listings
ORDER BY listings.id ASC
And that query also has "Using filesort".
The fields "listings.id", "listingagents.id" and "ignore.id" are Primary Keys
The fields "listingagents.id" and "ignore.system_key" have indexes.
What can I do to improve the 1st query?
Try to reduce the number of listings rows scanned (currently 383360) by adding a condition, e.g. id > x, or a LIMIT.
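One way to combine the id > x condition with a LIMIT is to read the table in batches, remembering the last id seen. This is only a sketch: Python's built-in sqlite3 module stands in for MySQL, the listings table is reduced to two columns, and fetch_batch and the batch size of 10 are hypothetical names and values chosen for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; listings is a simplified toy table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO listings (id, title) VALUES (?, ?)",
    [(i, f"listing {i}") for i in range(1, 26)],
)

def fetch_batch(last_seen_id, batch_size):
    """Return the next batch of rows whose id is greater than last_seen_id."""
    return conn.execute(
        "SELECT id, title FROM listings WHERE id > ? ORDER BY id ASC LIMIT ?",
        (last_seen_id, batch_size),
    ).fetchall()

batch_sizes = []
last_seen_id = 0
while True:
    batch = fetch_batch(last_seen_id, 10)
    if not batch:
        break
    batch_sizes.append(len(batch))
    last_seen_id = batch[-1][0]  # continue after the last id in this batch

print(batch_sizes, last_seen_id)
```

Each query then touches only one slice of the primary key instead of the whole table; whether this removes the filesort on a given MySQL server would still need to be checked with EXPLAIN.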