I am running the following query:
SELECT
MyField,
COUNT(*) AS MyCount
FROM
MyTable
NATURAL JOIN
AnotherTable
WHERE
Timestamp >= 1000 AND Timestamp <= 10000
GROUP BY
MyField
ORDER BY
MyCount DESC;
This runs fine and takes about 6 seconds to complete. If I want to limit the result to show only the 20 highest MyCounts, I add LIMIT 20 to the end of the query. Suddenly it takes 6 minutes to complete!
The EXPLAIN output for the original query:
+----+-------------+-------------+--------+---------------------------+---------+---------+---------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------------------+---------+---------+---------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | MyTable | ALL | mytable_fkey | NULL | NULL | NULL | 6858209 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | AnotherTable| eq_ref | PRIMARY | PRIMARY | 4 | test.MyTable.FKeyID | 1 | Using index |
+----+-------------+-------------+--------+---------------------------+---------+---------+---------------------------+---------+----------------------------------------------+
The EXPLAIN output for the query with LIMIT 20:
+----+-------------+-------------+--------+---------------------------+-------------------------+---------+---------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------------------+-------------------------+---------+---------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | MyTable | index | mytable_fkey | myfield_timestamp_index | 771 | NULL | 6858209 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | AnotherTable| eq_ref | PRIMARY | PRIMARY | 4 | test.MyTable.FKeyID | 1 | Using index |
+----+-------------+-------------+--------+---------------------------+-------------------------+---------+---------------------------+---------+----------------------------------------------+
What is the explanation for this? Is there a better way I can limit the number of rows?
If you see "Using temporary; Using filesort" in your EXPLAIN output, you are probably missing a suitable index, and that is most likely what is making the query so slow.
Make sure your JOIN and GROUP BY fields are both available in the same index.
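A minimal sketch of that advice, assuming (from the EXPLAIN output) that the join column is MyTable.FKeyID and that MyField and Timestamp are MyTable columns. The index name myfield_timestamp_fkey_index and the alias grouped_counts are hypothetical. The first statement puts the GROUP BY column, the range-filtered column and the join column into one index, so MySQL can answer the query from that index alone. The second statement is an alternative if you cannot add an index: it computes the counts in a derived table, which MySQL has to materialize because of the GROUP BY, so the LIMIT applies only to the small aggregated result and should not steer the inner plan. Check both with EXPLAIN before relying on them.

```sql
-- Hypothetical covering index: GROUP BY column, range column, join column.
ALTER TABLE MyTable
  ADD INDEX myfield_timestamp_fkey_index (MyField, Timestamp, FKeyID);

-- Alternative without a new index: aggregate first in a derived table,
-- then sort and limit the (much smaller) grouped result outside it.
SELECT MyField, MyCount
FROM (
  SELECT MyField, COUNT(*) AS MyCount
  FROM MyTable
  NATURAL JOIN AnotherTable
  WHERE Timestamp >= 1000 AND Timestamp <= 10000
  GROUP BY MyField
) AS grouped_counts
ORDER BY MyCount DESC
LIMIT 20;
```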
Related
I'm looking into an issue with slow queries, and I can't figure out why adding LIMIT 20 to this query makes it more than 100 times slower.
The query is as follows:
SELECT DISTINCT e.eventid,e.clock,e.ns,e.objectid,e.acknowledged,er1.r_eventid
FROM events e JOIN event_recovery er1 ON er1.eventid=e.eventid
WHERE e.source='0' AND e.object='0' AND e.objectid='115779'
AND e.eventid<='65535859' AND e.value='1'
ORDER BY e.eventid
LIMIT 20;
EXPLAIN:
Without LIMIT 20:
EXPLAIN SELECT DISTINCT e.eventid... ORDER BY e.eventid;
+------+-------------+-------+--------+---------------------------+----------+---------+-------------------------+-------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+---------------------------+----------+---------+-------------------------+-------+---------------------------------------------------------------------+
| 1 | SIMPLE | e | ref | PRIMARY,events_1,events_2 | events_1 | 16 | const,const,const | 15792 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | er1 | eq_ref | PRIMARY | PRIMARY | 8 | zabbix_server.e.eventid | 1 | |
+------+-------------+-------+--------+---------------------------+----------+---------+-------------------------+-------+---------------------------------------------------------------------+
(4256 rows in set (0.050 sec) when executed without EXPLAIN)
With LIMIT 20:
EXPLAIN SELECT DISTINCT e.eventid... ORDER BY e.eventid LIMIT 20;
+------+-------------+-------+--------+---------------------------+---------+---------+-------------------------+---------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+---------------------------+---------+---------+-------------------------+---------+------------------------------+
| 1 | SIMPLE | e | range | PRIMARY,events_1,events_2 | PRIMARY | 8 | NULL | 2969589 | Using where; Using temporary |
| 1 | SIMPLE | er1 | eq_ref | PRIMARY | PRIMARY | 8 | zabbix_server.e.eventid | 1 | |
+------+-------------+-------+--------+---------------------------+---------+---------+-------------------------+---------+------------------------------+
(20 rows in set (9.269 sec) when executed without EXPLAIN)
Why is it not using the keys for the LIMIT 20?
I'm using MariaDB 10.3.11 on Debian 9.
EDIT:
Forcing it to use the index (USE INDEX (events_1)) makes it fast, but why doesn't the optimizer realise this?
EXPLAIN SELECT DISTINCT e.eventid,e.clock,e.ns,e.objectid,e.acknowledged,er1.r_eventid
FROM events e USE INDEX (events_1) JOIN event_recovery er1 ON er1.eventid=e.eventid
WHERE e.source='0' AND e.object='0' AND e.objectid='115779'
AND e.eventid<='65535859' AND e.value='1'
ORDER BY e.eventid
LIMIT 20;
+------+-------------+-------+--------+---------------+----------+---------+-------------------------+-------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+---------------+----------+---------+-------------------------+-------+---------------------------------------------------------------------+
| 1 | SIMPLE | e | ref | events_1 | events_1 | 16 | const,const,const | 15820 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | er1 | eq_ref | PRIMARY | PRIMARY | 8 | zabbix_server.e.eventid | 1 | |
+------+-------------+-------+--------+---------------+----------+---------+-------------------------+-------+---------------------------------------------------------------------+
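A likely reason, offered as a hypothesis: with LIMIT 20 the optimizer estimates that walking PRIMARY in eventid order and stopping after 20 matches is cheaper than fetching ~15,800 rows through events_1 and sorting them. But only a few thousand of the roughly 3 million rows in that PRIMARY range match, so it ends up reading far more rows than it expected. The sketch below shows two options, neither tested against this data. The first adds a hypothetical index (the name events_src_obj_objid_val_eventid is made up) with the equality columns first and eventid last, so one index can both filter and return rows in ORDER BY order. The second leaves the schema alone and uses IGNORE INDEX FOR ORDER BY (PRIMARY), which should stop the optimizer from picking PRIMARY just to avoid the sort.

```sql
-- Option 1 (hypothetical index): equality columns first, then eventid,
-- so matching rows are stored in eventid order within the index.
ALTER TABLE events
  ADD INDEX events_src_obj_objid_val_eventid (source, object, objectid, value, eventid);

-- Option 2 (no schema change): do not use PRIMARY to satisfy ORDER BY.
SELECT DISTINCT e.eventid,e.clock,e.ns,e.objectid,e.acknowledged,er1.r_eventid
FROM events e IGNORE INDEX FOR ORDER BY (PRIMARY)
JOIN event_recovery er1 ON er1.eventid=e.eventid
WHERE e.source='0' AND e.object='0' AND e.objectid='115779'
AND e.eventid<='65535859' AND e.value='1'
ORDER BY e.eventid
LIMIT 20;
```

Adding an index to a large events table takes time and disk space, so it is worth trying on a copy first.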
Query
SELECT SQL_NO_CACHE contacts.id,
contacts.date_modified contacts__date_modified
FROM contacts
INNER JOIN
(SELECT tst.team_set_id
FROM team_sets_teams tst
INNER JOIN team_memberships team_membershipscontacts ON (team_membershipscontacts.team_id = tst.team_id)
AND (team_membershipscontacts.user_id = '5daa2e92-c347-11e9-afc5-525400a80916')
AND (team_membershipscontacts.deleted = 0)
GROUP BY tst.team_set_id) contacts_tf ON contacts_tf.team_set_id = contacts.team_set_id
LEFT JOIN contacts_cstm contacts_cstm ON contacts_cstm.id_c = contacts.id
WHERE contacts.deleted = 0
ORDER BY contacts.date_modified DESC,
contacts.id DESC
LIMIT 21;
It takes extremely long (2 minutes on 2M records). I can't change this query, since it is system-generated.
This is its EXPLAIN output:
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_contacts_tmst_id,idx_del_date_modified,idx_contacts_del_last,idx_cont_del_reports,idx_del_id_user | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 3 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 3 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+-------------------------------------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
But when I use force index(idx_del_date_modified) (the same index that appears in the EXPLAIN above), the query takes just 0.01s and I get a slightly different EXPLAIN:
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | contacts | NULL | ref | idx_del_date_modified | idx_del_date_modified | 2 | const | 1113718 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
| 1 | PRIMARY | contacts_cstm | NULL | eq_ref | PRIMARY | PRIMARY | 144 | sugarcrm.contacts.id | 1 | 100.00 | Using index |
| 2 | DERIVED | team_membershipscontacts | NULL | ref | idx_team_membership,idx_teammemb_team_user,idx_del_team_user | idx_team_membership | 145 | const | 2 | 99.36 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DERIVED | tst | NULL | ref | idx_ud_set_id,idx_ud_team_id,idx_ud_team_set_id,idx_ud_team_id_team_set_id | idx_ud_team_id_team_set_id | 144 | sugarcrm.team_membershipscontacts.team_id | 1 | 100.00 | Using index |
+----+-------------+--------------------------+------------+--------+----------------------------------------------------------------------------+----------------------------+---------+-------------------------------------------+---------+----------+---------------------------------------------------------------------+
The first query uses a temporary table and filesort, but the query with force index shows only "Using where". Shouldn't the queries be the same? Why is the query with force index so much faster when the index used is still the same?
According to the MySQL manual:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
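Likely, the performance differs because of the plan chosen by MySQL's query optimizer: an index existing does not mean the optimizer will use it for every purpose. In the slow plan, idx_del_date_modified is used only to look up rows with deleted = 0. Because the derived table is joined through a join buffer (Block Nested Loop), the rows lose their date_modified order, so MySQL has to join all of the roughly 1.1 million (estimated) contacts rows into a temporary table, sort them, and only then apply LIMIT 21.
With force index(..), the plan uses no join buffer, so MySQL can read contacts in index order and stop as soon as it has 21 matching rows.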
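Because the query is system-generated, one option that leaves the SQL untouched is to disable block nested-loop join buffering, since "Using join buffer (Block Nested Loop)" on the derived table is the visible difference between the slow and the fast plan. This is a sketch using MySQL 5.7's optimizer_switch, not a verified fix: confirm with EXPLAIN that the temporary table and filesort disappear. SESSION scope affects only the current connection (useful for testing); GLOBAL scope affects new connections server-wide and may slow down other queries that benefit from the join buffer.

```sql
-- Turn off only the block_nested_loop flag; other flags keep their values.
SET SESSION optimizer_switch = 'block_nested_loop=off';

-- Confirm the current flags.
SELECT @@SESSION.optimizer_switch;

-- Now re-run EXPLAIN on the generated query and check that the
-- <derived> row no longer shows "Using join buffer (Block Nested Loop)"
-- and the contacts row no longer shows "Using temporary; Using filesort".

-- Restore the default for this session when done.
SET SESSION optimizer_switch = 'block_nested_loop=on';
```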
I'm investigating some long-running queries in my PRODUCTION MySQL 5.7 database. One particular query is taking over 60 seconds.
My usual approach is to take a dump of the data from PROD, import it into a DEV database, reproduce the issue, then analyse and try out some tweaks to the query.
However, the exact same query in DEV is taking less than a second.
Obviously, the MySQL configuration, table structure, record numbers, etc., are all the same as in PROD.
The query itself is a select with joins across 3 tables with a where clause on each table; 2 of the tables have approx 15m records in them. My initial suspicion was the lack of indexes on the queried columns, but the fact that in DEV it runs very fast would appear to disprove that.
What can I do to shed some light on this?
EXPLAIN results of my query:
PROD
EXPLAIN select this_.id as y0_ from event this_ inner join member m1_ on this_.member_id=m1_.id inner join event_type et2_ on this_.type_id=et2_.id where m1_.submission_id=40646 and this_.status in ('SUPPRESSED') and et2_.name in ('Salary') order by m1_.ni_number asc, m1_.ident1 asc, m1_.ident2 asc, m1_.ident3 asc, m1_.id asc, et2_.name asc limit 15;
+----+-------------+-------+------------+--------+-------------------------------------+-------------------+---------+--------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------------------+-------------------+---------+--------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | et2_ | NULL | ALL | PRIMARY | NULL | NULL | NULL | 17 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | this_ | NULL | ref | FK5C6729A2434DA80,FK5C6729AE4E22C6E | FK5C6729AE4E22C6E | 8 | iconnect.et2_.id | 4166 | 10.00 | Using where |
| 1 | SIMPLE | m1_ | NULL | eq_ref | PRIMARY,IND_submission_id | PRIMARY | 8 | iconnect.this_.member_id | 1 | 5.00 | Using where |
+----+-------------+-------+------------+--------+-------------------------------------+-------------------+---------+--------------------------+------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
DEV
EXPLAIN select this_.id as y0_ from event this_ inner join member m1_ on this_.member_id=m1_.id inner join event_type et2_ on this_.type_id=et2_.id where m1_.submission_id=40646 and this_.status in ('SUPPRESSED') and et2_.name in ('Salary') order by m1_.ni_number asc, m1_.ident1 asc, m1_.ident2 asc, m1_.ident3 asc, m1_.id asc, et2_.name asc limit 15;
+----+-------------+-------+------------+------+-------------------------------------+-------------------+---------+-----------------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------------------------+-------------------+---------+-----------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | et2_ | NULL | ALL | PRIMARY | NULL | NULL | NULL | 17 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | m1_ | NULL | ref | PRIMARY,IND_submission_id | IND_submission_id | 8 | const | 26644 | 100.00 | NULL |
| 1 | SIMPLE | this_ | NULL | ref | FK5C6729A2434DA80,FK5C6729AE4E22C6E | FK5C6729A2434DA80 | 8 | iconnect.m1_.id | 2 | 1.86 | Using where |
+----+-------------+-------+------------+------+-------------------------------------+-------------------+---------+-----------------+-------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.03 sec)
I have also spotted that the cardinality of some of the indexes used by this query is massively different between DEV and PROD:
FK5C6729AE4E22C6E: DEV=9, PROD=3792
IND_submission_id: DEV=2490, PROD=74220
Could this be impacting performance in PROD?
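To compare the statistics side by side, a query like the following can be run on both servers. It reads the same CARDINALITY values that SHOW INDEX reports; the schema name iconnect is taken from the ref column of the EXPLAIN output and is an assumption.

```sql
-- Estimated cardinality per index column, as SHOW INDEX reports it.
SELECT TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'iconnect'
  AND INDEX_NAME IN ('FK5C6729AE4E22C6E', 'IND_submission_id')
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;
```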
The query inefficiency came down to index statistics: the tables contain far more data than the default number of sampled index pages can represent, so the cardinality estimates were inaccurate. Increasing innodb_stats_persistent_sample_pages from 20 to 100 and then running ANALYZE TABLE changed the execution plan to the expected one, after which the query took less than 1 second.
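That fix can be sketched as follows, assuming InnoDB persistent statistics (the 5.7 default) and the table names from the query; event and member are presumably the two ~15M-row tables. The server-wide setting applies to tables that have no STATS_SAMPLE_PAGES table option of their own, while the per-table option targets only the large tables. In MySQL 5.7, SET GLOBAL does not survive a restart, so the value would also need to go into the option file. The final query shows how many pages were sampled for each statistic.

```sql
-- Server-wide default (was 20).
SET GLOBAL innodb_stats_persistent_sample_pages = 100;

-- Or per table, for the large tables only.
ALTER TABLE event STATS_SAMPLE_PAGES = 100;
ALTER TABLE member STATS_SAMPLE_PAGES = 100;

-- Recalculate the persistent statistics.
ANALYZE TABLE event, member, event_type;

-- Inspect the new estimates and the number of pages sampled.
SELECT table_name, index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = 'iconnect'
  AND table_name IN ('event', 'member')
  AND stat_name LIKE 'n_diff_pfx%';
```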
MySQL version 5.7.17
When I join a small table, MySQL doesn't use an index on the main table.
Compare:
EXPLAIN SELECT `ko_base_client_orders`.*
FROM `ko_base_client_orders`
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
ON (`ko_of`.`id` = `ko_base_client_orders`.`office`)
ORDER BY `order_time` DESC
LIMIT 10 OFFSET 0;
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
| 1 | SIMPLE | ko_base_client_orders | NULL | ALL | NULL | NULL | NULL | NULL | 60737 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | ko_of | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 2 | 100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-----------------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
and the same query with index hint:
EXPLAIN SELECT `ko_base_client_orders`.*
FROM `ko_base_client_orders`
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
FORCE INDEX FOR JOIN (`PRIMARY`)
ON (`ko_of`.`id` = `ko_base_client_orders`.`office`)
ORDER BY `order_time` DESC
LIMIT 10 OFFSET 0;
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
| 1 | SIMPLE | ko_base_client_orders | NULL | index | NULL | order_time | 6 | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | ko_of | NULL | eq_ref | PRIMARY | PRIMARY | 4 | e3.ko_base_client_orders.office | 1 | 100.00 | Using index |
+----+-------------+-----------------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+-------------+
In the second case MySQL uses the order_time key, but in the first it ignores the index and scans the entire table.
I cannot predict which tables will be small and which will not (it differs from project to project), so I don't want to use index hints everywhere. Is there another solution to this problem?
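One approach that does not depend on table sizes or index hints is to apply ORDER BY and LIMIT to ko_base_client_orders alone in a derived table (the alias o is illustrative) and join the small table afterwards. A derived table containing LIMIT cannot be merged into the outer query, so the inner single-table query can use the order_time index and read only 10 rows. This rewrite returns the same rows only because ko_of.id is the primary key (each order matches at most one row) and a LEFT JOIN never drops orders; it is a sketch, not tested against this schema. The outer ORDER BY is repeated because rows coming out of a derived table have no guaranteed order.

```sql
SELECT o.*
FROM (
  -- Pick the 10 newest orders first; this part can walk the order_time index.
  SELECT *
  FROM `ko_base_client_orders`
  ORDER BY `order_time` DESC
  LIMIT 10 OFFSET 0
) AS o
-- Join the small table only for those 10 rows.
LEFT JOIN `ko_base_new_perspe` AS `ko_of`
  ON (`ko_of`.`id` = o.`office`)
ORDER BY o.`order_time` DESC;
```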
I have a query:
SELECT listings.*, listingagents.agentid
FROM listings
LEFT JOIN listingagents ON (listingagents.id = listings.listingagentid)
LEFT JOIN `ignore` ON (`ignore`.system_key = listings.listingid)
WHERE `ignore`.id IS NULL
ORDER BY listings.id ASC
I am trying to improve the performance of this query since it is very slow and it is putting a heavy load on the MySQL server.
When I run EXPLAIN, the output shows:
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
| 1 | SIMPLE | listings | ALL | NULL | NULL | NULL | NULL | 383360 | Using filesort |
| 1 | SIMPLE | listingagents | eq_ref | PRIMARY | PRIMARY | 4 | db.listings.listingagen... | 1 | |
| 1 | SIMPLE | ignore | ref | system_key | system_key | 1 | const | 404 | Using where; Not exists |
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
I tried to do a simple query:
SELECT listings.*
FROM listings
ORDER BY listings.id ASC
And that query also shows "Using filesort".
The fields "listings.id", "listingagents.id" and "ignore.id" are Primary Keys
The fields "listingagents.id" and "ignore.system_key" have indexes.
What can I do to improve the 1st query?
Try to reduce the range of listings rows scanned (currently 383,360) by adding a condition, e.g. id > x, or a LIMIT.
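A sketch of the "id > x or limit" suggestion as keyset pagination: process listings in batches ordered by the primary key, remember the largest id of each batch, and start the next batch after it. The batch size of 1000 and the starting value 0 (which assumes ids are positive integers) are illustrative choices. The table name ignore is backticked because IGNORE is a reserved word in MySQL.

```sql
-- Fetch one batch; the range condition on the primary key lets MySQL
-- start reading after the previous batch instead of scanning and
-- sorting every row.
SELECT listings.*, listingagents.agentid
FROM listings
LEFT JOIN listingagents ON (listingagents.id = listings.listingagentid)
LEFT JOIN `ignore` ON (`ignore`.system_key = listings.listingid)
WHERE `ignore`.id IS NULL
  AND listings.id > 0  -- replace 0 with the largest id from the previous batch
ORDER BY listings.id ASC
LIMIT 1000;

-- Repeat with the new starting id until the SELECT returns no rows.
```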