MySQL doesn't use index as expected

EXPLAIN on this query
select v.type,sum(c.rank)
from
(select distinct power,color,type from vehicle) v
join configuration c using (power,color)
group by v.type
gives
+----+-------------+---------------+------------+-------+---------------+-------------+---------+-----------------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+---------------+-------------+---------+-----------------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | configuration | NULL | ALL | veh | NULL | NULL | NULL | 76658 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 6 | configuration.power,configuration.color | 65 | 100.00 | NULL |
| 2 | DERIVED | vehicle | NULL | index | cov | cov | 20 | NULL | 5058658 | 100.00 | Using index |
+----+-------------+---------------+------------+-------+---------------+-------------+---------+-----------------------------------------+---------+----------+---------------------------------+
The index on configuration (power,color) is not used, even if I add FORCE INDEX.
If I use a table instead of a subquery
create table tmp select distinct power,color,type from vehicle
then Explain on the 'same' query
select v.type,sum(c.rank)
from
tmp v
join configuration c using (power,color)
group by type
becomes
+----+-------------+---------------+------------+------+---------------+------+---------+---------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------+---------+---------------------+---------+----------+---------------------------------+
| 1 | SIMPLE | tmp | NULL | ALL | NULL | NULL | NULL | NULL | 1016144 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | configuration | NULL | ref | veh | veh | 6 | tmp.power,tmp.color | 2 | 100.00 | NULL |
+----+-------------+---------------+------------+------+---------------+------+---------+---------------------+---------+----------+---------------------------------+
and this is 4 times faster.
How can I avoid creating a real table?

In the first case the optimizer decides it is cheaper to join the other way around: it scans configuration and looks up the derived table through its auto-generated key (<auto_key0>).
In the second case there is no key on tmp, so the best plan is to read tmp first and look up configuration through the veh index.
You should be able to force the table order by using STRAIGHT_JOIN instead of JOIN.
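A minimal sketch of that suggestion applied to the first query. It assumes the ON form of the join, since older MySQL manuals document STRAIGHT_JOIN only with ON, and it back-quotes `rank` because that word is reserved in newer MySQL 8.0 releases. Whether the forced order really beats the derived-table plan on your data is something to confirm with EXPLAIN.

```sql
-- STRAIGHT_JOIN makes MySQL read the left table (the derived table v)
-- before the right one (c), so configuration can be looked up
-- through its (power, color) index instead of being scanned.
EXPLAIN
SELECT v.type, SUM(c.`rank`)
FROM (SELECT DISTINCT power, color, type FROM vehicle) AS v
STRAIGHT_JOIN configuration AS c
    ON c.power = v.power AND c.color = v.color
GROUP BY v.type;
```

If the plan now lists the derived table first and configuration with type ref on veh, it matches the faster plan obtained with the tmp table.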

Related

Slow whereHas() after upgrade from Laravel 5.1 to Laravel 5.8

I switched an app from Laravel 5.1 to Laravel 5.8 by setting up a fresh 5.8 project and copying over the files, making some adjustments here and there.
The issue is that the queries with whereHas have become extremely slow.
Here is an example code:
Article::whereHas('categories', function ($category) {
    $category->where('link', 'foto');
})
    ->active()
    ->recent()
    ->take(3)
    ->get();
This code generates the following query on Laravel 5.1 and completes in 0.05-0.07 seconds.
SELECT *
FROM `articles`
WHERE `articles`.`deleted_at` IS NULL
AND
(SELECT count(*)
FROM `categories`
INNER JOIN `article_category`
ON `categories`.`id` = `article_category`.`category_id`
WHERE `article_category`.`article_id` = `articles`.`id`
AND `link` = 'foto'
AND `categories`.`deleted_at` IS NULL) >= 1
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3
and here's the explain:
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
| 1 | PRIMARY | articles | ALL | NULL | NULL | NULL | NULL | 4846 | 100.00 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | categories | ref | PRIMARY,categories_link_index | categories_link_index | 767 | const | 1 | 100.00 | Using index condition; Using where |
| 2 | DEPENDENT SUBQUERY | article_category | ref | article_category_category_id_foreign,article_category_article_id_foreign | article_category_article_id_foreign | 4 | lcf.articles.id | 1 | 100.00 | Using where |
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
While on Laravel 5.8 it generates the following query that runs 10-13 seconds.
SELECT *
FROM `articles`
WHERE EXISTS
(SELECT *
FROM `categories`
INNER JOIN `article_category`
ON `categories`.`id` = `article_category`.`category_id`
WHERE `articles`.`id` = `article_category`.`article_id`
AND `link` = 'foto'
AND `categories`.`deleted_at` IS NULL)
AND `articles`.`deleted_at` IS NULL
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3
and here's the explain
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 107 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | articles | ALL | PRIMARY | NULL | NULL | NULL | 4846 | 75.01 | Using where |
| 2 | MATERIALIZED | categories | ref | PRIMARY,categories_link_index | categories_link_index | 767 | const | 1 | 100.00 | Using index condition; Using where |
| 2 | MATERIALIZED | article_category | ref | article_category_category_id_foreign,article_category_article_id_foreign | article_category_category_id_foreign | 4 | lcf.categories.id | 107 | 100.00 | |
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
I ran both codebases on the same server, same MariaDB 10.2.24 database. The dataset size is approximately 6k articles, 80 categories and 10k records in the pivot.
What should I do here? So far I have discovered a bit more than 10 queries suffering from this problem in the codebase. Can I somehow flip a switch in config and make them all check the existence using the old way? Or should I somehow instruct every query to improve their plan?
UPDATE
I just noticed that if I use whereHas(..., '>', 0) I get almost the old query (actually WHERE (SELECT COUNT...) > 0) with the old performance. However, whereHas(..., '>=', 1) still reduces to the query with EXISTS. The question remains whether I could switch this behaviour for the whole app without editing each query.
ANSWERS TO COMMENTS
Indexes on articles
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| articles | 0 | PRIMARY | 1 | id | A | 4846 | NULL | NULL | | BTREE | | |
| articles | 1 | articles_author_id_foreign | 1 | author_id | A | 18 | NULL | NULL | YES | BTREE | | |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Indexes on article_category
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| article_category | 0 | PRIMARY | 1 | id | A | 9676 | NULL | NULL | | BTREE | | |
| article_category | 1 | article_category_category_id_foreign | 1 | category_id | A | 90 | NULL | NULL | | BTREE | | |
| article_category | 1 | article_category_article_id_foreign | 1 | article_id | A | 9676 | NULL | NULL | | BTREE | | |
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The data to run the examples can be found here: https://gist.github.com/tontonsb/b97bc33066a67e9d8bc3654f2c01c103
This runs faster, but it's still 2.8 vs 0.07 seconds so the problem can be clearly seen, at least on MariaDB 10.2.24. Probably the speed improved because I have removed other columns and their indices.
Try this:
mpyw/eloquent-has-by-non-dependent-subquery: Convert has() and whereHas() constraints to non-dependent subqueries.
$articles = Article::query()
    ->hasByNonDependentSubquery('categories', function ($category) {
        $category->where('link', 'foto');
    })
    ->active()
    ->recent()
    ->take(3)
    ->get();
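On the "flip a switch" part of the question: the slow plan shows MATERIALIZED and <subquery2>, which suggests MariaDB is turning the EXISTS into a materialized semi-join. A hedged sketch of switching those rewrites off for one session to see whether the old plan comes back. The flag names are MariaDB optimizer_switch flags; which of them (if any) helps here is an assumption to verify with EXPLAIN, not a confirmed fix.

```sql
-- Look at the current settings first.
SELECT @@optimizer_switch;

-- Session-only: disable the EXISTS-to-IN rewrite and subquery materialization.
SET SESSION optimizer_switch = 'exists_to_in=off,materialization=off';

-- Re-run EXPLAIN on the Laravel 5.8 query and compare with the plans above.
EXPLAIN
SELECT *
FROM `articles`
WHERE EXISTS
    (SELECT *
     FROM `categories`
     INNER JOIN `article_category`
         ON `categories`.`id` = `article_category`.`category_id`
     WHERE `articles`.`id` = `article_category`.`article_id`
       AND `link` = 'foto'
       AND `categories`.`deleted_at` IS NULL)
  AND `articles`.`deleted_at` IS NULL
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3;
```

If the dependent-subquery plan returns, the same value could be applied with SET GLOBAL or in the server configuration so that no query needs editing; that changes planning for every query on the server, so test it first.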

MySQL 2nd WHERE clause gives slow (Unending) result

I have a query which joins 3 tables (one inner join, one left join).
I need 2 WHERE conditions.
If I use either condition by itself, the search is quick. If I use both, the search will not complete.
I don't think I can add the search clause directly to the join as I don't want to exclude results where the search term is not present in that join.
SELECT job_entry.Job_Number,
job_entry.subject_ref,
contacts_library.company
FROM job_entry
LEFT JOIN multi_part_wind_instructions
ON job_entry.Job_Number = multi_part_wind_instructions.job_number
INNER JOIN contacts_library
ON job_entry.ContactID = contacts_library.ContactID
WHERE
company LIKE '%example%'
OR
multi_part_wind_instructions.address LIKE '%example%'
LIMIT 10
EXPLAIN results for the query with both WHERE conditions:
+----+-------------+------------------------------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------------+
| 1 | SIMPLE | job_entry | ALL | NULL | NULL | NULL | NULL | 16234 | Using where |
| 1 | SIMPLE | contacts_library | eq_ref | PRIMARY | PRIMARY | 4 | euroims.job_entry.ContactID | 1 | Using index condition; Using where |
| 1 | SIMPLE | multi_part_wind_instructions | ALL | NULL | NULL | NULL | NULL | 39447 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+------------------------------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------------+
Explain results from single WHERE condition (quick):
+----+-------------+------------------------------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------------+
| 1 | SIMPLE | job_entry | ALL | NULL | NULL | NULL | NULL | 16234 | Using where |
| 1 | SIMPLE | contacts_library | eq_ref | PRIMARY | PRIMARY | 4 | euroims.job_entry.ContactID | 1 | Using index condition; Using where |
| 1 | SIMPLE | multi_part_wind_instructions | ALL | NULL | NULL | NULL | NULL | 39447 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+------------------------------+--------+---------------+---------+---------+-----------------------------+-------+----------------------------------------------------+

How to index to avoid full table scan?

How can I index the following query to avoid the full table scan?
explain SELECT fld1, fld2 FROM tablename WHERE IdReceived > 0;
+----+-------------+------------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | tablename | ALL |IdReceived _idx| NULL | NULL | NULL | 99617 | Using where |
+----+-------------+------------------+------+---------------+------+---------+------+-------+-------------+
I have modified the query as below, but row id 2 (UNION) still does a full table scan.
explain SELECT fld1,fld2 FROM tablename WHERE IdReceived=1 UNION SELECT fld1,fld2 FROM tablename WHERE IdReceived>=1;
+----+--------------+------------------+------+---------------+--------------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------------+------+---------------+--------------+---------+-------+-------+-------------+
| 1 | PRIMARY | tablename | ref | IdReceived _idx | IdReceived _idx | 4 | const | 8865 | |
| 2 | UNION | tablename | ALL | IdReceived _idx | NULL | NULL | NULL | 99617 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------------+------+---------------+--------------+---------+-------+-------+-------------+
Since you are comparing the indexed column with a constant value, try to avoid that.
Refer here: http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
Also, I suggest a non-clustered index on fld1,fld2 to make this query perform faster.
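A sketch of the indexing suggestion, in a variant that puts IdReceived first and then fld1, fld2, so that both the range condition and the selected columns can be served from the index alone. The index name is hypothetical, and whether the optimizer picks the index still depends on how many rows satisfy IdReceived > 0.

```sql
-- Hypothetical index name; the filtered column comes first,
-- followed by the columns the query selects.
CREATE INDEX idx_idreceived_cover ON tablename (IdReceived, fld1, fld2);

-- If the index is chosen, the Extra column should include "Using index".
EXPLAIN SELECT fld1, fld2 FROM tablename WHERE IdReceived > 0;
```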

Strange mysql issue with query optimizer

I have 2 MySQL servers: one node is the master and the other acts as a slave, replicating from the master.
The 2 nodes have identical data and schema.
However, one particular query gets a different execution plan on each node.
Query:
EXPLAIN SELECT t.*, COUNT(h.id)
FROM tags t
INNER JOIN tags2articles s
ON t.id = s.tag_id
INNER JOIN tag_hits h
ON h.id = s.tag_id
INNER JOIN articles art
ON art.id = s.`article_id`
WHERE art.source_id IN (SELECT id FROM feeds WHERE source_id = 15074)
AND time_added > DATE_SUB(NOW(), INTERVAL 1 DAY)
AND t.type = '1'
GROUP BY t.id
HAVING COUNT(h.id) > 4
ORDER BY COUNT(h.id) DESC
LIMIT 15
Below is the output of the EXPLAIN query run on both nodes. Note that the master node produces the correct plan.
Output on master node:
+----+--------------------+-------+-----------------+-----------------------------+---------------------+---------+----------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-----------------+-----------------------------+---------------------+---------+----------------+--------+----------------------------------------------+
| 1 | PRIMARY | art | ALL | PRIMARY | NULL | NULL | NULL | 100270 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | s | ref | PRIMARY,FK_tags2articles | FK_tags2articles | 4 | art.id | 12 | Using index |
| 1 | PRIMARY | h | ref | tags_hits_idx | tags_hits_idx | 4 | s.tag_id | 1 | Using index |
| 1 | PRIMARY | t | eq_ref | PRIMARY,tags_type_idx | PRIMARY | 4 | s.tag_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | feeds | unique_subquery | PRIMARY,f_source_id_idx | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+-----------------+-----------------------------+---------------------+---------+----------------+--------+----------------------------------------------+
output on slave node
+----+--------------------+-------+-----------------+-----------------------------+------------------+---------+--------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-----------------+-----------------------------+------------------+---------+--------------------+--------+----------------------------------------------+
| 1 | PRIMARY | t | ref | PRIMARY,tags_type_idx | tags_type_idx | 2 | const | 206432 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | h | ref | tags_hits_idx | tags_hits_idx | 4 | t.id | 1 | Using index |
| 1 | PRIMARY | s | ref | PRIMARY,FK_tags2articles | PRIMARY | 4 | h.id | 2 | Using where; Using index |
| 1 | PRIMARY | art | eq_ref | PRIMARY | PRIMARY | 4 | s.article_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | feeds | unique_subquery | PRIMARY,f_source_id_idx | PRIMARY | 4 | func | 1 | Using where |
+----+--------------------+-------+-----------------+-----------------------------+------------------+---------+--------------------+--------+----------------------------------------------+
I cannot understand why this discrepancy exists. Any help?
Thanks
The two servers can have different index statistics, and that causes differences in index usage. If possible (it locks the table, so it is not always recommended), run ANALYZE TABLE on all participating tables; the query plans are then likely to be the same.
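A minimal sketch of that check, using the table names from the query above. NO_WRITE_TO_BINLOG is an optional keyword that keeps the statement out of the binary log; using it here is an assumption about how you want the two nodes handled, since by default ANALYZE TABLE run on the master is replicated to the slave.

```sql
-- Refresh index statistics for every table in the query.
-- Run on each node; drop NO_WRITE_TO_BINLOG to let the master's
-- statement replicate instead.
ANALYZE NO_WRITE_TO_BINLOG TABLE tags, tags2articles, tag_hits, articles, feeds;

-- Compare the Cardinality column between the two nodes.
SHOW INDEX FROM tags;
SHOW INDEX FROM tags2articles;
```

Re-running the EXPLAIN on both nodes afterwards shows whether the plans have converged.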

MySQL query not taking advantage of index

I was analyzing a query (working on a WordPress plugin named NextGEN Gallery), and this is what I got.
query:
EXPLAIN
SELECT title, filename
FROM wp_ngg_pictures wnp
LEFT JOIN wp_ngg_gallery wng
ON wng.gid = wnp.galleryid
GROUP BY wnp.galleryid
LIMIT 5
result:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+---------------------------------+
| 1 | SIMPLE | wnp | ALL | NULL | NULL | NULL | NULL | 439 | Using temporary; Using filesort |
| 1 | SIMPLE | wng | eq_ref | PRIMARY | PRIMARY | 8 | web1db1.wnp.galleryid | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+---------------------------------+
so I do:
ALTER TABLE wp_ngg_pictures ADD INDEX(galleryid);
and on my local test system I get:
+----+-------------+-------+--------+---------------+-----------+---------+--------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+-----------+---------+--------------------+------+-------+
| 1 | SIMPLE | wnp | index | galleryid | galleryid | 8 | NULL | 30 | |
| 1 | SIMPLE | wng | eq_ref | PRIMARY | PRIMARY | 8 | test.wnp.galleryid | 1 | |
+----+-------------+-------+--------+---------------+-----------+---------+--------------------+------+-------+
which seems fine, but on the final server I get
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------+
| 1 | SIMPLE | wnp | index | galleryid | galleryid | 8 | NULL | 439 | |
| 1 | SIMPLE | wng | eq_ref | PRIMARY | PRIMARY | 8 | web1db1.wnp.galleryid | 1 | |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------+
So the index is used, but all the rows are scanned anyway? Why is this happening?
The only difference I can see is the MySQL version, which is 5.1.47 (local) vs 5.0.45 (remote); the data is the same on both systems.
The rows column in the EXPLAIN SELECT output is an estimate of the number of rows that MySQL believes it must examine to execute the query, so I guess it is possible that your local version (5.1.47) is better at estimating than your remote version.
Without the EXPLAIN clause, do both queries produce the same output? What happens if you change the query to use a STRAIGHT_JOIN?