I am trying to improve performance for an application. I might need to create summary tables that run on cron so the app doesn't take as long to load (5-10 seconds). Is that the best idea?
Given the following table:
mysql> describe school_data_sets_numeric_data;
+--------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| data_set_nid | int(11) | NO | MUL | NULL | |
| school_nid | int(11) | NO | MUL | NULL | |
| year | int(11) | NO | MUL | NULL | |
| description | varchar(255) | NO | | NULL | |
| value | decimal(18,5) | NO | | NULL | |
+--------------+---------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
And the following queries (run once for each data_set_nid for a school)
This query runs fast (0 seconds):
SELECT year, description, CONCAT(FORMAT((value/(SELECT SUM(value)
FROM `school_data_sets_numeric_data` as numeric_data_inner
WHERE year = numeric_data_outer.year and data_set_nid = numeric_data_outer.data_set_nid and school_nid = numeric_data_outer.school_nid)) * 100, 2), '%') as value
FROM `school_data_sets_numeric_data` as numeric_data_outer
WHERE data_set_nid = 38251 and school_nid = 32805 ORDER BY id DESC;
Explain:
+----+--------------------+--------------------+------+---------------------------------------------+--------------+---------+-----------------------------------------------------------------------------------------------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+---------------------------------------------+--------------+---------+-----------------------------------------------------------------------------------------------------------+------+-----------------------------+
| 1 | PRIMARY | numeric_data_outer | ref | data_set_nid,data_set_nid_2,school_nid | data_set_nid | 8 | const,const | 17 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | numeric_data_inner | ref | year,data_set_nid,data_set_nid_2,school_nid | data_set_nid | 8 | rocdocs_main_drupal_7.numeric_data_outer.data_set_nid,rocdocs_main_drupal_7.numeric_data_outer.school_nid | 9 | Using where |
+----+--------------------+--------------------+------+---------------------------------------------+--------------+---------+-----------------------------------------------------------------------------------------------------------+------+-----------------------------+
This query runs slow (1.43 seconds):
SELECT year, description, CONCAT(FORMAT((SUM(value)/(SELECT SUM(value)
FROM `school_data_sets_numeric_data` as numeric_data_inner
WHERE year = numeric_data_outer.year and data_set_nid = numeric_data_outer.data_set_nid)) * 100, 2), '%') as value
FROM `school_data_sets_numeric_data` as numeric_data_outer
WHERE data_set_nid = 38251 GROUP BY year,description ORDER BY id DESC;
Explain:
+----+--------------------+--------------------+------+----------------------------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+----------------------------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | PRIMARY | numeric_data_outer | ref | data_set_nid,data_set_nid_2 | data_set_nid_2 | 4 | const | 90640 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | numeric_data_inner | ref | year,data_set_nid,data_set_nid_2 | year | 4 | func | 38871 | Using where |
+----+--------------------+--------------------+------+----------------------------------+----------------+---------+-------+-------+----------------------------------------------+
Correlated subqueries/subselects are often a bottleneck, partly because MySQL (before 8.0.18) executes joins only with nested-loop algorithms and has no hash joins or merge joins.
I would try joining your main select to a derived table holding all the SUM values you need.
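As a rough sketch of that idea, the slow query could be rewritten so that the per-year totals are computed once in a derived table instead of once per outer row. The aliases d, t and year_total are my own names, and ORDER BY MAX(d.id) is an assumption standing in for the original ORDER BY id, which is ambiguous once rows are grouped. I have not run this against your data, so compare both the results and the EXPLAIN output before relying on it.

```sql
SELECT d.year,
       d.description,
       CONCAT(FORMAT(SUM(d.value) / t.year_total * 100, 2), '%') AS value
FROM school_data_sets_numeric_data AS d
JOIN (
    -- One pass over the data set: total per year, computed once.
    SELECT year, SUM(value) AS year_total
    FROM school_data_sets_numeric_data
    WHERE data_set_nid = 38251
    GROUP BY year
) AS t ON t.year = d.year
WHERE d.data_set_nid = 38251
GROUP BY d.year, d.description, t.year_total
-- Assumption: order groups by their newest row, replacing ORDER BY id.
ORDER BY MAX(d.id) DESC;
```

The derived table is evaluated a single time, so the roughly 90k outer rows no longer each trigger their own SUM over tens of thousands of inner rows.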
I switched an app from Laravel 5.1 to Laravel 5.8 by setting up a fresh 5.8 project and copying over the files, making some adjustments here and there.
The issue is that the queries with whereHas have become extremely slow.
Here is an example code:
Article::whereHas('categories', function ($category) {
$category->where('link', 'foto');
})
->active()
->recent()
->take(3)
->get();
This code generates the following query on Laravel 5.1 and completes in 0.05-0.07 seconds.
SELECT *
FROM `articles`
WHERE `articles`.`deleted_at` IS NULL
AND
(SELECT count(*)
FROM `categories`
INNER JOIN `article_category`
ON `categories`.`id` = `article_category`.`category_id`
WHERE `article_category`.`article_id` = `articles`.`id`
AND `link` = 'foto'
AND `categories`.`deleted_at` IS NULL) >= 1
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3
and here's the explain:
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
| 1 | PRIMARY | articles | ALL | NULL | NULL | NULL | NULL | 4846 | 100.00 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | categories | ref | PRIMARY,categories_link_index | categories_link_index | 767 | const | 1 | 100.00 | Using index condition; Using where |
| 2 | DEPENDENT SUBQUERY | article_category | ref | article_category_category_id_foreign,article_category_article_id_foreign | article_category_article_id_foreign | 4 | lcf.articles.id | 1 | 100.00 | Using where |
+------+--------------------+------------------+------+--------------------------------------------------------------------------+-------------------------------------+---------+-----------------+------+----------+------------------------------------+
On Laravel 5.8, however, it generates the following query, which takes 10-13 seconds.
SELECT *
FROM `articles`
WHERE EXISTS
(SELECT *
FROM `categories`
INNER JOIN `article_category`
ON `categories`.`id` = `article_category`.`category_id`
WHERE `articles`.`id` = `article_category`.`article_id`
AND `link` = 'foto'
AND `categories`.`deleted_at` IS NULL)
AND `articles`.`deleted_at` IS NULL
ORDER BY IFNULL(published_at, created_at) DESC
LIMIT 3
and here's the explain
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 107 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | articles | ALL | PRIMARY | NULL | NULL | NULL | 4846 | 75.01 | Using where |
| 2 | MATERIALIZED | categories | ref | PRIMARY,categories_link_index | categories_link_index | 767 | const | 1 | 100.00 | Using index condition; Using where |
| 2 | MATERIALIZED | article_category | ref | article_category_category_id_foreign,article_category_article_id_foreign | article_category_category_id_foreign | 4 | lcf.categories.id | 107 | 100.00 | |
+------+--------------+------------------+------+--------------------------------------------------------------------------+--------------------------------------+---------+-------------------+------+----------+------------------------------------+
I ran both codebases on the same server, same MariaDB 10.2.24 database. The dataset size is approximately 6k articles, 80 categories and 10k records in the pivot.
What should I do here? So far I have discovered a bit more than 10 queries suffering from this problem in the codebase. Can I somehow flip a switch in config and make them all check the existence using the old way? Or should I somehow instruct every query to improve their plan?
UPDATE
I just noticed that if I use whereHas(..., '>', 0) I get almost the old query (actually WHERE (SELECT COUNT...) > 0) with the old performance. However, whereHas(..., '>=', 1) is still reduced to a query with EXISTS. The question remains whether I can switch this behaviour across the whole app without editing each query.
ANSWERS TO COMMENTS
Indexes on articles
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| articles | 0 | PRIMARY | 1 | id | A | 4846 | NULL | NULL | | BTREE | | |
| articles | 1 | articles_author_id_foreign | 1 | author_id | A | 18 | NULL | NULL | YES | BTREE | | |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Indexes on article_category
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| article_category | 0 | PRIMARY | 1 | id | A | 9676 | NULL | NULL | | BTREE | | |
| article_category | 1 | article_category_category_id_foreign | 1 | category_id | A | 90 | NULL | NULL | | BTREE | | |
| article_category | 1 | article_category_article_id_foreign | 1 | article_id | A | 9676 | NULL | NULL | | BTREE | | |
+------------------+------------+--------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The data to run the examples can be found here: https://gist.github.com/tontonsb/b97bc33066a67e9d8bc3654f2c01c103
This runs faster, but it's still 2.8 vs 0.07 seconds so the problem can be clearly seen, at least on MariaDB 10.2.24. Probably the speed improved because I have removed other columns and their indices.
Try this:
mpyw/eloquent-has-by-non-dependent-subquery: Convert has() and whereHas() constraints to non-dependent subqueries.
$articles = Article::query()
->hasByNonDependentSubquery('categories', function ($category) {
$category->where('link', 'foto');
})
->active()
->recent()
->take(3)
->get();
I'm running this query
mysql> explain SELECT
recipients.id
FROM
recipients
JOIN recipient_contact_details ON recipient_contact_details.recipient_id = recipients.id
JOIN recipient_contact_preferences ON recipient_contact_preferences.recipient_id = recipients.id
LEFT JOIN recipient_has_recipient_tags ON recipient_has_recipient_tags.recipient_id = recipients.id
LEFT JOIN recipient_tags ON recipient_tags.id = recipient_has_recipient_tags.recipient_tag_id
LEFT JOIN recipient_tag_groups ON recipient_tag_groups.id = recipient_tags.recipient_tag_group_id
INNER JOIN location ON location.id = recipients.location_id
WHERE
1 = 1
AND FLOOR(
DATEDIFF(NOW(), recipients.dob) / 365
) > 15
AND recipients.`join_date` < '2016-02-27 16:35:46'
AND recipients.`last_attendance` > '2016-02-18 16:35:46'
AND location.deleted_at IS NULL
AND recipient_contact_details.type = 1
AND recipient_contact_details.VALUE != '';
(I apologise for the length!) - It should return around 900+k rows, from a recipients table of 2.7+m records. Which, it does, but it takes around 25-30 seconds to run.
After running an explain I can see:
+----+-------------+-------------------------------+--------+------------------------------------------------------------------+------------------------------------------------------------------+---------+---------------------------------------------------------+-------+-----------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------------------+--------+------------------------------------------------------------------+------------------------------------------------------------------+---------+---------------------------------------------------------+-------+-----------------------------------------------------------------+
| 1 | SIMPLE | location | ALL | PRIMARY,location_id_index | NULL | NULL | NULL | 156 | Using where |
| 1 | SIMPLE | recipients | ref | PRIMARY,recipients_location_id_index | recipients_location_id_index | 5 | homestead.location.id | 17918 | Using index condition; Using where |
| 1 | SIMPLE | recipient_contact_preferences | ref | recipient_contact_preferences_recipient_id_index | recipient_contact_preferences_recipient_id_index | 4 | homestead.recipients.id | 1 | Using where; Using index |
| 1 | SIMPLE | recipient_has_recipient_tags | ref | recipient_has_recipient_tags_recipient_id_recipient_tag_id_index | recipient_has_recipient_tags_recipient_id_recipient_tag_id_index | 4 | homestead.recipients.id | 2 | Using where; Using index |
| 1 | SIMPLE | recipient_contact_details | ref | recipient_contact_details_recipient_id_index | recipient_contact_details_recipient_id_index | 4 | homestead.recipients.id | 2 | Using index condition; Using where |
| 1 | SIMPLE | recipient_tags | eq_ref | PRIMARY | PRIMARY | 4 | homestead.recipient_has_recipient_tags.recipient_tag_id | 1 | Using where |
| 1 | SIMPLE | recipient_tag_groups | index | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------------------------------+--------+------------------------------------------------------------------+------------------------------------------------------------------+---------+---------------------------------------------------------+-------+-----------------------------------------------------------------+
7 rows in set (0.00 sec)
As you can see, I've already added what I think are relevant indexes to the various tables. The location table is:
mysql> desc location;
+------------------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| created_at | timestamp | NO | | 0000-00-00 00:00:00 | |
| updated_at | timestamp | NO | | 0000-00-00 00:00:00 | |
| name | varchar(255) | NO | | NULL | |
| deleted_at | timestamp | YES | | NULL | |
| org_website | varchar(255) | NO | | NULL | |
| from_name | varchar(255) | NO | | NULL | |
| reply_to_address | varchar(255) | NO | | NULL | |
| logo_path | varchar(255) | NO | | NULL | |
| colour | varchar(255) | NO | | NULL | |
| street_address | varchar(255) | NO | | NULL | |
| city | varchar(255) | NO | | NULL | |
| region | varchar(255) | NO | | NULL | |
| postcode | varchar(255) | NO | | NULL | |
| country | varchar(255) | NO | | NULL | |
| privacy_url | varchar(255) | NO | | NULL | |
| remote_id | bigint(20) | NO | MUL | 0 | |
+------------------+------------------+------+-----+---------------------+----------------+
17 rows in set (0.00 sec)
I'm quite new to optimising queries for such a large result set. I can see that the location table is having issues, but I'm unsure as to what to change to make a difference. Any help is greatly appreciated.
Please create an index on recipients.dob
CREATE INDEX idx_recipients_dob ON recipients(dob);
and rewrite this:
AND FLOOR(
DATEDIFF(NOW(), recipients.dob) / 365
) > 15
to this:
AND recipients.dob < NOW() - INTERVAL 15 YEAR
I think, this might already solve all your problems.
The rewrite is necessary, because MySQL can't use an index if there's any calculation on the indexed column. Plus it's easier to read and more accurate (you're forgetting leap years).
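To check whether the rewrite actually changes the plan, the two forms can be compared side by side with EXPLAIN. This is only a sketch: it assumes an index on recipients.dob exists, and the optimizer may still prefer a scan if the predicate matches a large share of the table.

```sql
-- Column wrapped in functions: the dob index cannot be used for a range scan.
EXPLAIN
SELECT id
FROM recipients
WHERE FLOOR(DATEDIFF(NOW(), dob) / 365) > 15;

-- Bare column compared to a constant expression: eligible for a range scan.
EXPLAIN
SELECT id
FROM recipients
WHERE dob < NOW() - INTERVAL 15 YEAR;

-- Note on the cutoff: FLOOR(x) > 15 is only true once x reaches 16, so the
-- original keeps rows that are at least 16 * 365 = 5840 days old. If that
-- 16-year boundary is what is wanted, the closer equivalent would be:
--   WHERE dob <= NOW() - INTERVAL 16 YEAR
```

In the second plan, look for type = range and the dob index in the key column.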
And these joins
LEFT JOIN recipient_has_recipient_tags ON recipient_has_recipient_tags.recipient_id = recipients.id
LEFT JOIN recipient_tags ON recipient_tags.id = recipient_has_recipient_tags.recipient_tag_id
LEFT JOIN recipient_tag_groups ON recipient_tag_groups.id = recipient_tags.recipient_tag_group_id
are not necessary when you don't use these tables anyway.
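Putting both suggestions together, the trimmed query might look roughly like the sketch below. One assumption to verify: a LEFT JOIN to a one-to-many table can return the same recipient more than once, so dropping the tag joins may reduce the row count if the original output contained such duplicates.

```sql
SELECT
    recipients.id
FROM recipients
JOIN recipient_contact_details
    ON recipient_contact_details.recipient_id = recipients.id
JOIN recipient_contact_preferences
    ON recipient_contact_preferences.recipient_id = recipients.id
INNER JOIN location
    ON location.id = recipients.location_id
WHERE recipients.dob < NOW() - INTERVAL 15 YEAR   -- replaces FLOOR(DATEDIFF(...) / 365) > 15
  AND recipients.join_date < '2016-02-27 16:35:46'
  AND recipients.last_attendance > '2016-02-18 16:35:46'
  AND location.deleted_at IS NULL
  AND recipient_contact_details.type = 1
  AND recipient_contact_details.value != '';
```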
my tables are simple:
mysql> desc muralentry ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_src_id | int(11) | NO | MUL | NULL | |
| content | longtext | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
mysql> desc muralentry_user ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| muralentry_id | int(11) | NO | PRI | NULL | auto_increment |
| userinfo_id | int(11) | NO | MUL | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
I'm running the following query:
SELECT DISTINCT *
FROM muralentry
LEFT OUTER JOIN muralentry_user ON (muralentry.id = muralentry_user.muralentry_id)
WHERE user_src_id = 1
The explain:
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+------+-----------------+
| 1 | SIMPLE | muralentry | ref | muralentry_99bd10ae | muralentry_99bd10ae | 4 | const | 686 | Using temporary |
| 1 | SIMPLE | muralentry_user | ref | muralentry_id,muralentry_user_bcd7114e | muralentry_user_bcd7114e | 4 | muralentry.id | 15 | |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+------+-----------------+
Good result (for me :D)
But when I add another condition to the WHERE clause:
SELECT DISTINCT *
FROM muralentry
LEFT OUTER JOIN muralentry_user ON (muralentry.id = muralentry_user.muralentry_id)
WHERE user_src_id = 1 OR userinfo_id = 1;
The explain:
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+---------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+---------+-----------------+
| 1 | SIMPLE | muralentry | ALL | muralentry_99bd10ae | NULL | NULL | NULL | 1140932 | Using temporary |
| 1 | SIMPLE | muralentry_user | ref | muralentry_id,muralentry_user_bcd7114e | muralentry_user_bcd7114e | 4 | muralentry.id | 15 | Using where |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+---------+-----------------+
Wow... the result is a lot worse.
How can I "fix" this?
Should I create an index to handle this, or rewrite my query?
I'm expecting the following result: 'muralentry' rows where the user is 'user_src_id' AND the 'muralentry_user' rows where he is 'userinfo_id'.
-- edit --
I edited the question because where I wrote AND I actually wanted OR. Sorry for that!
I found a very weird MySQL behaviour: when I run a specific query twice, the EXPLAIN output for this query is different the second time:
query = SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` INNER JOIN `twstats_twwords` ON (`twstats_twwordstrend`.`word_id` = `twstats_twwords`.`id`) WHERE (`twstats_twwords`.`name` = '#ladygaga' AND `twstats_twwordstrend`.`created` > '2011-01-28 01:30:19' );
1st query execution and then run explain :
mysql> EXPLAIN SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` INNER JOIN `twstats_twwords` ON (`twstats_twwordstrend`.`word_id` = `twstats_twwords`.`id`) WHERE (`twstats_twwords`.`name` = '#ladygaga' AND `twstats_twwordstrend`.`created` > '2011-01-28 01:30:19' );
+----+-------------+----------------------+--------+-------------------------------+---------+---------+-------------------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+--------+-------------------------------+---------+---------+-------------------------------------------+---------+-------------+
| 1 | SIMPLE | twstats_twwordstrend | ALL | twstats_twwordstrend_4b95d890 | NULL | NULL | NULL | 4877401 | Using where |
| 1 | SIMPLE | twstats_twwords | eq_ref | PRIMARY | PRIMARY | 4 | statweestics.twstats_twwordstrend.word_id | 1 | Using where |
+----+-------------+----------------------+--------+-------------------------------+---------+---------+-------------------------------------------+---------+-------------+
2 rows in set (0.00 sec)
2nd query execution and then run explain :
mysql> EXPLAIN SELECT `twstats_twwordstrend`.`id`, `twstats_twwordstrend`.`created`, `twstats_twwordstrend`.`freq`, `twstats_twwordstrend`.`word_id` FROM `twstats_twwordstrend` INNER JOIN `twstats_twwords` ON (`twstats_twwordstrend`.`word_id` = `twstats_twwords`.`id`) WHERE (`twstats_twwords`.`name` = '#ladygaga' AND `twstats_twwordstrend`.`created` > '2011-01-28 01:30:19' );
+----+-------------+----------------------+------+-------------------------------+-------------------------------+---------+---------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+-------------------------------+-------------------------------+---------+---------------------------------+--------+-------------+
| 1 | SIMPLE | twstats_twwords | ALL | PRIMARY | NULL | NULL | NULL | 222994 | Using where |
| 1 | SIMPLE | twstats_twwordstrend | ref | twstats_twwordstrend_4b95d890 | twstats_twwordstrend_4b95d890 | 4 | statweestics.twstats_twwords.id | 15 | Using where |
+----+-------------+----------------------+------+-------------------------------+-------------------------------+---------+---------------------------------+--------+-------------+
2 rows in set (0.00 sec)
mysql> describe twstats_twwords;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created | datetime | NO | | NULL | |
| name | varchar(140) | NO | | NULL | |
+---------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> describe twstats_twwordstrend;
+---------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created | datetime | NO | | NULL | |
| freq | double | NO | | NULL | |
| word_id | int(11) | NO | MUL | NULL | |
+---------+----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
How is this possible?
Look at the rows column. The engine was able to gather more statistics -- so the next time it will try to use the better plan.
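If the plan keeps changing between runs, one thing that could be tried is refreshing the index statistics explicitly and then inspecting what the optimizer sees. This is a sketch, not a guaranteed fix; the Cardinality values are estimates and can still vary between runs.

```sql
-- Recompute key distribution statistics for both tables in the join.
ANALYZE TABLE twstats_twwordstrend, twstats_twwords;

-- Inspect the Cardinality estimates the optimizer works from.
SHOW INDEX FROM twstats_twwordstrend;
SHOW INDEX FROM twstats_twwords;
```

Running the EXPLAIN again afterwards shows whether the estimate in the rows column, and with it the join order, has settled.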
Happy coding.
My MySQL is not strong, so please forgive any rookie mistakes. Short version:
SELECT locId,count,avg FROM destAgg_geo is significantly slower than SELECT * from destAgg_geo
prtt.destAgg is a table keyed on dst_ip (PRIMARY)
mysql> describe prtt.destAgg;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| dst_ip | int(10) unsigned | NO | PRI | 0 | |
| total | float unsigned | YES | | NULL | |
| avg | float unsigned | YES | | NULL | |
| sqtotal | float unsigned | YES | | NULL | |
| sqavg | float unsigned | YES | | NULL | |
| count | int(10) unsigned | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
geoip.blocks is a table keyed on both startIpNum and endIpNum (PRIMARY)
mysql> describe geoip.blocks;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| startIpNum | int(10) unsigned | NO | MUL | NULL | |
| endIpNum | int(10) unsigned | NO | | NULL | |
| locId | int(10) unsigned | NO | | NULL | |
+------------+------------------+------+-----+---------+-------+
destAgg_geo is a view:
CREATE VIEW destAgg_geo AS SELECT * FROM destAgg JOIN geoip.blocks
ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
Here's the optimization plan for select *:
mysql> explain select * from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for select with specific columns:
mysql> explain select locId,count,avg from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for every column from destAgg and just the locId column from geoip.blocks:
mysql> explain select dst_ip,total,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Remove any column except dst_ip and the range check flips to blocks:
mysql> explain select dst_ip,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
which is then much slower. What's going on here?
(Yes, I could just use the * query results and process from there, but I would like to know what's happening and why)
EDIT -- EXPLAIN on the VIEW query:
mysql> explain SELECT * FROM destAgg JOIN geoip.blocks ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
MySQL can tell you if you run EXPLAIN on both queries.
The first query with the columns doesn't include any key columns, so my guess is it has to do a TABLE SCAN.
The second query with the "SELECT *" includes the primary key, so it can use the index.
The range filter is applied last, so the problem is that the query optimizer is choosing to join the larger table first in one case, and the smaller table first in another. Perhaps someone with more knowledge of the optimizer can tell us why it's joining the tables in a different order for each.
I think the real goal here should be to try to get the JOIN to use an index, so the order of the join wouldn't matter so much.
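Until the join can use an index, one way to stop the optimizer from flipping the order is to pin it with STRAIGHT_JOIN, which reads the left-hand table first. The sketch below forces the order that the faster SELECT * plan in the question used (blocks first, range check on destAgg). It queries the base tables directly rather than the view, which is my own choice, and whether it is really faster on your data needs checking with EXPLAIN and actual timings.

```sql
SELECT blocks.locId,
       destAgg.`count`,
       destAgg.`avg`
FROM geoip.blocks AS blocks
-- STRAIGHT_JOIN: the table on the left (blocks) is always read first.
STRAIGHT_JOIN destAgg
    ON destAgg.dst_ip BETWEEN blocks.startIpNum AND blocks.endIpNum;
```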
I would try putting a composite index on locId,count,avg and see if that doesn't improve speed.