My MySQL is not strong, so please forgive any rookie mistakes. Short version:
SELECT locId,count,avg FROM destAgg_geo is significantly slower than SELECT * from destAgg_geo
prtt.destAgg is a table keyed on dst_ip (PRIMARY)
mysql> describe prtt.destAgg;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| dst_ip | int(10) unsigned | NO | PRI | 0 | |
| total | float unsigned | YES | | NULL | |
| avg | float unsigned | YES | | NULL | |
| sqtotal | float unsigned | YES | | NULL | |
| sqavg | float unsigned | YES | | NULL | |
| count | int(10) unsigned | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
geoip.blocks is a table keyed on both startIpNum and endIpNum (PRIMARY)
mysql> describe geoip.blocks;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| startIpNum | int(10) unsigned | NO | MUL | NULL | |
| endIpNum | int(10) unsigned | NO | | NULL | |
| locId | int(10) unsigned | NO | | NULL | |
+------------+------------------+------+-----+---------+-------+
destAgg_geo is a view:
CREATE VIEW destAgg_geo AS SELECT * FROM destAgg JOIN geoip.blocks
ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
Here's the optimization plan for select *:
mysql> explain select * from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for select with specific columns:
mysql> explain select locId,count,avg from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for every column from destAgg and just the locId column from geoip.blocks:
mysql> explain select dst_ip,total,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Remove any column except dst_ip and the range check flips to blocks:
mysql> explain select dst_ip,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
which is then much slower. What's going on here?
(Yes, I could just use the * query results and process from there, but I would like to know what's happening and why)
EDIT -- EXPLAIN on the VIEW query:
mysql> explain SELECT * FROM destAgg JOIN geoip.blocks ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
MySQL can tell you if you run EXPLAIN PLAN on both queries.
The first query with the columns doesn't include any key columns, so my guess is it has to do a TABLE SCAN.
The second query with the "SELECT *" includes the primary key, so it can use the index.
The range filter is applied last, so the problem is that the query optimizer is choosing to join the larger table first in one case, and the smaller table first in another. Perhaps someone with more knowledge of the optimizer can tell us why it's joining the tables in a different order for each.
I think the real goal here should be to try to get the JOIN to use an index, so the order of the join wouldn't matter so much.
I would try putting a compisite index on locId,count,avg and see if that doesn't improve speed.
Related
I have two tables with the following structure:
Table counts.
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| bin1 | varchar(20) | NO | MUL | NULL | |
| bin2 | varchar(20) | NO | MUL | NULL | |
| count | float(6,2) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
Table coordinates.
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| bin | varchar(20) | NO | PRI | NULL | |
| chr | varchar(20) | NO | MUL | NULL | |
| start | int(11) | NO | MUL | NULL | |
| end | int(11) | NO | MUL | NULL | |
+-------+-------------+------+-----+---------+-------+
First, I want to get all coordinates.bin that matches my conditions using:
describe select bin from coordinates where chr="chr1" AND (start between 1 AND 1000000000 OR end between 1 AND 1000000000);
+------+-------------+-------------+------+-----------------------------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------+------+-----------------------------------+-------+---------+-------+------+--------------------------+
| 1 | SIMPLE | coordinates | ref | chr,chr_2,chr_3,start,start_2,end | chr_2 | 22 | const | 4929 | Using where; Using index |
+------+-------------+-------------+------+-----------------------------------+-------+---------+-------+------+--------------------------+
and it works fine. Then I want, with the coordinates.bin, filter table counts by mapping counts.bin1 and counts.bin2. I tried different queries without success:
describe select * from counts inner join (select bin from coordinates where chr="chr1" AND (start between 1 AND 1000000000 OR end between 1 AND 1000000000)) as subquery on bin=bin1 or bin=bin2;
+------+-------------+-------------+------+-------------------------------------------+-------+---------+-------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------+------+-------------------------------------------+-------+---------+-------+----------+------------------------------------------------+
| 1 | SIMPLE | coordinates | ref | PRIMARY,chr,chr_2,chr_3,start,start_2,end | chr_2 | 22 | const | 4929 | Using where; Using index |
| 1 | SIMPLE | counts | ALL | bin1,bin2 | NULL | NULL | NULL | 30763816 | Range checked for each record (index map: 0x6) |
+------+-------------+-------------+------+-------------------------------------------+-------+---------+-------+----------+------------------------------------------------+
describe select * from coordinates inner join counts on (counts.bin1=coordinates.bin or counts.bin2=coordinates.bin) where coordinates.chr="chr1" AND (coordinates.start between 1 AND 1000000000 OR coordinates.end between 1 AND 1000000000);
+------+-------------+-------------+------+-------------------------------------------+-------+---------+-------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------+------+-------------------------------------------+-------+---------+-------+----------+------------------------------------------------+
| 1 | SIMPLE | coordinates | ref | PRIMARY,chr,chr_2,chr_3,start,start_2,end | chr_2 | 22 | const | 4929 | Using where; Using index |
| 1 | SIMPLE | counts | ALL | bin1,bin2 | NULL | NULL | NULL | 30763816 | Range checked for each record (index map: 0x6) |
+------+-------------+-------------+------+-------------------------------------------+-------+---------+-------+----------+------------------------------------------------+
but they are really slow. No key is used.
I have two tables: gpnxuser and key_value
mysql> describe gpnxuser;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| email | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | NO | MUL | NULL | |
| partner_id | bigint(20) | NO | MUL | NULL | |
| password | varchar(255) | YES | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
and
mysql> describe key_value;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| version | bigint(20) | NO | | NULL | |
| date_created | datetime | YES | | NULL | |
| last_updated | datetime | YES | | NULL | |
| upkey | varchar(255) | NO | MUL | NULL | |
| user_id | bigint(20) | YES | MUL | NULL | |
| security_level | int(11) | NO | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
key_value.user_id is FK that references gpnxuser.id. I also have an index in gpnxuser.partner_id which is a FK that references a table called "partner" (which, I think, does not matter much to this question).
For partner_id = 64, I have 500K rows in gpnxuser which have relationship with approximatelly 6M rows in key_value.
I wanted to have a query that returned all distinct 'key_value.upkey' for userĀ“s belonging to a given partner. I did something like this:
select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
which takes forever to run. The explain for the query looks like:
mysql> explain select upkey from gpnxuser join key_value on gpnxuser.id=key_value.user_id where partner_id=64 group by upkey;
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | gpnxuser | ref | PRIMARY,FKB2D9FEBE725C505E | FKB2D9FEBE725C505E | 8 | const | 259640 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | key_value | ref | FK9E0C0F912D11F5A9 | FK9E0C0F912D11F5A9 | 9 | gpnx_finance_db.gpnxuser.id | 14 | Using where |
+----+-------------+-----------+------+----------------------------+--------------------+---------+-----------------------------+--------+----------------------------------------------+
My question is: is there a query that can run fast and obtain the result that I want?
what you need to do is utilize EXISTS statement: This will cause only partial table scan until a match found and not more.
select upkey from (select distinct upkey from key_value) upk
where EXISTS
(select 1 from gpnxuser u, key_value kv
where u.id=kv.user_id and partner_id=1 and kv.upkey = upk.upkey)
NB. In the original query, group by is misused: distinct looks better there.
select DISTINCT upkey from gpnxuser join key_value on
gpnxuser.id=key_value.user_id where partner_id=1
I would look into partitioning your key_value table on user_id, if you typically run queries based on this column.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
my tables are simple:
mysql> desc muralentry ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_src_id | int(11) | NO | MUL | NULL | |
| content | longtext | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
mysql> desc muralentry_user ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| muralentry_id | int(11) | NO | PRI | NULL | auto_increment |
| userinfo_id | int(11) | NO | MUL | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
Im doing the following query:
SELECT DISTINCT *
FROM muralentry
LEFT OUTER JOIN muralentry_user ON (muralentry.id = muralentry_user.muralentry_id)
WHERE user_src_id = 1
The explain:
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+------+-----------------+
| 1 | SIMPLE | muralentry | ref | muralentry_99bd10ae | muralentry_99bd10ae | 4 | const | 686 | Using temporary |
| 1 | SIMPLE | muralentry_user | ref | muralentry_id,muralentry_user_bcd7114e | muralentry_user_bcd7114e | 4 | muralentry.id | 15 | |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+------+-----------------+
Good result (for me :D)
But, when i add another where clause:
SELECT DISTINCT *
FROM muralentry
LEFT OUTER JOIN muralentry_user ON (muralentry.id = muralentry_user.muralentry_id)
WHERE user_src_id = 1 OR userinfo_id = 1;
The explain:
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+---------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+---------+-----------------+
| 1 | SIMPLE | muralentry | ALL | muralentry_99bd10ae | NULL | NULL | NULL | 1140932 | Using temporary |
| 1 | SIMPLE | muralentry_user | ref | muralentry_id,muralentry_user_bcd7114e | muralentry_user_bcd7114e | 4 | muralentry.id | 15 | Using where |
+----+-------------+----------------------------+------+-------------------------------------------------------+-------------------------------------+---------+------------------------------------------+---------+-----------------+
Wow... the result if ALOT worst...
How can i "fix" this?
Should i create some index to do this job? Or recreate my query?
I'm expecting the following result: 'muralentry' rows where the user is 'user_src_id' AND the 'muralentry_user' rows where he is 'userinfo_id'.
-- edit --
I edited the question because when I wrote an AND actually wanted an OR... sorry for that!
I have a table 'posts' in database which has non-unique index on user_id (Key: MUL).
mysql> show columns from posts;
+---------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | YES | MUL | NULL | |
| post | varchar(140) | NO | | NULL | |
+---------+--------------+------+-----+-------------------+----------------+
For this table, explain gives expected explanation where type is 'REF'
mysql> explain select * from posts where posts.user_id=1;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | posts | ref | user_id | user_id | 5 | const | 74 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
I have a second table 'followers' where 'user_id' and 'follower' are part of non-unique index
mysql> show columns from followers;
+---------------+-----------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | YES | MUL | NULL | |
| follower | int(11) | YES | MUL | NULL | |
+---------------+-----------+------+-----+---------------------+----------------+
But in this table, type is 'ALL'. I expected it to be 'REF' as similar to 'user_id' in previous table, this 'user_id' also has non-unique index. Is there any explanation for this?
mysql> explain select * from followers where followers.user_id=1;
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | followers | ALL | user_id | NULL | NULL | NULL | 6 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
I'll post it as an answer, cause I'm pretty sure this is the case.
I think you get differences because in followers table you have a composite key from both user_id and follower fields, rather than just a key on user_id.
Therefore index will be used for queries that use both user_id AND follower in WHERE clause.
Add a separate index on user_id field and you will get the same explanation.
I have a table of over 9 million rows. I have a SELECT query that I'm using an index for. Here is the query:
SELECT `username`,`id`
FROM `04c1Tg0M`
WHERE `id` > 9259466
AND `tried` = 0
LIMIT 1;
That query executes very fast (0.00 sec). Here is the explain for that query:
+----+-------------+----------+-------+-----------------+---------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+-----------------+---------+---------+------+-------+-------------+
| 1 | SIMPLE | 04c1Tg0M | range | PRIMARY,triedex | PRIMARY | 4 | NULL | 10822 | Using where |
+----+-------------+----------+-------+-----------------+---------+---------+------+-------+-------------+
Now here is the same query except that I'm going to change the id to 6259466:
SELECT `username`,`id`
FROM `04c1Tg0M`
WHERE `id` > 5986551
AND `tried` = 0
LIMIT 1;
That query took 4.78 seconds to complete. This is the problem. Here is the explain for that query:
+----+-------------+----------+------+-----------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+-----------------+---------+---------+-------+---------+-------------+
| 1 | SIMPLE | 04c1Tg0M | ref | PRIMARY,triedex | triedex | 2 | const | 9275107 | Using where |
+----+-------------+----------+------+-----------------+---------+---------+-------+---------+-------------+
What is happening here and how can I fix it? Here are my indexes:
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| 04c1Tg0M | 0 | PRIMARY | 1 | id | A | 9275093 | NULL | NULL | | BTREE | |
| 04c1Tg0M | 1 | pdex | 1 | username | A | 9275093 | NULL | NULL | | BTREE | |
| 04c1Tg0M | 1 | pdex | 2 | id | A | 9275093 | NULL | NULL | | BTREE | |
| 04c1Tg0M | 1 | pdex | 3 | tried | A | 9275093 | NULL | NULL | YES | BTREE | |
| 04c1Tg0M | 1 | triedex | 1 | tried | A | 0 | NULL | NULL | YES | BTREE | |
| 04c1Tg0M | 1 | triedex | 2 | id | A | 9275093 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
And here is my table structure:
| 04c1Tg0M | CREATE TABLE `04c1Tg0M` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`tried` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `pdex` (`username`,`id`,`tried`),
KEY `triedex` (`tried`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9275108 DEFAULT CHARSET=utf8 |
The first SQL returns 10822 rows, while the second one returns 9275107 rows!
The use of primary key "id" index in the second query isn't so useful because you have to do a full table scan anyway.
MySQL's cost-based optimizer thinks, in the case of the 2nd query, it's better off to use the index on 'tried'.
If you have to do a full table-scan, you're better off not using an index, as index constitutes additional disk reads.
You can use "use index" or "force index" in your query to hint to the optimizer whether to use an index.
Also update the statistics by analyzing your table periodically so the cost-based optimizer is working correctly.