How to index to avoid a full table scan? - MySQL

How can I index the following query to avoid the full table scan?
explain SELECT fld1, fld2 FROM tablename WHERE IdReceived > 0;
+----+-------------+------------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | tablename | ALL | IdReceived_idx | NULL | NULL | NULL | 99617 | Using where |
+----+-------------+------------------+------+---------------+------+---------+------+-------+-------------+
I modified the query as below, but the second row (id 2, the UNION part) still does a full table scan.
explain SELECT fld1,fld2 FROM tablename WHERE IdReceived=1 UNION SELECT fld1,fld2 FROM tablename WHERE IdReceived>=1;
+----+--------------+------------------+------+---------------+--------------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------------+------+---------------+--------------+---------+-------+-------+-------------+
| 1 | PRIMARY | tablename | ref | IdReceived_idx | IdReceived_idx | 4 | const | 8865 | |
| 2 | UNION | tablename | ALL | IdReceived_idx | NULL | NULL | NULL | 99617 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------------+------+---------------+--------------+---------+-------+-------+-------------+

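Because IdReceived > 0 is a range comparison against a constant that probably matches most of the table, the optimizer estimates that a full table scan is cheaper than using the index, so try to avoid such broad conditions.
See: http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
Also, I suggest a secondary (non-clustered) index on IdReceived, fld1, fld2, so the query can be answered from the index alone and run faster.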
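A minimal sketch of a covering index for this query, on a hypothetical scratch table (the real DDL is not shown, so the column types are assumptions). The filtered column IdReceived comes first, because an index on fld1,fld2 alone could not be used to evaluate WHERE IdReceived > 0; appending fld1 and fld2 lets MySQL read the selected values straight from the index.

```sql
-- Hypothetical scratch table: column types are assumptions,
-- only the names come from the question.
CREATE TABLE tablename (
  id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  IdReceived INT NOT NULL,
  fld1       VARCHAR(50),
  fld2       VARCHAR(50),
  KEY IdReceived_idx (IdReceived)
) ENGINE=InnoDB;

-- Covering index: the filtered column first, then the selected columns.
ALTER TABLE tablename
  ADD INDEX IdReceived_fld1_fld2_idx (IdReceived, fld1, fld2);

-- When the optimizer chooses this index, EXPLAIN lists it in the key column
-- and shows "Using index" in Extra, i.e. no table rows need to be read.
EXPLAIN SELECT fld1, fld2 FROM tablename WHERE IdReceived > 0;
```

Once the composite index exists, the single-column IdReceived_idx is a prefix of it and is likely redundant.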

Related

MySQL doesn't use index as expected

EXPLAIN on this query
select v.type,sum(c.rank)
from
(select distinct power,color,type from vehicle) v
join configuration c using (power,color)
group by v.type
gives
+----+-------------+---------------+------------+-------+---------------+-------------+---------+-----------------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+---------------+-------------+---------+-----------------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | configuration | NULL | ALL | veh | NULL | NULL | NULL | 76658 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 6 | configuration.power,configuration.color | 65 | 100.00 | NULL |
| 2 | DERIVED | vehicle | NULL | index | cov | cov | 20 | NULL | 5058658 | 100.00 | Using index |
+----+-------------+---------------+------------+-------+---------------+-------------+---------+-----------------------------------------+---------+----------+---------------------------------+
The index on configuration (power,color) is not used, even if I add FORCE INDEX.
If I use a real table instead of the subquery:
create table tmp select distinct power,color,type from vehicle
then EXPLAIN on the 'same' query
select v.type,sum(c.rank)
from
tmp v
join configuration c using (power,color)
group by type
becomes
+----+-------------+---------------+------------+------+---------------+------+---------+---------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------+---------+---------------------+---------+----------+---------------------------------+
| 1 | SIMPLE | tmp | NULL | ALL | NULL | NULL | NULL | NULL | 1016144 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | configuration | NULL | ref | veh | veh | 6 | tmp.power,tmp.color | 2 | 100.00 | NULL |
+----+-------------+---------------+------------+------+---------------+------+---------+---------------------+---------+----------+---------------------------------+
and this runs 4 times faster.
How can I get this plan without creating a physical table?
In the first case the optimizer thinks it is better to join the other way around: scan configuration first and look up rows in the derived table through its automatically generated key (<auto_key0>).
In the second case the tmp table has no index, so the best plan is to scan tmp first and look up configuration through its veh index.
You should be able to force the table order by using STRAIGHT_JOIN instead of JOIN.
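A sketch of the STRAIGHT_JOIN version, on hypothetical scratch tables reduced to the columns the query uses (the real DDL is not shown, so the types, the vehicle.id column and the sample rows are assumptions). STRAIGHT_JOIN makes MySQL read the left table first, so the materialized derived table v is scanned and configuration is looked up through its veh index on (power, color). STRAIGHT_JOIN accepts an ON clause but not USING, so the join condition is written out.

```sql
-- Hypothetical scratch tables; `rank` is quoted because it is a
-- reserved word in MySQL 8.0.
CREATE TABLE vehicle (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  power SMALLINT NOT NULL,
  color SMALLINT NOT NULL,
  type  VARCHAR(20) NOT NULL,
  KEY cov (power, color, type)
);
CREATE TABLE configuration (
  power  SMALLINT NOT NULL,
  color  SMALLINT NOT NULL,
  `rank` INT NOT NULL,
  KEY veh (power, color)
);
INSERT INTO vehicle (power, color, type) VALUES
  (1, 1, 'car'), (1, 1, 'car'), (2, 1, 'van'), (2, 2, 'car');
INSERT INTO configuration (power, color, `rank`) VALUES
  (1, 1, 10), (2, 1, 5), (2, 2, 7), (3, 3, 100);

-- The derived table (left side) is read first; configuration is then
-- probed through veh for each distinct (power, color, type) row.
SELECT v.type, SUM(c.rank) AS total_rank
FROM (SELECT DISTINCT power, color, type FROM vehicle) AS v
STRAIGHT_JOIN configuration AS c
  ON c.power = v.power AND c.color = v.color
GROUP BY v.type;
```

Comparing its EXPLAIN with the tmp-table plan should show whether configuration is now accessed with type ref on veh, as in the faster plan.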

Multi-Column IN and INDEX in MySQL

I have a table like this in MySQL (version 5.5):
+------------------+---------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------------+------+-----+---------------------+-----------------------------+
| A | varchar(50) | NO | PRI | NULL | |
| B | varchar(50) | NO | PRI | NULL | |
:
But the index is not used for a multi-column WHERE (A, B) IN condition.
If the IN list contains only one tuple, the index is used:
explain SELECT * FROM table WHERE (A, B) IN(('1', '2')) ;
+----+-------------+-------------------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+---------+---------+-------------+------+-------+
| 1 | SIMPLE | table | const | PRIMARY | PRIMARY | 304 | const,const | 1 | |
+----+-------------+-------------------+-------+---------------+---------+---------+-------------+------+-------+
But if the IN list contains two or more tuples, the index is not used:
explain SELECT * FROM table WHERE (A, B) IN(('1', '2'), ('3', '4'));
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 857897 | Using where |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
Why is the index not used?
[UPDATE]
If I change the query as below, the index is used:
explain SELECT * FROM table WHERE (A='1' AND B='2') OR (A='3' AND B='4');
+----+-------------+-------------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | table | range | PRIMARY | PRIMARY | 304 | NULL | 2 | Using where |
+----+-------------+-------------------+-------+---------------+---------+---------+------+------+-------------+
But I'd rather use the IN clause, because the AND/OR query gets very long.

Speed up slow mysql query

I am trying to improve performance for an application. I might need to create summary tables, refreshed by a cron job, so the app doesn't take as long to load (currently 5-10 seconds). Is that the best approach?
Given the following table:
mysql> describe school_data_sets_numeric_data;
+--------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| data_set_nid | int(11) | NO | MUL | NULL | |
| school_nid | int(11) | NO | MUL | NULL | |
| year | int(11) | NO | MUL | NULL | |
| description | varchar(255) | NO | | NULL | |
| value | decimal(18,5) | NO | | NULL | |
+--------------+---------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
And the following queries (each run once per data_set_nid for a school):
This query runs fast (0 seconds):
SELECT year, description, CONCAT(FORMAT((value/(SELECT SUM(value)
FROM `school_data_sets_numeric_data` as numeric_data_inner
WHERE year = numeric_data_outer.year and data_set_nid = numeric_data_outer.data_set_nid and school_nid = numeric_data_outer.school_nid)) * 100, 2), '%') as value
FROM `school_data_sets_numeric_data` as numeric_data_outer
WHERE data_set_nid = 38251 and school_nid = 32805 ORDER BY id DESC;
Explain:
+----+--------------------+--------------------+------+---------------------------------------------+--------------+---------+-----------------------------------------------------------------------------------------------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+---------------------------------------------+--------------+---------+-----------------------------------------------------------------------------------------------------------+------+-----------------------------+
| 1 | PRIMARY | numeric_data_outer | ref | data_set_nid,data_set_nid_2,school_nid | data_set_nid | 8 | const,const | 17 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | numeric_data_inner | ref | year,data_set_nid,data_set_nid_2,school_nid | data_set_nid | 8 | rocdocs_main_drupal_7.numeric_data_outer.data_set_nid,rocdocs_main_drupal_7.numeric_data_outer.school_nid | 9 | Using where |
+----+--------------------+--------------------+------+---------------------------------------------+--------------+---------+-----------------------------------------------------------------------------------------------------------+------+-----------------------------+
This query runs slow (1.43 seconds):
SELECT year, description, CONCAT(FORMAT((SUM(value)/(SELECT SUM(value)
FROM `school_data_sets_numeric_data` as numeric_data_inner
WHERE year = numeric_data_outer.year and data_set_nid = numeric_data_outer.data_set_nid)) * 100, 2), '%') as value
FROM `school_data_sets_numeric_data` as numeric_data_outer
WHERE data_set_nid = 38251 GROUP BY year,description ORDER BY id DESC;
Explain:
+----+--------------------+--------------------+------+----------------------------------+----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+----------------------------------+----------------+---------+-------+-------+----------------------------------------------+
| 1 | PRIMARY | numeric_data_outer | ref | data_set_nid,data_set_nid_2 | data_set_nid_2 | 4 | const | 90640 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | numeric_data_inner | ref | year,data_set_nid,data_set_nid_2 | year | 4 | func | 38871 | Using where |
+----+--------------------+--------------------+------+----------------------------------+----------------+---------+-------+-------+----------------------------------------------+
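Correlated subqueries/subselects are often a bottleneck, partly because MySQL only has a nested-loop join algorithm and no hash joins or merge joins (true of the versions discussed here; MySQL 8.0.18 and later do support hash joins). In the slow query's EXPLAIN, the outer table is estimated at 90,640 rows and each execution of the dependent subquery at 38,871 rows via the year index, so repeated executions add up quickly.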
I would try joining your main select to a derived table holding all the SUM values you need.
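A sketch of that derived-table rewrite for the slow query, on a hypothetical scratch copy of the table from the question with made-up sample rows. The per-year totals for the data set are computed once in the derived table t and joined on year, instead of evaluating a dependent subquery repeatedly. The original ORDER BY id DESC names a column that is not grouped (MySQL 5.7+ rejects that under ONLY_FULL_GROUP_BY), so ORDER BY MAX(o.id) DESC is an assumed stand-in.

```sql
-- Hypothetical scratch copy of the question's table (run in a test database).
CREATE TABLE school_data_sets_numeric_data (
  id           INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  data_set_nid INT NOT NULL,
  school_nid   INT NOT NULL,
  `year`       INT NOT NULL,
  description  VARCHAR(255) NOT NULL,
  `value`      DECIMAL(18,5) NOT NULL,
  KEY data_set_nid_2 (data_set_nid)
);
INSERT INTO school_data_sets_numeric_data
  (data_set_nid, school_nid, `year`, description, `value`) VALUES
  (38251, 32805, 2012, 'Male', 30), (38251, 32805, 2012, 'Female', 20),
  (38251, 40000, 2012, 'Male', 10), (38251, 40000, 2012, 'Female', 40),
  (38251, 32805, 2013, 'Male', 25), (38251, 32805, 2013, 'Female', 75),
  (99999, 32805, 2012, 'Male', 500);

-- Totals per year are computed once in t, then joined to the detail rows.
-- With the sample rows, 2012/Female comes out as 60.00%.
SELECT o.year, o.description,
       CONCAT(FORMAT(SUM(o.value) / t.year_total * 100, 2), '%') AS value
FROM school_data_sets_numeric_data AS o
JOIN (SELECT year, SUM(value) AS year_total
      FROM school_data_sets_numeric_data
      WHERE data_set_nid = 38251
      GROUP BY year) AS t
  ON t.year = o.year
WHERE o.data_set_nid = 38251
GROUP BY o.year, o.description, t.year_total
ORDER BY MAX(o.id) DESC;
```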

MySQL select specific cols slower than select *

My MySQL is not strong, so please forgive any rookie mistakes. Short version:
SELECT locId,count,avg FROM destAgg_geo is significantly slower than SELECT * from destAgg_geo
prtt.destAgg is a table keyed on dst_ip (PRIMARY)
mysql> describe prtt.destAgg;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| dst_ip | int(10) unsigned | NO | PRI | 0 | |
| total | float unsigned | YES | | NULL | |
| avg | float unsigned | YES | | NULL | |
| sqtotal | float unsigned | YES | | NULL | |
| sqavg | float unsigned | YES | | NULL | |
| count | int(10) unsigned | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
geoip.blocks is a table with a composite index (start_end) on startIpNum and endIpNum:
mysql> describe geoip.blocks;
+------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| startIpNum | int(10) unsigned | NO | MUL | NULL | |
| endIpNum | int(10) unsigned | NO | | NULL | |
| locId | int(10) unsigned | NO | | NULL | |
+------------+------------------+------+-----+---------+-------+
destAgg_geo is a view:
CREATE VIEW destAgg_geo AS SELECT * FROM destAgg JOIN geoip.blocks
ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
Here's the optimization plan for select *:
mysql> explain select * from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for select with specific columns:
mysql> explain select locId,count,avg from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Here's the optimization plan for every column from destAgg and just the locId column from geoip.blocks:
mysql> explain select dst_ip,total,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
Remove any column except dst_ip and the range check flips to blocks:
mysql> explain select dst_ip,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | |
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
which is then much slower. What's going on here?
(Yes, I could just use the * query results and process from there, but I would like to know what's happening and why)
EDIT -- EXPLAIN on the VIEW query:
mysql> explain SELECT * FROM destAgg JOIN geoip.blocks ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | blocks | ALL | start_end | NULL | NULL | NULL | 3486646 | |
| 1 | SIMPLE | destAgg | ALL | PRIMARY | NULL | NULL | NULL | 101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
MySQL can tell you if you run EXPLAIN on both queries.
The query that selects specific columns doesn't include any key columns, so my guess is it has to do a table scan.
The SELECT * query includes the primary key, so it can use the index.
The range filter is applied last, so the problem is that the query optimizer chooses to join the larger table first in one case and the smaller table first in the other. Perhaps someone with more knowledge of the optimizer can explain why it joins the tables in a different order for each.
I think the real goal here should be to try to get the JOIN to use an index, so the order of the join wouldn't matter so much.
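I would try putting a composite index on locId,count,avg and see whether that improves speed (note that locId is in geoip.blocks while count and avg are in destAgg, so this would have to be one index per table).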
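One way to test the join-order explanation above is to pin the order by hand with STRAIGHT_JOIN (the hint suggested in an earlier answer on this page), making blocks the outer table as in the faster SELECT * plan. This is a sketch on hypothetical scratch tables named after the question's tables, reduced to the columns used and filled with made-up rows; the prtt. and geoip. schema prefixes are dropped so it runs in one database, and the query is written against the base tables rather than the view.

```sql
-- Hypothetical scratch tables mirroring prtt.destAgg and geoip.blocks.
CREATE TABLE destAgg (
  dst_ip  INT UNSIGNED NOT NULL PRIMARY KEY,
  `avg`   FLOAT UNSIGNED,
  `count` INT UNSIGNED
);
CREATE TABLE blocks (
  startIpNum INT UNSIGNED NOT NULL,
  endIpNum   INT UNSIGNED NOT NULL,
  locId      INT UNSIGNED NOT NULL,
  KEY start_end (startIpNum, endIpNum)
);
INSERT INTO destAgg (dst_ip, `avg`, `count`) VALUES (10, 1.5, 3), (25, 2.0, 4), (99, 0.5, 1);
INSERT INTO blocks (startIpNum, endIpNum, locId) VALUES (0, 19, 100), (20, 29, 200), (30, 49, 300);

-- STRAIGHT_JOIN reads blocks first, then range-checks destAgg's
-- primary key for each block, the order used by the SELECT * plan.
SELECT b.locId, d.count, d.avg
FROM blocks AS b
STRAIGHT_JOIN destAgg AS d
  ON d.dst_ip BETWEEN b.startIpNum AND b.endIpNum;
```

If the answer's explanation holds, running this form on the real tables should perform roughly like the SELECT * query, even though it selects only three columns.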

MySQL query not taking advantage of index

I was analyzing a query (while working on a WordPress plugin named NextGEN Gallery); this is what I got:
query:
EXPLAIN
SELECT title, filename
FROM wp_ngg_pictures wnp
LEFT JOIN wp_ngg_gallery wng
ON wng.gid = wnp.galleryid
GROUP BY wnp.galleryid
LIMIT 5
result:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+---------------------------------+
| 1 | SIMPLE | wnp | ALL | NULL | NULL | NULL | NULL | 439 | Using temporary; Using filesort |
| 1 | SIMPLE | wng | eq_ref | PRIMARY | PRIMARY | 8 | web1db1.wnp.galleryid | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+------+---------------------------------+
so I do:
ALTER TABLE wp_ngg_pictures ADD INDEX(galleryid);
and on my local test system I get:
+----+-------------+-------+--------+---------------+-----------+---------+--------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+-----------+---------+--------------------+------+-------+
| 1 | SIMPLE | wnp | index | galleryid | galleryid | 8 | NULL | 30 | |
| 1 | SIMPLE | wng | eq_ref | PRIMARY | PRIMARY | 8 | test.wnp.galleryid | 1 | |
+----+-------------+-------+--------+---------------+-----------+---------+--------------------+------+-------+
which seems fine, but on the final server I get
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------+
| 1 | SIMPLE | wnp | index | galleryid | galleryid | 8 | NULL | 439 | |
| 1 | SIMPLE | wng | eq_ref | PRIMARY | PRIMARY | 8 | web1db1.wnp.galleryid | 1 | |
+----+-------------+-------+--------+---------------+-----------+---------+-----------------------+------+-------+
So the index is used, but all the rows are scanned anyway? Why is this happening?
The only difference I can see is the MySQL version, 5.1.47 (local) vs 5.0.45 (remote); the data is the same on both systems.
The rows column in the EXPLAIN SELECT output is an estimate of the number of rows that MySQL believes it must examine to execute the query, so I guess it is possible that your local version (5.1.47) is better at estimating than your remote version.
Without the EXPLAIN clause, do both queries produce the same output? What happens if you change the query to use a STRAIGHT_JOIN?