MySQL 8.0 Index Condition Pushdown (ICP) does not take effect

Recently I have been looking at how Index Condition Pushdown (ICP) works, and whether it makes any difference to locking under the REPEATABLE READ isolation level.
However, I find that ICP does not take effect in MySQL 8.0, while it does work in 5.7. The repro steps are shown below (based on the guide in Index Condition Pushdown Optimization).
Prepare table and data
CREATE TABLE `fruit` (
`id` bigint NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`age` int DEFAULT NULL,
`data` varchar(16) DEFAULT '',
PRIMARY KEY (`id`),
KEY `i_age_name` (`age`,`name`)
) ENGINE=InnoDB;
-- Prepare data
delimiter ;;
CREATE PROCEDURE `load_data`()
BEGIN
DECLARE i int DEFAULT 1;
WHILE i < 300 DO
INSERT INTO fruit (`name`, `age`, `data`) VALUES
(substring(MD5(RAND()),1,20), i / 10 + 1, 'test data'),
(substring(MD5(RAND()),1,20), i / 10 + 1, 'hello world');
SET i = i + 1;
END WHILE;
END ;;
delimiter ;
call load_data();
Execute a query:
explain select * from fruit where age = 10 and name like '%a';
We expected the Extra column to show Using index condition, but it does not. Why?
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | fruit | NULL | ref | i_age_name | i_age_name | 5 | const | 20 | 11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-------------+
In MySQL 5.7, the query execution plan shows that ICP is used:
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | fruit | NULL | ref | i_age_name | i_age_name | 5 | const | 20 | 11.11 | Using index condition |
+----+-------------+-------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
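One thing worth checking before concluding that the behaviour changed between versions is whether ICP is simply disabled by the optimizer switch on the 8.0 instance. This is only a diagnostic sketch (the switch is on by default, so it may well not explain the difference here), and on 8.0.16+ EXPLAIN FORMAT=TREE can also show explicitly whether an index condition is pushed:
-- Check whether the switch is on for the current session
SELECT @@optimizer_switch LIKE '%index_condition_pushdown=on%' AS icp_enabled;
-- Force it on for the session and re-run the plan
SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM fruit WHERE age = 10 AND name LIKE '%a';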

Related

slower query for searching nearby coordinates

I seem to get slow query results when searching for nearby coordinates (for now the query only uses latitude). This is the MySQL query:
select ABS(propertyCoordinatesLat - 3.33234) as diff from tablename order by diff asc limit 0,20
Is there a way to improve this besides relying on server-side scripting to do the sorting?
Table dump:
CREATE TABLE `property` (
`propertyID` bigint(20) NOT NULL,
`propertyName` varchar(100) NOT NULL,
`propertyCoordinatesLat` varchar(100) NOT NULL,
`propertyCoordinatesLng` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `property`
--
ALTER TABLE `property`
ADD PRIMARY KEY (`propertyID`),
ADD KEY `propertyCoordinatesLat` (`propertyCoordinatesLat`,`propertyCoordinatesLng`),
ADD KEY `propertyCoordinatesLat_2` (`propertyCoordinatesLat`),
ADD KEY `propertyCoordinatesLng` (`propertyCoordinatesLng`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `property`
--
ALTER TABLE `property`
MODIFY `propertyID` bigint(20) NOT NULL AUTO_INCREMENT;
COMMIT;
The query is ordering by the difference between a string and a float. This odd calculation confuses and angers MySQL and results in a slow filesort.
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by diff
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
| 1 | SIMPLE | property | NULL | index | NULL | propertyCoordinatesLat_2 | 302 | NULL | 1 | 100.00 | Using index; Using filesort |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-----------------------------+
Changing propertyCoordinatesLat and propertyCoordinatesLng to a more sensible numeric type lets MySQL optimize better. No more filesort. This should perform much better.
alter table property change propertyCoordinatesLat propertyCoordinatesLat numeric(10,8) not null;
alter table property change propertyCoordinatesLng propertyCoordinatesLng numeric(11,8) not null;
mysql> explain select ABS(propertyCoordinatesLat - 3.33234) as diff from property order by propertyCoordinatesLat asc limit 0,20;
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | property | NULL | index | NULL | propertyCoordinatesLat_2 | 5 | NULL | 1 | 100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+--------------------------+---------+------+------+----------+-------------+
If you want to get fancy, look into MySQL's spatial types. These will probably perform better, and definitely be more accurate.
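If you stay on plain numeric columns for now, ST_Distance_Sphere gives real distances (in metres) from latitude and longitude together. This is only a sketch of the idea: it still scans the whole table, it assumes the columns have been converted to numeric as above, and the target coordinates (101.68685, 3.33234) are made up for illustration:
SELECT propertyID,
       ST_Distance_Sphere(
           POINT(propertyCoordinatesLng, propertyCoordinatesLat),
           POINT(101.68685, 3.33234)
       ) AS meters
FROM property
ORDER BY meters
LIMIT 20;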

MySQL not recognising datetime index in WHERE

What I'm trying to do is index the first name of a person and the date they were born.
The table is laid out like this:
CREATE TABLE test
(
id INT NOT NULL AUTO_INCREMENT,
fname VARCHAR(10) NOT NULL,
sname VARCHAR(10) NOT NULL,
age INT NOT NULL,
born DATETIME NOT NULL,
PRIMARY KEY(id),
INDEX name_age(fname, sname, age),
INDEX name_date(fname, born)
)
However the index isn't recognised in a where statement like so:
mysql> EXPLAIN SELECT *
-> FROM `test`
-> WHERE fname = "coby"
-> AND born
-> BETWEEN "1900-05-02 06:23:00"
-> AND "2100-05-02 06:23:00";
+----+-------------+-------+------+--------------------+----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------+----------+---------+-------+------+-------------+
| 1 | SIMPLE | test | ref | name_age,name_date | name_age | 12 | const | 45 | Using where |
+----+-------------+-------+------+--------------------+----------+---------+-------+------+-------------+
1 row in set (0.00 sec)
However it is recognised in an order by statement:
mysql> EXPLAIN SELECT *
-> FROM `test`
-> WHERE fname = "coby"
-> ORDER BY born;
+----+-------------+-------+------+--------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | test | ref | name_age,name_date | name_date | 12 | const | 45 | Using where |
+----+-------------+-------+------+--------------------+-----------+---------+-------+------+-------------+
1 row in set (0.00 sec)
How do I make it so that the index is recognised in a where statement?
Any help would be appreciated.
The index is recognized, as seen in the possible_keys column of the first EXPLAIN result. It just happens that, for 45 rows, the other index produces an equal or better query plan: the selectivity of your date range is close to zero, since a range spanning two centuries filters out almost nothing.
The ORDER BY is a different matter: because the index is used not only for selection but also for ordering, it now becomes useful.
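If you want to confirm that this is a cost-based choice rather than the index being unusable, you can force it and compare the plans. This is purely a diagnostic sketch, not a recommendation to keep FORCE INDEX in production:
EXPLAIN SELECT *
FROM test FORCE INDEX (name_date)
WHERE fname = 'coby'
  AND born BETWEEN '1900-05-02 06:23:00' AND '2100-05-02 06:23:00';
With a narrower, more selective date range the optimizer should start picking name_date on its own.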

MySQL : Why is comparing primary key to a randomly generated number not using the index?

Trying to select a random row from a table, based on autoincremented primary key with no holes.
The table schema :
CREATE TABLE IF NOT EXISTS `testTable` (
`id` int(9) NOT NULL AUTO_INCREMENT,
`data` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=0 ;
INSERT INTO `testTable` (`id`, `data`) VALUES
(1, 'hello'),
(2, 'world'),
(3, 'new'),
(4, 'data'),
(5, 'more and more'),
(6, 'data '),
(7, 'more rows here'),
(8, 'most rows here'),
(9, 'testing'),
(10,'last');
Queries:
1/ explain select * from testTable where id = ceil(Rand()*10) limit 1 ;
http://sqlfiddle.com/#!2/6e2b1/1
Result :
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
--------------------------------------------------------------------------------------------------------
| 1 | SIMPLE | testTable | ALL | (null) | (null) | (null) | (null) | 10 | Using where |
2/ explain select * from testTable where id = 7 limit 1 ;
http://sqlfiddle.com/#!2/6e2b1/2
Result:
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
---------------------------------------------------------------------------------------------------
| 1 | SIMPLE | testTable | const | PRIMARY | PRIMARY | 4 | const | 1 | |
Why is query #1 not using the index, when ceil(rand()*10) should ideally evaluate to a constant which can then be compared to the primary key? Shouldn't the optimizer work that way? Or am I missing something obvious here?
The key can't be used with that query because RAND() is called for each row and returns a different value each time.
You may try this code instead:
SET @rand_value := CEIL(RAND()*10);
EXPLAIN SELECT * FROM testTable WHERE id = @rand_value;
It first computes a random value and assigns it to a variable, then uses it in the query.
As pointed out by aneroid, the LIMIT 1 is useless: since the condition applies to the primary key, the query will never return more than one row.
With this query, the output is:
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
---------------------------------------------------------------------------------------------------
| 1 | SIMPLE | testTable | const | PRIMARY | PRIMARY | 4 | const | 1 | |
From the MySQL documentation for RAND():
RAND() in a WHERE clause is re-evaluated every time the WHERE is executed.
So it is not comparing the primary key with a constant; the value changes every time (in this case, for each row). If you remove the LIMIT 1 from your query, you will see more rows coming up with different PKs matched, which shows the "re-evaluated every time" behaviour.
Edit: See Jocelyn's example as a way to generate the random number first and then get a row with the matching PK id (the LIMIT 1 is not required, btw). Similarly stated in Najzero's comment.
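If you would rather keep it to a single statement, a derived table is a common trick: a derived table that references no base table is materialized rather than merged, so the random value should be computed once before the join. This is only a sketch that, like the question, assumes the ids run from 1 to 10 with no holes; verify the plan with EXPLAIN on your version:
SELECT t.*
FROM testTable t
JOIN (SELECT CEIL(RAND() * 10) AS rid) r
  ON t.id = r.rid;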

Why is mySQL query, left join 'considerably' faster than my inner join

I've researched this, but I still cannot explain why:
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155
Is significantly slower than:
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155
115 ms vs. 478 ms. They are both using InnoDB and there are relationships defined. The card_legality table contains approx. 200k rows, while the legality table contains 11 rows. Here is the structure of each:
CREATE TABLE `card_legality` (
`card_id` varchar(8) NOT NULL DEFAULT '',
`legality_id` int(3) NOT NULL,
`cl_boolean` tinyint(1) NOT NULL,
PRIMARY KEY (`card_id`,`legality_id`),
KEY `legality_id` (`legality_id`),
CONSTRAINT `card_legality_ibfk_2` FOREIGN KEY (`legality_id`) REFERENCES `legality` (`legality_id`),
CONSTRAINT `card_legality_ibfk_1` FOREIGN KEY (`card_id`) REFERENCES `card` (`card_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And:
CREATE TABLE `legality` (
`legality_id` int(3) NOT NULL AUTO_INCREMENT,
`l_name` varchar(16) NOT NULL DEFAULT '',
PRIMARY KEY (`legality_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
I could simply use LEFT JOIN, but it doesn't seem quite right... any thoughts, please?
UPDATE:
As requested, I've included the results of EXPLAIN for each. I had run it previously, but I don't pretend to have a thorough understanding of it.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE cl ALL PRIMARY NULL NULL NULL 199747 Using where
1 SIMPLE l eq_ref PRIMARY PRIMARY 4 hexproof.co.uk.cl.legality_id 1
AND, inner join:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE l ALL PRIMARY NULL NULL NULL 11
1 SIMPLE cl ref PRIMARY,legality_id legality_id 4 hexproof.co.uk.l.legality_id 33799 Using where
It is because of the varchar on card_id. MySQL can't use the index on card_id when the column is compared to a number, as described in the MySQL documentation on type conversion. The important part is:
For comparisons of a string column with a number, MySQL cannot use an
index on the column to look up the value quickly. If str_col is an
indexed string column, the index cannot be used when performing the
lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may
convert to the value 1, such as '1', ' 1', or '1a'.
If you change your queries to
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'
and
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'
You should see a huge improvement in speed and also see a different EXPLAIN.
Here is a similar (but easier) test to show this:
> desc id_test;
+-------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id | varchar(8) | NO | PRI | NULL | |
+-------+------------+------+-----+---------+-------+
1 row in set (0.17 sec)
> select * from id_test;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+----+
9 rows in set (0.00 sec)
> explain select * from id_test where id = 1;
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | id_test | index | PRIMARY | PRIMARY | 10 | NULL | 9 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
> explain select * from id_test where id = '1';
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | id_test | const | PRIMARY | PRIMARY | 10 | const | 1 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)
In the first case the Extra column shows Using where; Using index, while the second shows only Using index. Also, ref is NULL in the first and const in the second. Needless to say, the second one is better.
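The same check can be run directly on card_legality (my own addition; row counts will differ, but only the quoted literal should give ref access on the primary key):
EXPLAIN SELECT cl.cl_boolean FROM card_legality cl WHERE cl.card_id = 23155;
EXPLAIN SELECT cl.cl_boolean FROM card_legality cl WHERE cl.card_id = '23155';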
L2G has it pretty much summed up, although I suspect it could be because of the varchar type used for card_id.
I actually printed out this informative page for benchmarking/profiling quickies. Here is a quick poor man's profiling technique:
Time a SQL on MySQL
Enable Profiling
mysql> SET PROFILING = 1;
...
RUN your SQLs
...
mysql> SHOW PROFILES;
+----------+------------+-----------------------+
| Query_ID | Duration | Query |
+----------+------------+-----------------------+
| 1 | 0.00014600 | SELECT DATABASE() |
| 2 | 0.00024250 | select user from user |
+----------+------------+-----------------------+
mysql> SHOW PROFILE for QUERY 2;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000034 |
| checking query cache for query | 0.000033 |
| checking permissions | 0.000006 |
| Opening tables | 0.000011 |
| init | 0.000013 |
| optimizing | 0.000004 |
| executing | 0.000011 |
| end | 0.000004 |
| query end | 0.000002 |
| freeing items | 0.000026 |
| logging slow query | 0.000002 |
| cleaning up | 0.000003 |
+--------------------------------+----------+
Good luck, and please post your findings!
I'd try EXPLAIN on both of those queries. Just prefix each SELECT with EXPLAIN and run them. It gives really useful info on how mySQL is optimizing and executing queries.
I'm pretty sure that MySQL has better optimization for LEFT JOINs, though I have no evidence to back this up at the moment.
ETA: a quick scout round and I can't find anything concrete to uphold my view, so...

Query against two integer columns take an absurd amount of time

I have a query that gets generated (by Django) like this:
SELECT `geo_ip`.`id`, `geo_ip`.`start_ip`,
`geo_ip`.`end_ip`, `geo_ip`.`start`,
`geo_ip`.`end`, `geo_ip`.`cc`, `geo_ip`.`cn`
FROM `geo_ip`
WHERE (`geo_ip`.`start` <= 2084738290 AND `geo_ip`.`end` >= 2084738290 )
LIMIT 1
It queries a geolocation table with 134189 entries in it. Each query takes over 100 ms even with indexes added, which makes it unusable for anything but one-off lookups. I'm going to cache the response so I only have to do the IP lookup once, but I'm curious whether I'm missing some obvious way of making it an order of magnitude faster. My table:
CREATE TABLE `geo_ip` (
`start_ip` char(15) NOT NULL,
`end_ip` char(15) NOT NULL,
`start` bigint(20) NOT NULL,
`end` bigint(20) NOT NULL,
`cc` varchar(6) NOT NULL,
`cn` varchar(150) NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=134190 DEFAULT CHARSET=latin1
Creating an index on both columns like so:
ALTER TABLE geo_ip ADD INDEX (start, end);
Gives the following explain:
EXPLAIN SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290)
LIMIT 1;
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | geo_ip | range | start | start | 8 | NULL | 67005 | 100.00 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+----------+-------------+
It takes well over 100ms to complete selects:
SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip,
geo_ip.start, geo_ip.end, geo_ip.cc,
geo_ip.cn
FROM geo_ip
WHERE (geo_ip.end >= 2084738290 and geo_ip.start < 2084738290)
LIMIT 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id | start_ip | end_ip | start | end | cc | cn |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.18 sec)
This is more expensive than having two separate single-column indexes:
ALTER TABLE geo_ip ADD INDEX (`start`);
ALTER TABLE geo_ip ADD INDEX (`end`);
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
| 1 | SIMPLE | geo_ip | range | start,end | start | 8 | NULL | 68017 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+-------+-------------+
It takes around 100ms to complete these requests:
SELECT geo_ip.id, geo_ip.start_ip, geo_ip.end_ip, geo_ip.start, geo_ip.end, geo_ip.cc, geo_ip.cn FROM geo_ip
WHERE (geo_ip.end >= 2084738290 AND geo_ip.start < 2084738290) limit 1;
+-------+--------------+----------------+------------+------------+----+-----------+
| id | start_ip | end_ip | start | end | cc | cn |
+-------+--------------+----------------+------------+------------+----+-----------+
| 51725 | 124.66.128.0 | 124.66.159.255 | 2084732928 | 2084741119 | SG | Singapore |
+-------+--------------+----------------+------------+------------+----+-----------+
1 row in set (0.11 sec)
But both of these methods take way too long; is it possible to do anything about this?
The time is spent in the WHERE clause. Because you are filtering on two different fields with "less than" and "greater than" conditions, MySQL has to read a lot of index entries to find out which record is the one you want.
I would have designed the table this way:
+-------+-------+----------------+------------+----+-----------+
| id    | type  | ip             | geo        | cc | cn        |
+-------+-------+----------------+------------+----+-----------+
| 51725 | start | 124.66.128.0   | 2084732928 | SG | Singapore |
+-------+-------+----------------+------------+----+-----------+
| 51726 | end   | 124.66.159.255 | 2084741119 | SG | Singapore |
+-------+-------+----------------+------------+----+-----------+
so that I can select this:
select * from table where geo between '2084732927' and '2084732928'
with an index on geo.
Should be much, much faster. But sorry, I have no time to try.
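A different approach that keeps the original one-row-per-range layout (my own suggestion, not from the answer above, and it assumes the ranges do not overlap, which is normally the case for GeoIP data) is to let a single index on start do all the work and check end afterwards. The ORDER BY ... DESC LIMIT 1 stops at one index entry instead of scanning half the range:
SELECT id, start_ip, end_ip, start, `end`, cc, cn
FROM geo_ip
WHERE start <= 2084738290
ORDER BY start DESC
LIMIT 1;
-- The returned row is the only candidate range; discard it (in SQL or in the
-- application) if its `end` is still below 2084738290, i.e. the IP falls in a gap.
This only needs the single-column index on start that the question already creates.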