I have the following column in a MySQL table:
+--------------------------------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------+----------------------+------+-----+---------+----------------+
| created_at | timestamp | YES | MUL | NULL | |
The following index exists on the field
*************************** 6. row ***************************
Table: My_Table
Non_unique: 1
Key_name: IDX_My_Table_CREATED_AT
Seq_in_index: 1
Column_name: created_at
Collation: A
Cardinality: 273809
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
I am trying to optimize the following query to use the IDX_My_Table_CREATED_AT index for the range condition
SELECT * FROM My_Table as main_table WHERE ((main_table.created_at >= '2013-07-01 05:00:00') AND (main_table.created_at <= '2013-11-09 05:59:59'))\G
When I use EXPLAIN on the select query, I get the following:
+----+-------------+------------+------+---------------------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | main_table | ALL | IDX_My_Table_CREATED_AT | NULL | NULL | NULL | 273809 | Using where |
+----+-------------+------------+------+---------------------------------+------+---------+------+--------+-------------+
The issue is that the IDX_My_Table_CREATED_AT index is not being used for this range condition, even though it is a BTREE index and therefore should be applicable to the query.
Strangely, if I attempt a single value lookup on the column, the index is used.
EXPLAIN SELECT * FROM My_Table as main_table WHERE (main_table.created_at = '2013-07-01 05:00:00');
+----+-------------+------------+------+---------------------------------+---------------------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------------+---------------------------------+---------+-------+------+-------------+
| 1 | SIMPLE | main_table | ref | IDX_My_Table_CREATED_AT | IDX_My_Table_CREATED_AT | 5 | const | 1 | Using where |
+----+-------------+------------+------+---------------------------------+---------------------------------+---------+-------+------+-------------+
Why isn't the index being used for the range condition? I have tried changing the query to use BETWEEN but that didn't change anything.
The answer is simple: the MySQL optimizer is cost-based, and it calculated that a full table scan was the cheapest way to fetch the records.
The requested range appears to cover essentially the whole table: the rows estimate in the EXPLAIN output and the index cardinality are both 273809, i.e. these numbers are equal.
If the optimizer had used the index, the relative cost would have been much higher because of the slow random reads needed to look up each matching row.
Moral of the story: a full table scan in this case is not the end of the world. Just accept it.
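If you want to double-check the optimizer's choice, you can force the rejected index and compare the plans yourself; a minimal sketch against the table and index from the question (the narrower date range is made up purely for illustration):
-- Force the index the optimizer rejected and compare the estimated plan:
EXPLAIN SELECT * FROM My_Table FORCE INDEX (IDX_My_Table_CREATED_AT)
WHERE created_at >= '2013-07-01 05:00:00'
  AND created_at <= '2013-11-09 05:59:59';

-- A far more selective range should use the index without any hint:
EXPLAIN SELECT * FROM My_Table
WHERE created_at >= '2013-11-09 05:00:00'
  AND created_at <= '2013-11-09 05:59:59';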
Related
How can I make my query match my column regardless of upper or lower case, but without using SQL's LOWER() function?
select * from User where lower(lusername) = 'abdel'
I still want to be able to match my column value with any combination of cases (Abc, aBc, ABC, aBC, ...) but without using the LOWER() function in my SQL query, because LOWER() disables the index I put on the column.
Is there a way to do that? Thank you in advance.
The default collation used in MySQL is case-insensitive, so you don't need to use LOWER() when searching.
In other words, the following two queries should return the same results:
SELECT ... FROM mytable WHERE LOWER(name) = 'Abc';
SELECT ... FROM mytable WHERE name = 'Abc'
The latter query can be optimized with an index, but the former can't.
Any expression or function call makes the search non-"sargable", which means it can't use an index, because the optimizer can't know whether the expression's or function's output has the same sort order as the index on the column.
Based on your comment below, I see you are using the collation utf8_bin, which is case-sensitive. This does improve performance a little over a case-insensitive collation, so if performance is the only thing that matters in this case, I understand why you want to keep that collation.
Here's a demo of using EXPLAIN to verify that it uses an index when you compare strings directly, but the binary collation does a case-sensitive search:
mysql> create table User (id serial primary key, lusername varchar(75) collate utf8_bin, created datetime, key(lusername));
mysql> explain select * from User where lusername = 'Abc';
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | User | NULL | ref | lusername | lusername | 228 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
But when we use a function like LOWER(), it can't use the index, and resorts to a table-scan:
mysql> explain select * from User where LOWER(lusername) = 'Abc';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | User | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
However, MySQL 5.7 has a feature to add a virtual column based on the expression, and index the virtual column:
mysql> alter table User add lower_lusername varchar(75) as (LOWER(lusername)), add key (lower_lusername);
mysql> explain select * from User where LOWER(lusername) = 'Abc';
+----+-------------+-------+------------+------+-----------------+-----------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------+-----------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | User | NULL | ref | lower_lusername | lower_lusername | 303 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+-----------------+-----------------+---------+-------+------+----------+-------+
Alternatively, in MySQL 8.0 you can create a functional index on the expression directly, without even adding a generated column:
mysql> alter table User add key ((LOWER(lusername)));
mysql> explain select * from User where LOWER(lusername) = 'Abc';
+----+-------------+-------+------------+------+------------------+------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------+------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | User | NULL | ref | functional_index | functional_index | 228 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+------------------+------------------+---------+-------+------+----------+-------+
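If case-insensitive matching is really what you want, another option (a sketch, assuming you are free to change the column's collation and that the table looks like the one created above) is to switch the column to a case-insensitive collation, so a plain comparison can use the existing index:
-- Convert the column to a case-insensitive collation:
ALTER TABLE User
  MODIFY lusername VARCHAR(75) CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Now a direct comparison matches 'Abdel', 'abdel', 'ABDEL', ... and can
-- still use the index on lusername:
SELECT * FROM User WHERE lusername = 'abdel';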
I now have a table like this:
> DESC userInfo;
+--------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | char(32) | NO | MUL | NULL | |
| age | tinyint(3) unsigned | NO | | NULL | |
| gender | tinyint(1) | NO | | 1 | |
+--------+---------------------+------+-----+---------+----------------+
I made (name, age) a joint unique index:
> SHOW INDEX FROM userInfo;
+----------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+--------------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+--------------------+
| userInfo | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | |
| userInfo | 0 | joint_unique_index | 1 | name | A | 0 | NULL | NULL | | BTREE | | joint unique index |
| userInfo | 0 | joint_unique_index | 2 | age | A | 0 | NULL | NULL | | BTREE | | joint unique index |
+----------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+--------------------+
3 rows in set (0.00 sec)
Now, when I use the following query statement, its type is ALL:
> DESC SELECT * FROM userInfo WHERE age = 18;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | userInfo | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
I can understand this behavior, because according to the leftmost prefix matching feature, age will not be used as an index column when querying.
But when I use the following statement to query, its type is index:
> DESC SELECT name, age FROM userInfo WHERE age = 18;
+----+-------------+----------+------------+-------+---------------+--------------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | userInfo | NULL | index | NULL | joint_unique_index | 132 | NULL | 1 | 100.00 | Using where; Using index |
+----+-------------+----------+------------+-------+---------------+--------------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
I can't understand how this result is produced. According to the first example, using age as the query condition does not satisfy the leftmost-prefix rule, yet the result shows its type is actually index! Is this an optimization in MySQL?
When I try to make sure I use indexed columns as query conditions, their type is always ref, as shown below:
> DESC SELECT * FROM userInfo WHERE name = "Jack";
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | userInfo | NULL | ref | joint_unique_index | joint_unique_index | 128 | const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
> DESC SELECT name, age FROM userInfo WHERE name = "Jack";
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | userInfo | NULL | ref | joint_unique_index | joint_unique_index | 128 | const | 1 | 100.00 | Using index |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Please tell me why, when I filter on age, the first result's type is ALL but the second result's type is index. Is this the result of a MySQL optimization?
In other words, with SELECT * the index is not used, but with SELECT joint_col1, joint_col2 the index is used (since type is index). Why does this difference occur?
Simplifying a bit, an index (name, age) is basically the same as if you had another table (name, age, id) with a copy of those values. The primary key is (for InnoDB) included for technical reasons - MySQL uses it to find the full row in the original table.
So you can basically think of it as if you have two tables: (id, name, age, gender) and (name, age, id), both with the same number of rows. Both have the ability to jump to/skip specific rows if you provide the leftmost columns.
If you do
SELECT * FROM userInfo WHERE age = 18;
MySQL has to read, as you expected, every row of the table, as there is no way to find rows with age = 18 faster - just as you concluded, there is no index with age as the leftmost column.
If you do
SELECT name, age FROM userInfo WHERE age = 18;
the situation doesn't change a lot: MySQL will also have to read every row, and still cannot use the index on (name, age) to limit the number of rows it has to read.
But MySQL can use a trick: since you only need the columns name and age, it can read all rows from the index-"table" and still have all information it needs, as the index is a covering index (it covers all required columns).
Why would MySQL do that? Because it has to read less absolute data than reading the complete table: the index stores the information you want in less bytes (as it doesn't include gender). Reading less data to get all the information you need is better/faster than reading more data to get the same information. So MySQL will do just that.
But to emphasize it: your query still has to read all rows, it is still basically a full table scan ("ALL") - just on a "table" (the index) with less columns, to save some bytes. While you won't notice a difference with one tinyint column, if your table has a lot of or large columns, it's actually a relevant speedup.
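To see this for yourself, you can compare the plan with and without the covering index available; a small sketch reusing the table and index names from the question:
-- Covered by joint_unique_index: type=index, Extra shows "Using index"
EXPLAIN SELECT name, age FROM userInfo WHERE age = 18;

-- Telling MySQL to ignore that index falls back to a full table scan (type=ALL):
EXPLAIN SELECT name, age FROM userInfo IGNORE INDEX (joint_unique_index) WHERE age = 18;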
The "leftmost" rule applies to the WHERE clause versus the INDEX.
INDEX(name, age) is useful for WHERE name = '...' or WHERE name = '...' AND ((anything else)) because name is leftmost in the index.
What you have is WHERE age = ... ((and nothing else)), so you need INDEX(age) (or INDEX(age, ...)).
In particular, for SELECT name, age FROM userInfo WHERE age = 18:
INDEX(age) -- good
INDEX(age, name) -- better because it is "covering".
The order of columns in the WHERE does not matter; the order in the INDEX does matter.
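As a sketch of what that would look like against the userInfo table from the question (the index name idx_age_name is made up):
-- An index with age leftmost lets both queries seek directly to age = 18:
ALTER TABLE userInfo ADD INDEX idx_age_name (age, name);

-- These should now show type=ref on idx_age_name instead of a scan, and the
-- second one is covered (name, age, and the implicit id are all in the index):
EXPLAIN SELECT * FROM userInfo WHERE age = 18;
EXPLAIN SELECT name, age FROM userInfo WHERE age = 18;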
First, I create a simple database with one MyISAM table with an indexed field called feature.
CREATE DATABASE test;
USE test;
CREATE TABLE data(
id INT(11) NOT NULL AUTO_INCREMENT,
feature VARCHAR(64),
PRIMARY KEY (id)
) ENGINE=MyISAM;
INSERT INTO data VALUES (1, 'a'), (2, 'b');
CREATE INDEX data_feature ON data(feature);
Then, when I test a GROUP BY query with a count, the index is no longer reported as covering ("Using index") when the count is done on id (see the Extra column at the end of each EXPLAIN).
mysql> EXPLAIN SELECT feature, COUNT(1) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(*) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(id) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
I have tested it on MySQL Community 8.0.21 and MariaDB 10.3.25.
COUNT(id) obligates the Optimizer to check id for being NOT NULL. The standard pattern is to simply say COUNT(*).
InnoDB probably works differently. But this is because the PRIMARY KEY is implicitly tacked onto the end of any secondary index. That makes INDEX(feature) work like INDEX(feature, id), in which case it would be a "covering" index as indicated by "Using index".
(There is virtually no reason to stick with MyISAM in this decade.)
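A quick way to check the InnoDB point is to rebuild the same table with ENGINE=InnoDB and rerun the EXPLAIN; a sketch (the table name data_innodb is made up):
CREATE TABLE data_innodb (
  id INT(11) NOT NULL AUTO_INCREMENT,
  feature VARCHAR(64),
  PRIMARY KEY (id),
  KEY data_feature (feature)
) ENGINE=InnoDB;

INSERT INTO data_innodb VALUES (1, 'a'), (2, 'b');

-- The secondary index implicitly ends with the primary key, so it behaves
-- like INDEX(feature, id) and can cover COUNT(id) as well:
EXPLAIN SELECT feature, COUNT(id) FROM data_innodb GROUP BY feature;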
From my previous post I figured out that if I reference multiple columns in a SELECT query I need a compound index, so for my table
CREATE TABLE price (
dt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
marketId INT,
buy DOUBLE,
sell DOUBLE,
PRIMARY KEY (dt, marketId),
FOREIGN KEY fk_price_market(marketId) REFERENCES market(id) ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=INNODB;
I created the compound index:
CREATE INDEX idx_price_market_buy ON price (marketId, buy, sell, dt);
now the query
select max(dt) from price where marketId=309 and buy>0.3;
executes fast enough within 0.02 sec, but a similar query with the same combination of columns
select max(buy) from price where marketId=309 and dt>'2019-10-29 15:00:00';
takes 0.18 sec, which is relatively slow.
The DESC (EXPLAIN) outputs of these queries look a bit different:
mysql> desc select max(dt) from price where marketId=309 and buy>0.3;
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
| 1 | SIMPLE | price | NULL | range | idx_price_market,idx_price_buy,idx_price_market_buy | idx_price_market_buy | 13 | NULL | 50442 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select max(buy) from price where marketId=309 and dt>'2019-10-29 15:00:00';
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | price | NULL | ref | PRIMARY,idx_price_market,idx_price_market_buy | idx_price_market_buy | 4 | const | 202176 | 50.00 | Using where; Using index |
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
for example, key_len differs. What does this mean?
And the main question: what is the difference between buy and dt columns? Why switching them places in the query affects the performance?
The table has 1,500,000 records; 1,250,000 of them have field = 'z'.
I need to select a random row whose field is not 'z'.
$random = mt_rand(1, 250000);
$query = "SELECT field FROM table WHERE field != 'z' LIMIT $random, 1";
It works OK.
Then I decided to optimize it and indexed field in the table.
The result was strange: it was ~3 times slower. I tested it.
Why is it slower? Shouldn't indexing like this make it faster?
The engine is MyISAM.
explain with index:
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows    | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------+
|  1 | SIMPLE      | table | range | field         | field | 758     | NULL | 1139287 | Using |
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------+
explain without index:
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL | 1484672 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
Summary
The problem is that field is not a good candidate for indexing, due to the nature of b-trees.
Explanation
Let's suppose you have a table that has the results of 500,000 coin tosses, where the toss is either 1 (heads) or 0 (tails):
CREATE TABLE toss (
id int NOT NULL AUTO_INCREMENT,
result int NOT NULL DEFAULT '0',
PRIMARY KEY ( id )
)
select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
| 0 | 250290 |
| 1 | 249710 |
+--------+----------+
2 rows in set (0.40 sec)
If you want to select one toss (at random) where the toss was tails, then you need to search through your table, picking a random starting place.
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.06 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | ALL | NULL | NULL | NULL | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
You see that you're basically searching sequentially through all of the rows to find a match.
If you create an index on the toss field, then your index will contain two values, each with roughly 250,000 entries.
create index foo on toss ( result );
Query OK, 500000 rows affected (2.48 sec)
Records: 500000 Duplicates: 0 Warnings: 0
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.25 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | range | foo | foo | 4 | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
Now you're searching fewer records, but the time to search increased from 0.06 to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning the table when the index has a large number of rows for a given key: for every matching index entry, MySQL still has to go back to the table to fetch the rest of the row, which adds an extra lookup per match.
Let's look at the indexes on this table:
show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss | 0 | PRIMARY | 1 | id | A | 500000 | NULL | NULL | | BTREE | |
| toss | 1 | foo | 1 | result | A | 2 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
The PRIMARY index is a good index: there are 500,000 rows, and there are 500,000 values. Arranged in a BTREE, you can quickly identify a single row based on the id.
The foo index is a bad index: there are 500,000 rows, but only 2 possible values. This is pretty much the worst possible case for a BTREE -- all of the overhead of searching the index, and still having to search through the results.
In the absence of an order by clause, that LIMIT $random, 1 starts at some undefined place.
And according to your explain, the index isn't even being used.
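If you at least want the offset-based pick to be deterministic, one sketch (assuming the table has an auto-increment primary key named id, which the question does not show) is to add an ORDER BY and accept the cost of the ordered scan:
-- $random is still chosen by the application, e.g. mt_rand(1, 250000):
SELECT field FROM `table`
WHERE field != 'z'
ORDER BY id
LIMIT 123456, 1;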