I have a very simple table called Device in a MySQL database.
+----------------+--------------+------+-----+----------------+
| Field          | Type         | Null | Key | Extra          |
+----------------+--------------+------+-----+----------------+
| DTYPE          | varchar(31)  | NO   |     |                |
| id             | bigint(20)   | NO   | PRI | auto_increment |
| dateCreated    | datetime     | NO   |     |                |
| dateModified   | datetime     | NO   |     |                |
| phoneNumber    | varchar(255) | YES  | MUL |                |
| version        | bigint(20)   | NO   |     |                |
| oldPhoneNumber | varchar(255) | YES  |     |                |
+----------------+--------------+------+-----+----------------+
This table has more than 100K records. I am running a very simple query:
select * from AttDevice where phoneNumber = 5107357058;
This query takes almost 4-6 seconds, but when I change it slightly, as shown below:
select * from AttDevice where phoneNumber = '5107357058';
it executes almost instantly.
Notice that the phoneNumber column is a varchar. I don't understand why the former query takes so long and the latter doesn't; the only difference between the two is the quotes.
Does MySQL treat these two queries differently? If so, why?
EDIT 1
I used EXPLAIN and got the following output, but I don't know how to interpret these two results.
mysql> EXPLAIN select * from AttDevice where phoneNumber = 5107357058;
+----+-------------+--------+------+------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table  | type | possible_keys                      | key  | key_len | ref  | rows    | Extra       |
+----+-------------+--------+------+------------------------------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | Device | ALL  | phoneNumber,idx_Device_phoneNumber | NULL | NULL    | NULL | 6482116 | Using where |
+----+-------------+--------+------+------------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN select * from AttDevice where phoneNumber = '5107357058';
+----+-------------+--------+------+------------------------------------+-------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys                      | key         | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+------------------------------------+-------------+---------+-------+------+-------------+
|  1 | SIMPLE      | Device | ref  | phoneNumber,idx_Device_phoneNumber | phoneNumber | 258     | const | 2    | Using where |
+----+-------------+--------+------+------------------------------------+-------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
Can someone explain the key, key_len and rows columns in the EXPLAIN output?
1) Thank you for the "EXPLAIN". We all (including you, I'm sure) knew that the problem was that MySQL had to do a type conversion for every single row instead of using the index, but your "EXPLAIN" proved it.
2) Here's a nice, short article about EXPLAIN:
http://www.lornajane.net/posts/2011/explaining-mysqls-explain
The *possible_keys* shows which indexes apply to this query and the key
tells us which of those was actually used. ... Finally the rows entry tells
us how many rows MySQL had to look at to find the result set.
Search value:    key:          type:   ref:    rows:
-------------    -----------   -----   -----   -------
5107357058       NULL          ALL     NULL    6482116
'5107357058'     phoneNumber   ref     const   2
3) The "ref" column is "The columns compared to the index". In the second case, the string literal ("constant") '5107357058' was compared to the key "phoneNumber". In the first case, there was no usable key (because your search condition was a completely different type); hence "ref" was NULL.
4) The "type" column is "The join type". "ref" means "All rows with matching index values are read from this table" (in this case, 2 rows). "ALL" means "full table scan", which in this case means 6 million rows.
5) Here's the mysql documentation for "EXPLAIN":
http://dev.mysql.com/doc/refman/5.5/en/explain-output.html
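6) About key_len: it is the number of bytes of the index MySQL decided to use. The following is my own back-of-the-envelope calculation rather than something from the EXPLAIN docs: assuming a single-byte character set such as latin1, a nullable VARCHAR(255) needs 255 bytes for the data, 2 bytes for the length prefix, and 1 byte for the NULL flag, which is where the 258 in your second EXPLAIN comes from:
255 (max data bytes) + 2 (length prefix) + 1 (NULL flag) = 258
With a multi-byte character set the number would be larger.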
You fooled MySQL into making a bad choice by NOT quoting the phone number. Consider:
The column definition is varchar.
In the first (unquoted) case you provided the value as an integer (long). I would have thought MySQL could figure this one out, but obviously it didn't, and did a full table scan.
In the second (quoted) case, you gave the search key in the correct datatype (character) and MySQL chose the index over the full-table-scan.
The varchar index cannot be used when you use a number as the operand. Excerpt from the fine documentation on implicit type conversions:
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.
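You can see this coercion for yourself with a quick check (the literal values below are just illustrations, not taken from the question); every comparison returns 1 (true) because the string is converted to a number before comparing, which is exactly why a B-tree index on the string column is useless for this kind of lookup:
select '1' = 1, ' 1' = 1, '1a' = 1;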
I believe that MySQL has to convert the stored varchar values to numbers in the first example. In the second example it does not. I'm guessing that's where the difference is coming from.
The first example scans the table row by row; the other one uses the index.
http://dev.mysql.com/doc/refman/5.0/en/show-columns.html
If Key is MUL, multiple occurrences of a given value are permitted within the column. The column is the first column of a nonunique index or a unique-valued index that can contain NULL values.
So instead of scanning all the null values, the second query looks exclusively for non-null values, which speeds things up.
....I think.
Related
Low-level MySQL question for you: we are storing UUIDs as BINARY(16) in the DB and are using them as a primary key. We are in the process of changing the stored UUIDs to time-based ones. We want to optimize clustered index insertions to be append-only, but that will only work if we are sure of how MySQL adds the values to the B-tree. Does anyone know whether the MySQL B-tree for binary types orders values starting from the most significant byte (left-to-right) or the least significant byte (right-to-left)?
Binary string types are like other string types, except they have no character set. That is, the bytes are treated as literal byte values, with no encoding. Otherwise, they sort just like strings: left to right.
Try an experiment to test this:
create table mytable (id int primary key, b binary(16), key(b));
insert into mytable values
(1, unhex('BBBBBBBBBBBBBBBB7777777777777777')),
(2, unhex('7777777777777777BBBBBBBBBBBBBBBB'));
mysql> select id, hex(b) from mytable order by b;
+----+----------------------------------+
| id | hex(b)                           |
+----+----------------------------------+
|  2 | 7777777777777777BBBBBBBBBBBBBBBB |
|  1 | BBBBBBBBBBBBBBBB7777777777777777 |
+----+----------------------------------+
If the string were sorted by the least significant bit, the order would be opposite.
I used EXPLAIN to prove that this query uses the index on column b:
explain select id, hex(b) from mytable order by b;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | mytable | NULL       | index | NULL          | b    | 17      | NULL | 2    | 100.00   | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
Yes, it is possible to store Type 1 UUIDs in BINARY(16) for 'trivial' indexing in roughly chronological order.
All ports of MySQL have identical ordering of the bits and bytes on disk. So this discussion is OS-independent.
Pre-8.0, see my UUID blog: UUIDs
8.0: See the UUID functions; they do essentially the same thing as described in my blog (a minimal sketch follows after this list)
MariaDB 10.7 has a UUID datatype; no need for adding function calls. But they rearrange the bits differently.
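For the MySQL 8.0 route, here is a minimal sketch (the table and column names are made up for illustration, not taken from the question): UUID_TO_BIN() with its swap flag set to 1 moves the time-high field to the front of the BINARY(16) value, so the version-1 UUIDs produced by UUID() are inserted in roughly ascending key order, and BIN_TO_UUID() with the same flag converts back for display.
-- id holds the time-reordered UUID, so inserts land near the right edge of the clustered index
CREATE TABLE events (
  id      BINARY(16) NOT NULL PRIMARY KEY,
  payload JSON
);
INSERT INTO events (id, payload) VALUES (UUID_TO_BIN(UUID(), 1), '{}');
SELECT BIN_TO_UUID(id, 1) AS id, payload FROM events;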
A "select distinct col1,col2 from table1" where col1 and col2 are of type TEXT and table1 has about 65K rows works fine with MySQL 5.5.58. Now that I've upgraded to MySQL 5.7.20 it takes almost an hour! Does anyone know of any changes to MySQL that may be causing this? Does anyone have any suggestions how col1 and col2 should be optimally indexed for this query, or what other settings I should check to make this query run faster? I don't get the feeling that indexes are even being used since EXPLAIN says it's using a temporary table and no keys:
mysql> explain SELECT DISTINCT author,sort_author from itemsbyauthor;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table         | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | SIMPLE      | itemsbyauthor | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 64727 | 100.00   | Using temporary |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)
In many cases, MySQL doesn't use prefix indexes properly and it seems this is one of these cases.
Do you really need the column type to be TEXT?
From the column names, it looks like the columns are holding author names, which seems like a relatively short string (let's say, up to 50 or 100 characters)?
I would reconsider the column type and try to alter it to VARCHAR with a fixed size, instead of TEXT.
Then, add a compound index that includes both columns.
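A sketch of that change (the VARCHAR length and the index name here are my assumptions, not something from the question):
ALTER TABLE itemsbyauthor
  MODIFY author VARCHAR(100),
  MODIFY sort_author VARCHAR(100);
ALTER TABLE itemsbyauthor ADD INDEX idx_author_sort (author, sort_author);
With both columns in one index, MySQL can resolve the DISTINCT from the index instead of building a temporary table.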
I was expecting this query to use a key.
mysql> DESCRIBE Foo;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | bigint(20)  | NO   | PRI | NULL    | auto_increment |
| name  | varchar(50) | NO   | UNI | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
mysql> EXPLAIN SELECT id FROM Foo WHERE name='foo';
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
Foo has a unique index on name, so why isn't the index being used in the SELECT?
From the MySQL Manual page entitled EXPLAIN Output Format:
Impossible WHERE noticed after reading const tables (JSON property:
message)
MySQL has read all const (and system) tables and notice that the WHERE
clause is always false.
and the definition of const tables, from the page entitled Constants and Constant Tables:
A MySQL constant is something more than a mere literal in the query.
It can also be the contents of a constant table, which is defined as
follows:
A table with zero rows, or with only one row
A table expression that is restricted with a WHERE condition,
containing expressions of the form column = constant, for all the
columns of the table's primary key, or for all the columns of any of
the table's unique keys (provided that the unique columns are also
defined as NOT NULL).
The second reference is a page and a half long. Please refer to it.
const
The table has at most one matching row, which is read at the start of
the query. Because there is only one row, values from the column in
this row can be regarded as constants by the rest of the optimizer.
const tables are very fast because they are read only once.
const is used when you compare all parts of a PRIMARY KEY or UNIQUE
index to constant values. In the following queries, tbl_name can be
used as a const table:
SELECT * FROM tbl_name WHERE primary_key=1;
SELECT * FROM tbl_name WHERE primary_key_part1=1 AND
primary_key_part2=2;
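Putting this together for your table: name has a UNIQUE index and is NOT NULL, so for this lookup Foo qualifies as a const table. The optimizer probes the unique index while it is still building the plan, finds that no row has name = 'foo', and therefore reports the impossible WHERE instead of a normal index plan. A minimal sketch of a reproduction (the column sizes and the inserted value are assumptions based on your DESCRIBE output):
CREATE TABLE Foo (
  id   BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL UNIQUE
);
INSERT INTO Foo (name) VALUES ('bar');
-- 'foo' does not exist, so EXPLAIN reports:
-- Impossible WHERE noticed after reading const tables
EXPLAIN SELECT id FROM Foo WHERE name = 'foo';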
It could be because the table Foo has a very small volume of data. In such a case the optimizer will choose to do a table scan rather than look through the index.
As the MySQL documentation says:
Indexes are less important for queries on small tables, or big tables
where report queries process most or all of the rows. When a query
needs to access most of the rows, reading sequentially is faster than
working through an index. Sequential reads minimize disk seeks, even
if not all the rows are needed for the query.
I'm new to query optimization, so I accept that I don't understand everything yet, but I do not understand why even this simple query isn't optimized as expected.
My table:
+------------------+-----------+------+-----+-------------------+----------------+
| Field            | Type      | Null | Key | Default           | Extra          |
+------------------+-----------+------+-----+-------------------+----------------+
| tasktransitionid | int(11)   | NO   | PRI | NULL              | auto_increment |
| taskid           | int(11)   | NO   | MUL | NULL              |                |
| transitiondate   | timestamp | NO   | MUL | CURRENT_TIMESTAMP |                |
+------------------+-----------+------+-----+-------------------+----------------+
My indexes:
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name          | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasktransitions |          0 | PRIMARY           |            1 | tasktransitionid | A         |         952 | NULL     | NULL   |      | BTREE      |         |               |
| tasktransitions |          1 | transitiondate_ix |            1 | transitiondate   | A         |         952 | NULL     | NULL   |      | BTREE      |         |               |
+-----------------+------------+-------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
My query:
SELECT taskid FROM tasktransitions WHERE transitiondate>'2013-09-31 00:00:00';
gives this:
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
| id | select_type | table           | type | possible_keys     | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tasktransitions | ALL  | transitiondate_ix | NULL | NULL    | NULL | 1082 | Using where |
+----+-------------+-----------------+------+-------------------+------+---------+------+------+-------------+
If I understand everything correctly, Using where and ALL mean that all rows are retrieved from the storage engine and filtered at the server layer. This is sub-optimal. Why does it refuse to use the index and retrieve only the requested range from the storage engine (InnoDB)?
Cheers
MySQL will not use the index if it estimates that it would select a significantly large portion of the table, and it thinks that a table-scan is actually more efficient in those cases.
By analogy, this is the reason the index of a book doesn't contain very common words like "the" -- because it would be a waste of time to look up the word in the index and find the list of page numbers is a very long list, even every page in the book. It would be more efficient to simply read the book cover to cover.
My experience is that this happens in MySQL if a query's search criteria would match greater than 20% of the table, and this is usually the right crossover point. There could be some variation based on the data types, size of table, etc.
You can give a hint to MySQL to convince it that a table-scan would be prohibitively expensive, so it would be much more likely to use the index. This is not usually necessary, but you can do it like this:
SELECT taskid FROM tasktransitions FORCE INDEX (transitiondate_ix)
WHERE transitiondate>'2013-09-31 00:00:00';
I once was trying to join two tables and MySQL was refusing to use an index, resulting in >500ms queries, sometimes a few seconds. Turns out the column I was joining on had different encodings on each table. Changing both to the same encoding sped up the query to consistently less than 100ms.
Just in case it helps somebody.
I have a table with a varchar column _id (a long int encoded as a string). I added an index for this column, but the query was still slow. I was executing this query:
select * from table where (_id = 2221835089) limit 1
I realized that the _id value wasn't being generated as a string (I'm using Laravel as the DB framework). Well, when the query is executed with the right data type in the where clause, everything works like a charm:
select * from table where (_id = '2221835089') limit 1
I am new at MySQL 8.0, have finished two simple tutorials completely, and there are only two subjects that have not worked for me; one of them is indexing. I read the section labeled "2 Answers" and found that the statement suggested at the end of that section seems to defeat the purpose of the original USE INDEX or FORCE INDEX statement below. The suggested statement is like getting a table sorted via a WHERE statement instead of MySQL using USE INDEX or FORCE INDEX. It works, but it seems to me it is not the same as using the natural USE INDEX or FORCE INDEX. Does anyone know why MySQL is ignoring my simple request to index a 10-row table on the Lname column?
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| ID         | int         | NO   | PRI | NULL    | auto_increment |
| Lname      | varchar(20) | NO   | MUL | NULL    |                |
| Fname      | varchar(20) | NO   | MUL | NULL    |                |
| City       | varchar(15) | NO   |     | NULL    |                |
| Birth_Date | date        | NO   |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
CREATE INDEX idx_Lname ON TestTable (Lname);
SELECT * FROM TestTable USE INDEX (idx_Lname);
SELECT * From Testtable FORCE INDEX (idx_LastFirst);
Table structure:
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| total | int(11) | YES | | NULL | |
| thedatetime | datetime | YES | MUL | NULL | |
+-------------+----------+------+-----+---------+----------------+
Total rows: 137967
mysql> explain select * from out where thedatetime <= NOW();
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | out   | ALL  | thedatetime   | NULL | NULL    | NULL | 137967 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
The real query is much longer, with more table joins; the point is, I can't get the table to use the datetime index. This is going to be hard for me if I want to select all data up to a certain date. However, I noticed that I can get MySQL to use the index if I select a smaller subset of data.
mysql> explain select * from out where thedatetime <= '2008-01-01';
+----+-------------+-------+-------+---------------+-------------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+-------------+---------+------+-------+-------------+
|  1 | SIMPLE      | out   | range | thedatetime   | thedatetime | 9       | NULL | 15826 | Using where |
+----+-------------+-------+-------+---------------+-------------+---------+------+-------+-------------+
mysql> select count(*) from out where thedatetime <= '2008-01-01';
+----------+
| count(*) |
+----------+
|    15990 |
+----------+
So, what can I do to make sure MySQL will use the index no matter what date I put in?
There are two things in play here:
The index is not selective enough - if the index covers more than approx. 30% of the rows, MySQL will decide a full table scan is more efficient. When you narrow the range, the index kicks in.
One index per table in a join
The real query is much longer
with more table joins, the point is ...
The point is exactly that: because it has joins, it probably can't use that index. MySQL can use one index per table in a join (unless it qualifies for an index-merge optimization). If the primary key is already used for the join, thedatetime won't be used. In order to use it, you need to create a multi-column index on the join key + thedatetime, in the correct order.
Check the EXPLAIN of the actual query to see which key MySQL uses for the join. Modify that index to include the thedatetime column as well, or create a new multi-column index from both (depending on what you use the join key for).
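As a rough sketch of what that could look like (this assumes the join is on taskid; the index name is made up for illustration):
ALTER TABLE tasktransitions ADD INDEX idx_taskid_transitiondate (taskid, transitiondate);
With the join column first and transitiondate second, the same index can serve both the join lookup and the date filter.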
Everything works as it is supposed to. :)
Indexes are there to speed up retrieval. They do it using index lookups.
In your first query the index is not used because you are retrieving ALL rows, and in this case using the index is slower (lookup index, get row, lookup index, get row... x number of rows is slower than getting all rows == table scan).
In the second query you are retrieving only a portion of the data, and in this case a table scan is much slower.
The job of the optimizer is to use the statistics that the RDBMS keeps on the index to determine the best plan. In the first case the index was considered, but the planner (correctly) threw it away.
EDIT
You might want to read something like this to get some concepts and keywords regarding the MySQL query planner.
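For example, the statistics the planner relies on can be refreshed and inspected like this (a sketch; the table name comes from the question and is quoted with backticks since OUT is a reserved word in MySQL):
ANALYZE TABLE `out`;    -- refresh the index statistics the storage engine keeps
SHOW INDEX FROM `out`;  -- the Cardinality column feeds the optimizer's cost estimates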