Could I cast VARCHAR to INT without broking INDEX? - mysql

Need to search through table foo
foo structure is
id | something
There is an INDEX for field something
I want to search BETWEEN AS INT:
SELECT CAST(something as INT) as something_int FROM foo foo_1
WHERE something_int > 1 AND something_int < 9999
In this case the INDEX would be used or be broken?

WHERE some_varchar BETWEEN '1' AND '2000' -- fast but probably incorrect
WHERE some_varchar BETWEEN 1 AND 2000 -- slow but correct
WHERE some_int BETWEEN '1' AND '2000' -- fast
WHERE some_int BETWEEN 1 AND 2000 -- fast (same as previous)
What is happening?
When comparing text to numeric, the text side is converted to numeric, and then numeric comparisons are performed.
Text to text comparison does a string comparison; numeric to numeric does a numeric comparison.
Above, I say "slow" meaning that no index can be used; "fast" if an index can be used.
The "incorrect" one has the same problem as sorting a set of numbers in a VARCHAR and then wondering why the list is out of order: 1,10,11,...,19,2,20,...,29,3, ...
CAST() is just an explicit version of the implicit conversion I am talking about here.
CAST('2000' TO INT) is done "at compile time", so the Optimizer sees it as simply 2000 (numeric, no function call).
some_varchar >= 1 on the other hand, is turned into CAST(some_varchar TO INT) >= 1 in the second example above.
As a Rule of Thumb, "hiding a column in a function call prevents using an index. See "sargable" in Wikipedia.

No, the index will not be used.
CREATE TABLE foo(something varchar(20) primary key) engine=myisam;
INSERT INTO foo VALUES ('1|abc'), ('3456|def');
DESCRIBE SELECT * FROM foo WHERE CAST(something as INT) BETWEEN 1 AND 2000;
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | tt | index | NULL | PRIMARY | 82 | NULL | 2 | Using where; Using index |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
DESCRIBE SELECT * FROM foo WHERE something BETWEEN '1' AND '2000';
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | tt | range | PRIMARY | PRIMARY | 82 | NULL | 1 | Using where; Using index |
+------+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
Notice how possible_keys is NULL for the first query (and rows found is 2).
Note: this can happen even if the character set of the query doesn't match that of the index.
Create a separate INT index (e.g. using the function index syntax).

Related

MySQL: How are bin() columns indexed?

Low-level MySQL question for you: we are storing UUIDs as bin(16) in the DB and are using them as a primary key. We are in the process of changing the UUID stored to time-based. We want to optimize clustered index insertions to be append-only, but that will only work if we are sure of how MySQL adds the values to the b-tree. Does anyone know if MySQL b-tree for bin types uses the least or most significant bit first and then goes right-to-left or left-to-right respectively?
Binary string types are like other string types, except they have no character set. That is, the bytes are treated as literal byte values, with no encoding. Otherwise, they sort just like strings: left to right.
Try an experiment to test this:
create table mytable (id int primary key, b binary(16), key(b));
insert into mytable values
(1, unhex('BBBBBBBBBBBBBBBB7777777777777777')),
(2, unhex('7777777777777777BBBBBBBBBBBBBBBB'))
mysql> select id, hex(b) from mytable order by b;
+----+----------------------------------+
| id | hex(b) |
+----+----------------------------------+
| 2 | 7777777777777777BBBBBBBBBBBBBBBB |
| 1 | BBBBBBBBBBBBBBBB7777777777777777 |
+----+----------------------------------+
If the string were sorted by the least significant bit, the order would be opposite.
I used EXPLAIN to prove that this query uses the index on column b:
explain select id, hex(b) from mytable order by b;
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | index | NULL | b | 17 | NULL | 2 | 100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+------+----------+-------------+
Yes, it is possible to make Type 1 UUIDs into BINARY(16) for 'trivial` indexing in virtually chronological order.
All ports of MySQL have identical ordering of the bits and bytes on disk. So this discussion is OS-independent.
Pre-8.0, see my UUID blog: UUIDs
8.0: See the uuid functions; they do essentially the same as mentioned in my blog
MariaDB 10.7 has a UUID datatype; no need for adding function calls. But they rearrange the bits differently.

Why does an indexed mysql query filtered on less char values result in more rows examined?

When I run the following query, I see the expected rows examined as 40
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action in ('R','C')
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 40 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
But when I run this query that simply changes AND s.period_end_action in ('R','C') to AND s.period_end_action = 'C', I see the expected rows examined as 611
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action = 'C'
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 611 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
I have the following indexes on the subscription table:
_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
INDEX(status, period_end_action),
INDEX(current_period_end_date),
Any ideas? I don't understand why removing one of the period_end_action values would cause such a large increase in rows examined?
(I agree with others that EXPLAIN often has terrible row estimates.)
Actually the numbers might be reasonable (though I doubt it). The optimizer decided to do a table scan in both cases. And the query with fewer options for period_end_action probably has to scan farther to get the 20 rows. This is because it punted on using either of your secondary indexes.
These indexes are more likely to help your second query:
INDEX(period_end_action, _id)
INDEX(period_end_action, status)
INDEX(period_end_action, current_period_end_date)
The optimal index is usually starts with any columns tested by =.
Since there is no such thing for your first query, the Optimizer probably decided to scan in _id order so that it could avoid the "sort" mandated by ORDER BY.

how to optimize SQL queries using IN having indexed keys

I am running this query
EXPLAIN SELECT id, timestamp from foo where id IN (23,67,78,90) order by ASC
here id is indexed. But then too when I am running Explain I am getting this in Using where;Using Index in Extra
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | range | PRIMARY | PRIMARY | 8 | NULL | 12 | Using where;Using Index|
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
But when I am running this same query with single id nothing is in Extra its working as expected in the case of index
EXPLAIN SELECT id, timestamp from foo where id = 23`
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table| type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | range | PRIMARY | PRIMARY | 8 | NULL | 1 | |
+----+-------------+---------------+-------+---------------+---------+---------+------+------+-------------+
I think something wrong with IN. Can anyone tell me the way to optimize it ?
As per my knowledge,
IN query will take more time than Single "=". If "IN" have single value then it is equal to single "=" query.
Because it is MULTIPLE OR conditions of "=" OR "=".
id will match with every 4 values in "IN" array.
Only way to optimise is to have index.
Update:
If the Extra column also says Using where, it means the index is being used to perform lookups of key values. Without Using where, the optimizer may be reading the index to avoid reading data rows but not using it for lookups. For example, if the index is a covering index for the query, the optimizer may scan it without using it for lookups. See explanation

Is index dependent on selected columns?

I am executing most of the queries based on the time. So i created index for the created time. But , The index only works , If I select the indexed columns only. Is mysql index is dependant the selected columns?.
My Assumption On Index
I thought index is like a telephone dictionary index page. Ex: If i want to find "Mark" . Index page shows which page character "M" starts in the directory. I think as same as the mysql works.
Table
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | varchar(100) | YES | | NULL | |
| OPERATION | varchar(100) | YES | | NULL | |
| PID | int(11) | YES | | NULL | |
| CREATED_TIME | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
Indexes On the table.
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| IndexTest | 0 | PRIMARY | 1 | ID | A | 10261 | NULL | NULL | | BTREE | | |
| IndexTest | 1 | t_dx | 1 | CREATED_TIME | A | 410 | NULL | NULL | YES | BTREE | | |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Queries Using Indexes:
explain select * from IndexTest where ID < 5;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | IndexTest | range | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
explain select CREATED_TIME from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | IndexTest | range | t_dx | t_dx | 9 | NULL | 5248 | Using where; Using index |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
Queries Not using Indexes
explain select count(distinct(PID)) from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
explain select PID from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
Short answer: No.
Whether indexes are used depends on the expresion in your WHERE clause, JOINs etc, but not on the columns you select.
But no rule without an exception (or actually a long list of those):
Long answer: Usually not
There are a number of factors used by the MySQL Optimizer in order to determine whether it should use an index.
The optimizer may decide to ignore an index if...
another (otherwise non-optimal) saves it from accessing the table data at all
it fails to understand that an expression is a constant
its estimates suggest it will return the full table anyway
if its use will cause the creation of a temporary file
... and tons of other reasons, some of which seem not to be documented anywhere
Sometimes the choices made by said optimizer are... erm... lets call them sub-optimal. Now what do you do in those cases?
You can help the optimizer by doing an OPTIMIZE TABLE and/or ANALYZE TABLE. That is easy to do, and sometimes helps.
You can make it use a certain index with the USE INDEX(indexname) or FORCE INDEX(indexname) syntax
You can make it ignore a certain index with the IGNORE INDEX(indexname) syntax
More details on Index Hints, Optimize Table and Analyze Table on the MySQL documentation website.
Actually, it makes no difference wether you select the column or not. Indexes are used for lookups, meaning for reducing really fast the number of records you need retrieved. That makes it usually useful in situations where: you have joins, you have where conditions. Also indexes help alot in ordering.
Updating and deleting can be sped up quite alot using indexes on the where conditions as well.
As an example:
table: id int pk ai, col1 ... indexed, col2 ...
select * from table -> does not use a index
select id from table where col1 = something -> uses the col1 index although it is not selected.
Looking at the second query, mysql does a lookup in the index, locates the records, then in this case stops and delivers (both id and col1 have index and id happens to be pk, so no need for a secondary lookup).
Situation changes a little in this case:
select col2 from table where col1 = something
This will make internally 2 lookups: 1 for the condition, and 1 on the pk for delivering the col2 data. Please notice that again, you don't need to select the col1 column to use the index.
Getting back to your query, the problem lies with: UNIX_TIMESTAMP(CURRENT_DATE())*1000;
If you remove that, your index will be used for lookups.
Is mysql index is dependant the selected columns?.
Yes, absolutely.
For example:
MySQL cannot use the index to perform lookups if the columns do not form a leftmost
prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index.
The third and fourth queries do involve indexed columns, but (col2) and (col2, col3)
are not leftmost prefixes of (col1, col2, col3).
Have a read through the extensive documentation.
for mysql query , the answer is yes, but not all
the query:
explain select * from IndexTest where ID < 5;
use the table cluster index if you use innodb, its table's primary key, so it use primary for query
the second query:
select CREATED_TIME from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
this one is just fetch the index column that mysql does not need to fetch data from table but just index, so your explain result got "Using Index"
the query:
select count(distinct(PID)) from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
it look like this
select PID from IndexTest where
CREATE_TIME>UNIX_TIMESTAMP(CURRENT_DATE())*1000 group by PID
mysql can use index to fetch data from database also, but mysql thinks this query it no need to use index to fetch data, because of the where condition filter, mysql thinks that use index fetch data is more expensive than scan all table, you can use force index also
the same reason for your last query
hopp this answer can help you
indexing helps speed the search for that particular column and associated data rather than the table data. So you have to include the indexed column to speed up select.

simple SQL statement takes longer time to execute [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Disadvantages of quoting integers in a Mysql query?
I have a very simple table Called Device on MYSql database.
+-----------------------------------+--------------+------+-----+----------------+
| Field | Type | Null | Key | Extra |
+-----------------------------------+--------------+------+-----+----------------+
| DTYPE | varchar(31) | NO | | |
| id | bigint(20) | NO | PRI | auto_increment |
| dateCreated | datetime | NO | | |
| dateModified | datetime | NO | | |
| phoneNumber | varchar(255) | YES | MUL | |
| version | bigint(20) | NO | | |
| oldPhoneNumber | varchar(255) | YES | | |
+-----------------------------------+--------------+------+-----+----------------+
This table has more than 100K records. I am running a very simple query
select * from AttDevice where phoneNumber = 5107357058;
This query takes almost 4-6 second, But when I change this query a little bit as shown below.
select * from AttDevice where phoneNumber = '5107357058';
It takes almost no time to get executed.
Notice that phoneNumber column is varchar. I don't understand why the former case takes longer time and later doesn't. The difference between these two queries is the single quote.
Does MYSQL treats these to query differently if so then why?
EDIT 1
I used EXPLAIN and got the following output but don't know how to interpret these two results.
mysql> EXPLAIN select * from AttDevice where phoneNumber = 5107357058;
+----+-------------+-----------+------+---------------------------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------------------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | Device | ALL | phoneNumber,idx_Device_phoneNumber | NULL | NULL | NULL | 6482116 | Using where |
+----+-------------+-----------+------+---------------------------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN select * from AttDevice where phoneNumber = '5107357058';
+----+-------------+-----------+------+---------------------------------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------------------------------+-------------+---------+-------+------+-------------+
| 1 | SIMPLE | Device | ref | phoneNumber,idx_Device_phoneNumber | phoneNumber | 258 | const | 2 | Using where |
+----+-------------+-----------+------+---------------------------------------+-------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
Can someone explain me about the key, key_len and rows present in EXPLAIN query output?
1) Thank you for the "EXPLAIN". We all (including you, I'm sure) knew that the problem was that mysql had to convert the integer to a string, and had to do it for each row. But your "EXPLAIN" proved it.
2) Here's a nice, short article about EXPLAIN:
http://www.lornajane.net/posts/2011/explaining-mysqls-explain
The *possible_keys* shows which indexes apply to this query and the key
tells us which of those was actually used -... Finally the rows entry tell
us how many rows MySQL had to look at to find the result set.
Search value: key: type: ref: rows:
------------- --- ---- ---- ----
5107357058 NULL ALL NULL 6482116
'5107357058' phoneNumber ref const 2
3) The "ref" column is the "The columns compared to the index". In the second case, the string literal ("constant") '5107357058' was compared to the key "phoneNumber". In the first case, there was no usable key (because your search condition was a completely different type); hence "ref" was NULL.
4) The "type" column is "The join type". "Ref" means "All rows with matching index values are read from this table" (in this case, 2 rows). "ALL" mans "full table scan". Which in this case means 6 million rows.
5) Here's the mysql documentation for "EXPLAIN":
http://dev.mysql.com/doc/refman/5.5/en/explain-output.html
You fooled MySQL into making a bad choice by NOT quoting the phone number. Consider:
The column definition is varchar
In the first (unquoted) case you provided the value as an integer (long). I would have thought MySQL could figure this one out, but obviously it didn't, and did a full table scan.
In the second (quoted) case, you gave the search key in the correct datatype (character) and MySQL chose the index over the full-table-scan.
The varchar index cannot be used when you use a number as the operand, excerpt from the fine documentation on implicit type conversions:
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.
I believe that MySQL has to convert the number into a varchar in the first example. In the second example it does not. I'm guessing that's where the difference is coming from.
The first example looks through the table one by one, the other one uses the index.
http://dev.mysql.com/doc/refman/5.0/en/show-columns.html
If Key is MUL, multiple occurrences of a given value are permitted within the column. The column is the first column of a nonunique index or a unique-valued index that can contain NULL values.
So instead of scanning all the null values, the second query look exclusively for for non-null values which speeds things up.
....I think.