Why does my ORDER BY BIGINT(20) take so long? - mysql

I'm trying to improve my query so that it doesn't take so long. Is there anything I can try?
I'm using InnoDB.
My table:
mysql> describe hunted_place_review_external_urls;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| worker_id | varchar(255) | YES | MUL | NULL | |
| queued_at | bigint(20) | YES | MUL | NULL | |
| external_url | varchar(255) | NO | | NULL | |
| place_id | varchar(63) | NO | MUL | NULL | |
| source_id | varchar(63) | NO | | NULL | |
| successful | tinyint(1) | NO | | 0 | |
+--------------+--------------+------+-----+---------+----------------+
My query:
mysql> select * from hunted_place_review_external_urls where worker_id is null order by queued_at asc limit 1;
1 row in set (4.00 sec)
mysql> select count(*) from hunted_place_review_external_urls where worker_id is null;
+----------+
| count(*) |
+----------+
| 19121 |
+----------+
1 row in set (0.00 sec)
Why is it taking 4s even though I have an index on queued_at and worker_id?
Here's the EXPLAIN of this query:
mysql> explain select * from hunted_place_review_external_urls where worker_id is null order by queued_at asc limit 1;
+----+-------------+-----------------------------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------------------+-------+---------------+-----------+---------+------+------+-------------+
| 1 | SIMPLE | hunted_place_review_external_urls | index | worker_id | queued_at | 9 | NULL | 67 | Using where |
+----+-------------+-----------------------------------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)
It becomes much faster when I remove the order by queued_at part:
mysql> select * from hunted_place_review_external_urls where worker_id is null limit 1;
1 row in set (0.00 sec)
It also becomes much faster when the count(*) is smaller:
mysql> select count(*) from hunted_place_review_external_urls where worker_id is null;
+----------+
| count(*) |
+----------+
| 10 |
+----------+
1 row in set (0.00 sec)
mysql> select * from hunted_place_review_external_urls where worker_id is null order by queued_at asc limit 1;
1 row in set (0.00 sec)
My queued_at values are timestamps expressed in number of milliseconds, such as 1398210069531

MySQL is using the queued_at index to avoid a "Using filesort" operation. It appears that MySQL is looking at every single row in the table, and that's taking four seconds.
MySQL is using the index to get the row with the lowest value of queued_at first, then visiting the underlying data page to check whether worker_id is NULL or not. MySQL works through the index, from the lowest value of queued_at up through the highest value.
For every matching row found, MySQL adds that row to the resultset.
Note that the LIMIT clause doesn't get applied until after all the matching rows are found and the result set is prepared. (There's no "early out" when the first matching row is found, MySQL still chugs through every one of the rows to find every last one of them. But at least, MySQL is avoiding what could be an expensive Using filesort operation to get the rows ordered.)
Your other queries exhibit better performance because they have different access plans, which likely use indexes to limit the number of rows that need to be checked.
To improve performance of this particular query, you could try adding an index:
... ON hunted_place_review_external_urls (worker_id, queued_at);
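Spelled out as a complete statement, that might look like this (the index name here is just illustrative):
ALTER TABLE hunted_place_review_external_urls
    ADD INDEX ix_worker_id_queued_at (worker_id, queued_at);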
If that's not an option, you could attempt to influence the optimizer to use a different index, with an index hint:
select *
from hunted_place_review_external_urls USE INDEX (worker_id)
where worker_id is null
order by queued_at asc
limit 1;
Note that the USE INDEX hint references the name of the index, not the name of the column. From the EXPLAIN output, it appears there is an index named "worker_id". I'm going to guess that this index is on the column named "worker_id", but that's just a guess.
As an aside, this doesn't have anything to do with the queued_at column being defined as a BIGINT vs an INT or SMALLINT or VARCHAR.

From the docs:
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it still uses indexes to find the rows that match the WHERE
clause. These cases include the following:
...snip...
The key used to fetch the rows is not the same as the one used in the
ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
And:
With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use
indexes to resolve the query. It cannot if you see Using filesort in
the Extra column.
Your query plan confirms that your slow query is using the queued_at key. If you remove the ORDER BY, the query plan should use the worker_id key instead. One possible reason for the difference in speed is the difference in which key is being used.
As Peter Zaitsev says in MySQL Performance Blog: ORDER BY ... LIMIT Performance Optimization:
It is very important to have ORDER BY with LIMIT executed without scanning and sorting full result set, so it is important for it to use index...
For example if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use index on (date_created) to get result set very fast.
Now what if I have something like SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;
In this case index by date_created may also work but it might not be the most efficient – If it is rare category large portion of table may be scanned to find 10 rows. So index on (category_id, date_created) will be better idea.
You could try, per this suggestion, creating a composite index (worker_id, queued_at) for use with this specific query. If for some reason you can't add another index, you could also try forcing your ordered query to use the worker_id index, to narrow the result set before sorting.
It would be great if you could rewrite this query so that you could find the single row you want without the ORDER BY, since MySQL will order the result before applying LIMIT 1. But not knowing more about your broad goals here, I can't say whether that would be possible. What about splitting the task into the following two queries?
select MIN(queued_at) into @min_queued_at from hunted_place_review_external_urls where worker_id is null;
select * from hunted_place_review_external_urls where worker_id is null and queued_at = @min_queued_at;
Or as a subquery, if you don't have issues with duplicate values?
select * from hunted_place_review_external_urls where queued_at in (select MIN(queued_at) from hunted_place_review_external_urls where worker_id is null);

Related

Why different primary key queries have huge speed difference in innodb?

I have a simple table Test:
id, primary key;
id2, index;
and 50+ other columns of various types;
And I know that if I select id from Test, it'll use the secondary index id2 rather than the primary (clustered) index, as stated in this post.
If I force the queries to use the primary index, why do the query times differ so much when selecting different columns?
Query 1
select id, url from Test order by id limit 1000000, 1 takes only 500ms+, and here is the EXPLAIN:
MySQL [x]> explain select id, url from Test order by id limit 1000000, 1;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| 1 | SIMPLE | Test | NULL | index | NULL | PRIMARY | 8 | NULL | 1000001 | 100.00 | NULL |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
Query 2
select * from Test order by id limit 1000000, 1 takes 2000ms+, and here is the EXPLAIN:
MySQL [x]> explain select * from Test order by ID limit 1000000, 1;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| 1 | SIMPLE | Test | NULL | index | NULL | PRIMARY | 8 | NULL | 1000001 | 100.00 | NULL |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
I don't see any difference between the two EXPLAINs. So why is there such a huge difference in query time, given that they use the same clustered index?
For the following query:
select id, url from t order by id limit 1000000, 1
MySQL seems to read 1,000,000 rows ordered by id instead of skipping them.
I would suggest changing the query to this:
select * from t where id = (select id from t order by id limit 1000000, 1)
MySQL seems to do a better job at skipping 1,000,000 rows when limit is placed inside a sub query.
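An equivalent rewrite uses a derived table (a "deferred join"): find the single id first, then fetch the wide row for it. This is just a sketch of the same idea:
select t.*
from t
join (
    select id
    from t
    order by id
    limit 1000000, 1
) tmp using (id);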
OK, I finally found the reason... It's because of how MySQL implements LIMIT. (Sorry, I could only find this explanation in Chinese, no English version.)
In Query 1 and Query 2 above, here is what LIMIT does:
MySQL reads the clustered index and gets the first row;
MySQL converts that row into a result row;
before sending it to the client, MySQL notices the LIMIT offset of 1000000, so the first row is not the right answer...
MySQL then moves on to the 2nd row and converts it into a result row;
before sending it to the client, MySQL again notices the LIMIT offset of 1000000, so the second row is not the right answer either...;
and so on, until it reaches the 1000001st row; after converting it, that row finally satisfies the LIMIT 1000000, 1 clause;
so that row is the right answer, and it is sent to the client.
However, by that point it has converted 1,000,000 rows. So in the question above, the difference is the cost of converting all fields (select *) for 1,000,000 rows versus converting one or two fields (select id, url) for 1,000,000 rows. No wonder the former is far slower than the latter.
I don't know why MySQL's LIMIT behaves so clumsily, but it just does...
1. Check the SQL profile to get more information:
mysql> show profile;
2. MySQL's EXPLAIN is not very powerful yet.
3. What kind of scenario really needs such a large LIMIT offset?

MySQL query is slow only when using ORDER BY field DESC and LIMIT

Overview
I'm running MySQL 5.7.30-33, and I'm hitting an issue that seems like MySQL is using the wrong index when running a query. I'm getting a 3 second query time using my existing query. However, just by changing the ORDER BY, removing the LIMIT, or forcing a USE INDEX I can get a 0.01 second query time. Unfortunately I need to stick with my original query (it's baked into an application), so it'd be great if this disparity could be resolved in the schema/indexing.
Setup / problem
My table structure is as follows:
CREATE TABLE `referrals` (
`__id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`systemcreated` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`referrerid` mediumtext COLLATE utf8mb4_unicode_ci,
`referrersiteid` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
... lots more mediumtext fields ...
PRIMARY KEY (`__id`),
KEY `systemcreated` (`systemcreated`,`referrersiteid`,`__id`)
) ENGINE=InnoDB AUTO_INCREMENT=53368 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
The table only has ~55k rows, but is very wide, as some of the fields contain huge BLOBs:
mysql> show table status like 'referrals'\G;
*************************** 1. row ***************************
Name: referrals
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 45641
Avg_row_length: 767640
Data_length: 35035897856
Max_data_length: 0
Index_length: 3653632
Data_free: 3670016
Auto_increment: 54008
Create_time: 2020-12-12 12:46:14
Update_time: 2020-12-12 17:50:28
Check_time: NULL
Collation: utf8mb4_unicode_ci
Checksum: NULL
Create_options: row_format=COMPRESSED
Comment:
1 row in set (0.00 sec)
My customer's application queries the table using this, and unfortunately that can't easily be changed:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
This results in a query time around 3 seconds.
The EXPLAIN looks like this:
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Note that it is using the PRIMARY key for the query rather than the systemcreated index.
Experimentation 1
If I change the query to use ASC rather than DESC:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id asc
limit 16;
then it takes 0.01 seconds, and the EXPLAIN looks to be the same:
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Experimentation 2
If I change the query to stick with ORDER BY __id DESC, but remove the LIMIT:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc;
then it also takes 0.01 seconds, with an EXPLAIN like this:
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | referrals | NULL | range | systemcreated | systemcreated | 406 | NULL | 2086 | 11.11 | Using index condition; Using filesort |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
Experimentation 3
Alternatively, if I force the original query to use the systemcreated index then it also gives a 0.01 sec query time. Here's the EXPLAIN:
mysql> explain SELECT *
FROM referrals USE INDEX (systemcreated)
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | referrals | NULL | range | systemcreated | systemcreated | 406 | NULL | 2086 | 11.11 | Using index condition; Using filesort |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
Experimentation 4
Lastly, if I use the original ORDER BY __id DESC LIMIT 16 but select fewer fields, then it also returns in 0.01 seconds! Here's the explain:
mysql> explain SELECT field1, field2, field3, field4, field5
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Summary
So the only combination that seems to be performing poorly is ORDER BY __id DESC LIMIT 16.
I think I have the indexes setup correctly. I'm querying via the systemcreated and referrersiteid fields, and ordering by __id, so I have an index defined as (systemcreated, referrersiteid, __id), but MySQL still seems to be using the PRIMARY key.
Any suggestions?
"Avg_row_length: 767640"; lots of MEDIUMTEXT. A row is limited to about 8KB; overflow goes into "off-record" blocks. Reading those blocks takes extra disk hits.
SELECT * will reach for all those fat columns. The total will be about 50 reads (of 16KB each). This takes time.
(Exp 4) SELECT a,b,c,d ran faster because it did not need to fetch all ~50 blocks per row.
In your secondary index (systemcreated, referrersiteid, __id), only the first column is useful. This is because systemcreated LIKE 'xxx%' is a "range" predicate; once a range is hit, the rest of the index is ineffective. Except...
"Index hints" (USE INDEX(...)) may help today but may make things worse tomorrow when the data distribution changes.
If you can't get rid of the wild cards in LIKE, I recommend these two indexes:
INDEX(systemcreated)
INDEX(referrersiteid)
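In DDL form (the index names are just illustrative):
ALTER TABLE referrals
    ADD INDEX ix_systemcreated (systemcreated),
    ADD INDEX ix_referrersiteid (referrersiteid);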
The real speedup can occur by turning the query inside out. That is, find the 16 ids first, then go looking for all those bulky columns:
SELECT r2.*   -- or whatever columns you want
FROM
(
SELECT __id
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16
) AS r1
JOIN referrals r2 USING(__id)
ORDER BY __id DESC -- yes, this needs repeating
And keep the 3-column secondary index that you have. Even if the subquery must scan a lot more than 16 rows to find the 16 desired, the index it scans is a lot less bulky. This means that the subquery ("derived table") will be moderately fast. Then the outer query will still have 16 lookups -- possibly 16*50 blocks to read -- but the total number of blocks read will still be a lot less.
There is rarely a noticeable difference between ASC and DESC on ORDER BY.
Why does the Optimizer pick the PK instead of the seemingly better secondary index? The PK might be best, especially if the 16 rows are at the 'end' (DESC) of the table. But that would be a terrible choice if it had to scan the entire table without finding 16 rows.
Meanwhile, the wildcard test makes the secondary index only partially useful. The Optimizer makes a decision based on inadequate statistics. Sometimes it feels like the flip of a coin.
If you use my inside-out reformulation, then I recommend the following two composite indexes -- the Optimizer can make a semi-intelligent, semi-correct choice between them for the derived table:
INDEX(systemcreated, referrersiteid, __id),
INDEX(referrersiteid, systemcreated, __id)
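As a concrete sketch of the DDL: the first of these already exists in your table as KEY `systemcreated`, so only the second needs to be added (its name is just illustrative):
ALTER TABLE referrals
    ADD INDEX ix_referrersiteid_systemcreated (referrersiteid, systemcreated, __id);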
It will continue to say "filesort", but don't worry; it's only sorting 16 rows.
And, remember, SELECT * is hurting performance. (Though maybe you can't fix that.)

Get random posts without scanning the whole database [duplicate]

This question already has answers here:
Fetching RAND() rows without ORDER BY RAND() in just one query
How can I get random posts without scanning the whole database?
As far as I know, if you use MySQL's ORDER BY RAND(), it will scan the whole database.
Is there any other way to do this without scanning the whole database?
A tiny modification of @squeamish ossifrage's solution using primary key values - assuming that the table has a primary key with numeric values:
SELECT *
FROM delete_me
WHERE id >= Round( Rand() *
( SELECT Max( id ) FROM delete_me ))
LIMIT 1
For a table containing more than 50,000 rows, the query runs in about 100 milliseconds:
mysql> SELECT id, table_schema, table_name
FROM delete_me
WHERE id >= Round( Rand() *
( SELECT Max( id ) FROM delete_me ))
LIMIT 1;
+-----+--------------------+------------+
| id | table_schema | table_name |
+-----+--------------------+------------+
| 173 | information_schema | PLUGINS |
+-----+--------------------+------------+
1 row in set (0.01 sec)
A lot of people seem to be convinced that ORDER BY RAND() is somehow able to produce results without scanning the whole table.
Well it isn't. In fact, it's liable to be slower than ordering by column values, because MySQL has to call the RAND() function for each row.
To demonstrate, I made a simple table of half a million MD5 hashes:
mysql> select count(*) from delete_me;
+----------+
| count(*) |
+----------+
| 500000 |
+----------+
1 row in set (0.00 sec)
mysql> explain delete_me;
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| txt | text | NO | | NULL | |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.12 sec)
mysql> select * from delete_me limit 4;
+----+----------------------------------+
| id | txt |
+----+----------------------------------+
| 1 | 9b912c03d87991b71955a6cd4f81a299 |
| 2 | f1b7ddeb1c1a14265a620b8f2366a22e |
| 3 | 067b39538b767e2382e557386cba37d9 |
| 4 | 1a27619c1d2bb8fa583813fdd948e94c |
+----+----------------------------------+
4 rows in set (0.00 sec)
Using ORDER BY RAND() to choose a random row from this table takes my computer 1.95 seconds.
mysql> select * from delete_me order by rand() limit 1;
+--------+----------------------------------+
| id | txt |
+--------+----------------------------------+
| 446149 | b5f82dd78a171abe6f7bcd024bf662e8 |
+--------+----------------------------------+
1 row in set (1.95 sec)
But ordering by the text field in ascending order takes just 0.8 seconds.
mysql> select * from delete_me order by txt asc limit 1;
+-------+----------------------------------+
| id | txt |
+-------+----------------------------------+
| 88583 | 00001e65c830f5b662ae710f11ae369f |
+-------+----------------------------------+
1 row in set (0.80 sec)
Since the id values in this table are numbered sequentially starting from 1, I can choose a random row much more quickly like this:
mysql> select * from delete_me where id=floor(1+rand()*500000) limit 1;
+-------+----------------------------------+
| id | txt |
+-------+----------------------------------+
| 37600 | 3b8aaaf88af68ca0c6eccff7e61e897a |
+-------+----------------------------------+
1 row in set (0.02 sec)
But in the general case, I would suggest using the method proposed by Mike in the page linked to by @deceze.
My suggestion for this kind of requirement is to use an MD5 hash.
Add a field to your DB table, CHAR(32), and create an index for it.
Populate it for every record with an MD5 hash of anything (maybe the value from the ID column, or just any old random number; it doesn't matter too much as long as each record's value is different).
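A sketch of that setup, reusing the myTable / md5Col names from the query below and hashing the id column:
ALTER TABLE myTable
    ADD COLUMN md5Col CHAR(32) NOT NULL DEFAULT '',
    ADD INDEX idx_md5col (md5Col);
UPDATE myTable SET md5Col = MD5(id);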
Now you can query the table like so:
SELECT * FROM myTable WHERE md5Col > MD5(NOW()) LIMIT 1
This will give you a single random record without having to scan the whole table. The table has a random sort order thanks to the MD5 values. MD5 is great for this because it's quick and randomly distributed.
Caveats:
If the MD5 from your SELECT query is larger than the largest hash stored in your table, you might get no records from the query. If that happens, you can always re-query with a new hash.
Having a fixed MD5 hash on each record means that the records are in a fixed order. This isn't really an issue if you're only ever fetching a single record at a time, but if you're using it to fetch groups of records, it may be noticeable. You can of course correct this if you want by rehashing records as you load them.

slow count(*) on innoDB

I have a table message_message with 3,000,000 records.
When I make a COUNT(*) query, it's very slow:
mysql> select count(*) from message_message;
+----------+
| count(*) |
+----------+
| 2819416 |
+----------+
1 row in set (2 min 35.35 sec)
explain it:
mysql> explain select count(*) from message_message;
+----+-------------+-----------------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-----------------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | message_message | index | NULL          | PRIMARY | 4       | NULL | 2939870 | Using index |
+----+-------------+-----------------+-------+---------------+---------+---------+------+---------+-------------+
1 row in set (0.02 sec)
What's happening?
Have a look at this post: in InnoDB you need to do a full table scan, whereas in MyISAM it's an index read.
If you use a WHERE clause, though, the execution pattern changes to use indexes, so in general InnoDB will be slower than MyISAM on full unrestricted counts, whereas the performance matches up on restricted counts.
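For example, a restricted count like the one below can typically be resolved from an index range scan rather than a full table scan (the created_at column and an index on it are hypothetical here, just to illustrate the point):
select count(*) from message_message where created_at > '2014-01-01';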
If you want to count the number of records, it's better to query the whole table and use the num_rows property of the result set. COUNT(...) is usually used for aggregate queries (in combination with GROUP BY).

Why does MySQL not use an index when executing this query?

mysql> desc users;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(128) | NO | UNI | | |
| password | varchar(32) | NO | | | |
| screen_name | varchar(64) | YES | UNI | NULL | |
| reputation | int(10) unsigned | NO | | 0 | |
| imtype | varchar(1) | YES | MUL | 0 | |
| last_check | datetime | YES | MUL | NULL | |
| robotno | int(10) unsigned | YES | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
mysql> create index i_users_imtype_robotno on users(imtype,robotno);
Query OK, 24 rows affected (0.25 sec)
Records: 24 Duplicates: 0 Warnings: 0
mysql> explain select * from users where imtype!='0' and robotno is null;
+----+-------------+-------+------+------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | users | ALL | i_users_imtype_robotno | NULL | NULL | NULL | 24 | Using where |
+----+-------------+-------+------+------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
But this way, it's used:
mysql> explain select * from users where imtype in ('1','2') and robotno is null;
+----+-------------+-------+-------+------------------------+------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+------------------------+---------+------+------+-------------+
| 1 | SIMPLE | users | range | i_users_imtype_robotno | i_users_imtype_robotno | 11 | NULL | 3 | Using where |
+----+-------------+-------+-------+------------------------+------------------------+---------+------+------+-------------+
1 row in set (0.01 sec)
Besides, this one also did not use an index:
mysql> explain select id,email,imtype from users where robotno=1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 24 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
SELECT *
FROM users
WHERE imtype != '0' and robotno is null
This condition is not satisfied by a single contiguous range of (imtype, robotno).
If you have records like this:
imtype robotno
$ NULL
$ 1
0 NULL
0 1
1 NULL
1 1
2 NULL
2 1
, ordered by (imtype, robotno), then the records 1, 5 and 7 would be returned, while other records wouldn't.
You'll need create this index to satisfy the condition:
CREATE INDEX ix_users_ri ON users (robotno, imtype)
and rewrite your query a little:
SELECT *
FROM users
WHERE (
robotno IS NULL
AND imtype < '0'
)
OR
(
robotno IS NULL
AND imtype > '0'
)
, which will result in two contiguous blocks:
robotno imtype
--- first block start
NULL $
--- first block end
NULL 0
--- second block start
NULL 1
NULL 2
--- second block end
1 $
1 0
1 1
1 2
This index will also serve this query:
SELECT id, email, imtype
FROM users
WHERE robotno = 1
, which is not served now by any index for the same reason.
Actually, the index for this query:
SELECT *
FROM users
WHERE imtype in ('1', '2')
AND robotno is null
is used only for coarse filtering on imtype (note Using where in the Extra column); it does not do a range scan on robotno.
You need an index that has robotno as the first column. Your existing index is (imtype, robotno). For the query that filters only on robotno, imtype is not in the WHERE clause, so MySQL can't use that index.
An index on (robotno,imtype) could be used for queries with just robotno in the where clause, and also for queries with both imtype and robotno in the where clause (but not imtype by itself).
Check out the docs on how MySQL uses indexes, and look for the parts that talk about multi-column indexes and "leftmost prefix".
BTW, if you think you know better than the optimizer, which is often the case, you can force MySQL to use a specific index by appending
FORCE INDEX (index_name) after FROM users.
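For example, reusing the index name from the question, the hint goes right after the table name (whether it actually helps here is a separate question):
select * from users FORCE INDEX (i_users_imtype_robotno)
where imtype != '0' and robotno is null;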
It's because 'robotno' is potentially a primary key, and it uses that instead of the index.
A database system's query planner determines whether to do an index scan or not by analyzing the selectivity of the query's WHERE clause relative to the index. (Indexes are also used to join tables together, but you only have users here.)
The first query has where imtype != '0'. This would select nearly all of the rows in users, assuming you have a large number of distinct values of imtype. The inequality operator is inherently unselective. So the MySQL query planner is betting here that reading through the index won't help and that it may as well just do a sequential scan through the whole table, since it probably would have to do that anyway.
On the other hand, had you said where imtype ='0', equality is a highly selective operator, and MySQL would bet that by reading just a few index blocks it could avoid reading nearly all of the blocks of the users table itself. So it would pick the index.
In your second example, where imtype in ('1','2'), MySQL knows that the index will be highly selective (though only half as selective as where imtype = '0'), and it will again bet that using the index will lead to a big payoff, as you discovered.
In your third example, where robotno=1, MySQL probably can't effectively use the index on users(imtype,robotno) since it would need to read in all the index blocks to find the robotno=1 record numbers: the index is sorted by imtype first, then robotno. If you had another index on users(robotno), MySQL would eagerly use it though.
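Such an index could be added like this (the index name is just illustrative):
create index i_users_robotno on users (robotno);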
As a footnote, if you had two indexes, one on users(imtype), and the other on users(imtype,robotno), and your query was on where imtype = '0', either index would make your query fast, but MySQL would probably select users(imtype) simply because it's more compact and fewer blocks would need to be read from it.
I'm being very simplistic here. Early database systems would just look at imtype's datatype and make a very rough guess at the selectivity of your query, but people very quickly realized that giving the query planner interesting facts like the total size of the table, the number of distinct values in each column, etc. would enable it to make much smarter decisions. For instance, if you had a users table where imtype was only ever '0' or '1', the query planner might choose the index, since in that case where imtype != '0' is more selective.
Take a look at MySQL's ANALYZE TABLE statement and you'll see that its query planner must be sophisticated. For that reason I'd hesitate a great deal before using FORCE INDEX to dictate a query plan to it. Instead, use ANALYZE TABLE to give the query planner up-to-date information to base its decisions on.
Your index is over users(imtype,robotno). In order to use this index, either imtype or imtype and robotno must be used to qualify the rows. You are just using robotno in your query, thus it can't use this index.