I'm having trouble optimizing what I think is a reasonable / straightforward query in MySQL. After spending a few late nights on this, I thought I'd post my question here as I'm sure the solution will be obvious to somebody.
Here are the columns in a simplified version of my table T:
id: varchar(32) not null (primary key)
timestamp: bigint(20) unsigned not null
family: char(32) not null
size: bigint(20) unsigned not null
And here's the query that I need to optimize:
select
id
from
T
where
family = 'some constant'
and
timestamp between T1 and T2
order by
size desc
limit
5
My table is fairly large (~630M rows) so I'm hoping that an index can do most of the work for me... but I'm having trouble picking the right columns for my index.
It seems that in order for MySQL to use an index to answer a range query (like what I'm doing w/ the timestamp), that column must be the last column in the index. But then there's the "order by", which is on a different column. I'm not sure which one of these columns should be last in my index, so I've tried creating the following indices:
i1, on (family, timestamp, size)
i2, on (family, size, timestamp)
... but neither of these seems to help.
Any idea what I'm doing wrong?
(BTW I'm running MySQL 8 in Amazon RDS, in case that makes a difference.)
Thanks in advance for any helpful suggestions you may have!
EDIT #1 ---------------------------------------
I just created this simplified table that I described above, and copied 10M rows worth of data from the original table to the simplified table, just to keep things clean. Then I ran the following query:
mysql> select
-> id, size
-> from
-> T
-> where
-> family = 'be0bf4a203797729f38c6355b6d80903'
-> and
-> timestamp between 1578460425887 and 1584710866343
-> order by
-> size desc;
... and it took 1.27 seconds. I really need this to be faster, otherwise this sort of query (which I need to run many times per second) will take much too long on the real dataset.
Here are the results of an EXPLAIN on the query above:
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
| 1 | SIMPLE | T | NULL | range | idx_family_timestamp_size | idx_family_timestamp_size | 136 | NULL | 178324 | 100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
I bet it's the filesort that's killing performance. Any ideas?
EDIT #2 ---------------------------------------
Oops, I just realized that I forgot the LIMIT in EDIT #1's query. I've fixed that here, and also grew T to 100M rows -- so now it's 10x the size that it was in my previous edit.
Now my query takes almost 10 sec. to run, and the results of the EXPLAIN are as follows:
mysql> explain
-> select
-> id, size
-> from
-> T
-> where
-> family = 'be0bf4a203797729f38c6355b6d80903'
-> and
-> timestamp between 1578460425887 and 1584710866343
-> order by
-> size desc
-> limit
-> 5;
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
| 1 | SIMPLE | T | NULL | range | idx_family_timestamp_size,idx_family_size_timestamp | idx_family_size_timestamp | 144 | NULL | 410744 | 100.00 | Using where; Using index for skip scan; Using filesort |
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
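To make the trade-off between the two column orders concrete, here is a minimal Python sketch. It is only a toy model of a sorted index, not how MySQL actually executes the query, and the sample rows and function names are made up for illustration. With (family, timestamp, size) the matching range is contiguous but not ordered by size, so every candidate must be examined to find the top N. With (family, size, timestamp) the entries can be walked in descending size order and the walk can stop after N matches, but non-matching timestamps have to be skipped along the way.

```python
import heapq
from bisect import bisect_left, bisect_right

# Hypothetical sample rows: (family, timestamp, size, id)
ROWS = [
    ("a", 10, 500, "r1"),
    ("a", 20, 100, "r2"),
    ("a", 30, 900, "r3"),
    ("a", 40, 300, "r4"),
    ("a", 50, 700, "r5"),
    ("b", 15, 999, "r6"),
]

def top_n_timestamp_first(rows, family, t1, t2, n):
    """Toy index ordered by (family, timestamp, size)."""
    index = sorted((f, ts, sz, i) for f, ts, sz, i in rows)
    lo = bisect_left(index, (family, t1))
    hi = bisect_right(index, (family, t2, float("inf")))
    candidates = index[lo:hi]  # contiguous, but ordered by timestamp, not size
    # Every candidate has to be looked at to pick the n largest sizes.
    best = heapq.nlargest(n, candidates, key=lambda e: e[2])
    return [e[3] for e in best], len(candidates)

def top_n_size_first(rows, family, t1, t2, n):
    """Toy index ordered by (family, size, timestamp)."""
    index = sorted((f, sz, ts, i) for f, ts, sz, i in rows)
    lo = bisect_left(index, (family,))
    hi = bisect_right(index, (family, float("inf")))
    found, scanned = [], 0
    # Walk the family's entries from largest size to smallest.
    for _f, _sz, ts, row_id in reversed(index[lo:hi]):
        scanned += 1
        if t1 <= ts <= t2:  # the timestamp filter is applied entry by entry
            found.append(row_id)
            if len(found) == n:
                break  # early exit: no sort needed
    return found, scanned

print(top_n_timestamp_first(ROWS, "a", 20, 50, 2))  # (['r3', 'r5'], 4)
print(top_n_size_first(ROWS, "a", 20, 50, 2))       # (['r3', 'r5'], 2)
```

In this toy model the second order is cheap only when rows inside the timestamp window are common among the largest sizes; if they are rare, the walk degenerates into scanning most of the family.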
Related
I have a table like this (details elided for readability):
CREATE TABLE UserData (
id bigint NOT NULL AUTO_INCREMENT,
userId bigint NOT NULL DEFAULT '0', ...
c6 int NOT NULL DEFAULT '0', ...
hidden int NOT NULL DEFAULT '0', ...
c22 int NOT NULL DEFAULT '0', ...
PRIMARY KEY (id), ...
KEY userId_hidden_c6_c22_idx (userId,hidden,c6,c22), ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
and was happily doing queries on it like this in MySQL 5.7:
mysql> select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----
| ...
+----
10 rows in set (0.03 sec)
However, in MySQL 8.0 these queries started doing this:
mysql> select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----
| ...
+----
10 rows in set (1.56 sec)
Explain shows the following, 5.7:
mysql> explain select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
| 1 | SIMPLE | UserData | NULL | ref | userId_hidden_c6_c22_idx | userId_hidden_c6_c22_idx | 12 | const,const | 78062 | 100.00 | Using index condition; Using filesort |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
8.0:
mysql> explain select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
| 1 | SIMPLE | UserData | NULL | ref | userId_hidden_c6_c22_idx | userId_hidden_c6_c22_idx | 12 | const,const | 79298 | 100.00 | Using filesort |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
The main difference seems to be that 5.7 is Using index condition; Using filesort and 8.0 is only Using filesort.
Why did 8.0 stop using the index condition, and how can I get it to start using it?
EDIT: Why did performance drop 10-100x with MySQL 8.0? It looks like it's because it stopped using the Index Condition Pushdown Optimization - how can I get it to start using it?
The table has ~150M rows in it, and that user has ~75k records, so I guess it could be a change in the size-based heuristics or whatever goes into the MySQL decision making?
In the EXPLAIN you show, the type column is ref and the key column names the index, which indicates it is using that index to optimize the lookup.
You are making an incorrect interpretation of what "index condition" means in the extra column. Admittedly it does sound like "using the index" versus not using the index if that note is absent.
The note about "index condition" is referring to Index Condition Pushdown, which is not related to using the index, but it's about delegating other conditions to be filtered at the storage engine level. Read about it here: https://dev.mysql.com/doc/refman/8.0/en/index-condition-pushdown-optimization.html
It's unfortunate that the notes reported by EXPLAIN are so difficult to understand. You really have to study a lot of documentation to understand how to read those notes.
The following index would be much faster in either version, because the query could stop after reading 10 rows. That is, the "filesort" would be avoided:
INDEX(userId, hidden, id)
This won't do "Using index" (aka "covering"), but neither did your attempts. That is different from "Using index condition" (aka "ICP", as you point out).
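As a rough illustration of why INDEX(userId, hidden, id) lets the query stop after 10 rows, here is a small Python sketch that treats the index as a sorted list of tuples. The data and the function name `last_n_ids` are hypothetical, and this is a model of the idea rather than of InnoDB itself: once the two equality columns are fixed, the entries are already in id order, so reading backwards from the end of that prefix yields the newest ids without any sort.

```python
from bisect import bisect_right

def last_n_ids(index, user_id, hidden, n):
    """Read the n largest ids for (user_id, hidden) from a sorted toy index."""
    # Jump to just past the last entry that has this (userId, hidden) prefix.
    pos = bisect_right(index, (user_id, hidden, float("inf")))
    out = []
    while pos > 0 and len(out) < n:
        pos -= 1
        u, h, row_id = index[pos]
        if (u, h) != (user_id, hidden):
            break  # walked off the front of the prefix
        out.append(row_id)
    return out

# Hypothetical data: 20 rows spread over two users.
index = sorted(
    (123 if i % 2 == 0 else 456, 0 if i % 4 else 1, i)
    for i in range(1, 21)
)

print(last_n_ids(index, 123, 0, 3))  # [18, 14, 10]
```

The loop touches at most n entries (plus one when it runs off the prefix), no matter how many rows the user has.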
Try these to get more insight:
EXPLAIN FORMAT=JSON SELECT ...
EXPLAIN ANALYZE SELECT ...
(No, I cannot explain the regression.)
Overview
I'm running MySQL 5.7.30-33, and I'm hitting an issue that seems like MySQL is using the wrong index when running a query. I'm getting a 3 second query time using my existing query. However, just by changing the ORDER BY, removing the LIMIT, or forcing a USE INDEX I can get a 0.01 second query time. Unfortunately I need to stick with my original query (it's baked into an application), so it'd be great if this disparity could be resolved in the schema/indexing.
Setup / problem
My table structure is as follows:
CREATE TABLE `referrals` (
`__id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`systemcreated` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`referrerid` mediumtext COLLATE utf8mb4_unicode_ci,
`referrersiteid` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
... lots more mediumtext fields ...
PRIMARY KEY (`__id`),
KEY `systemcreated` (`systemcreated`,`referrersiteid`,`__id`)
) ENGINE=InnoDB AUTO_INCREMENT=53368 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
The table only has ~55k rows, but is very wide, as some of the fields contain huge BLOBs:
mysql> show table status like 'referrals'\G;
*************************** 1. row ***************************
Name: referrals
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 45641
Avg_row_length: 767640
Data_length: 35035897856
Max_data_length: 0
Index_length: 3653632
Data_free: 3670016
Auto_increment: 54008
Create_time: 2020-12-12 12:46:14
Update_time: 2020-12-12 17:50:28
Check_time: NULL
Collation: utf8mb4_unicode_ci
Checksum: NULL
Create_options: row_format=COMPRESSED
Comment:
1 row in set (0.00 sec)
My customer's application queries the table using this, and unfortunately that can't easily be changed:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
This results in a query time around 3 seconds.
The EXPLAIN looks like this:
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Note that it is using the PRIMARY key for the query rather than the systemcreated index.
Experimentation 1
If I change the query to use ASC rather than DESC:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id asc
limit 16;
then it takes 0.01 seconds, and the EXPLAIN looks to be the same:
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Experimentation 2
If I change the query to stick with ORDER BY __id DESC, but remove the LIMIT:
SELECT *
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc;
then it also takes 0.01 seconds, with an EXPLAIN like this:
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | referrals | NULL | range | systemcreated | systemcreated | 406 | NULL | 2086 | 11.11 | Using index condition; Using filesort |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
Experimentation 3
Alternatively, if I force the original query to use the systemcreated index then it also gives a 0.01 sec query time. Here's the EXPLAIN:
mysql> explain SELECT *
FROM referrals USE INDEX (systemcreated)
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | referrals | NULL | range | systemcreated | systemcreated | 406 | NULL | 2086 | 11.11 | Using index condition; Using filesort |
+----+-------------+--------------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------+
Experimentation 4
Lastly, if I use the original ORDER BY __id DESC LIMIT 16 but select fewer fields, then it also returns in 0.01 seconds! Here's the explain:
mysql> explain SELECT field1, field2, field3, field4, field5
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16;
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | referrals | NULL | index | systemcreated | PRIMARY | 4 | NULL | 32 | 5.56 | Using where |
+----+-------------+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Summary
So the only combination that seems to be performing poorly is ORDER BY __id DESC LIMIT 16.
I think I have the indexes set up correctly. I'm querying via the systemcreated and referrersiteid fields, and ordering by __id, so I have an index defined as (systemcreated, referrersiteid, __id), but MySQL still seems to be using the PRIMARY key.
Any suggestions?
"Avg_row_length: 767640"; lots of MEDIUMTEXT. A row is limited to about 8KB; overflow goes into "off-record" blocks. Reading those blocks takes extra disk hits.
SELECT * will reach for all those fat columns. The total will be about 50 reads (of 16KB each) per row. This takes time.
(Exp 4) SELECT a,b,c,d ran faster because it did not need to fetch all ~50 blocks per row.
In your secondary index, (systemcreated,referrersiteid,__id), only the first column is useful. This is because of systemcreated LIKE 'xxx%'. This is a "range". Once a range is hit, the rest of the index is ineffective. Except...
"Index hints" (USE INDEX(...)) may help today but may make things worse tomorrow when the data distribution changes.
If you can't get rid of the wild cards in LIKE, I recommend these two indexes:
INDEX(systemcreated)
INDEX(referrersiteid)
The real speedup can occur by turning the query inside out. That is, find the 16 ids first, then go looking for all those bulky columns:
SELECT r2... -- whatever you want
FROM
(
SELECT __id
FROM referrals
WHERE `systemcreated` LIKE 'XXXXXX%'
AND `referrersiteid` LIKE 'XXXXXXXXXXXX%'
order by __id desc
limit 16
) AS r1
JOIN referrals r2 USING(__id)
ORDER BY __id DESC -- yes, this needs repeating
And keep the 3-column secondary index that you have. Even if it must scan a lot more than 16 rows to find the 16 desired, it is a lot less bulky. This means that the subquery ("derived table") will be moderately fast. Then the outer query will still have 16 lookups -- possibly 16*50 blocks to read. The total number of blocks read will still be a lot less.
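To check that the inside-out form returns the same rows as the direct query, here is a small runnable sketch. It uses Python's built-in sqlite3 purely as a stand-in for MySQL, so it says nothing about MySQL's timings or plans. The `payload` column, the index name, and the sample data are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE referrals (
        __id INTEGER PRIMARY KEY,
        systemcreated TEXT,
        referrersiteid TEXT,
        payload TEXT  -- hypothetical stand-in for the bulky MEDIUMTEXT columns
    )
""")
conn.execute(
    "CREATE INDEX idx_sys_site_id "
    "ON referrals (systemcreated, referrersiteid, __id)"
)
conn.executemany(
    "INSERT INTO referrals VALUES (?, ?, ?, ?)",
    [
        (
            i,
            ("AAA" if i % 2 == 0 else "BBB") + str(i),
            ("SITE1" if i % 3 == 0 else "SITE2") + str(i),
            "x" * 1000,
        )
        for i in range(1, 101)
    ],
)

direct = """
    SELECT __id, payload
    FROM referrals
    WHERE systemcreated LIKE 'AAA%' AND referrersiteid LIKE 'SITE1%'
    ORDER BY __id DESC
    LIMIT 5
"""

# Find the ids first (narrow rows), then fetch the wide columns for them only.
inside_out = """
    SELECT __id, r2.payload
    FROM (
        SELECT __id
        FROM referrals
        WHERE systemcreated LIKE 'AAA%' AND referrersiteid LIKE 'SITE1%'
        ORDER BY __id DESC
        LIMIT 5
    ) AS r1
    JOIN referrals r2 USING (__id)
    ORDER BY __id DESC
"""

direct_rows = conn.execute(direct).fetchall()
inside_out_rows = conn.execute(inside_out).fetchall()
print([row[0] for row in inside_out_rows])  # [96, 90, 84, 78, 72]
print(direct_rows == inside_out_rows)       # True
```

The point of the rewrite is that the subquery only ever needs the narrow index columns, and the wide rows are fetched for just the final handful of ids.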
There is rarely a noticeable difference between ASC and DESC on ORDER BY.
Why does the Optimizer pick the PK instead of the seemingly better secondary index? The PK might be best, especially if the 16 rows are at the 'end' (DESC) of the table. But that would be a terrible choice if it had to scan the entire table without finding 16 rows.
Meanwhile, the wildcard test makes the secondary index only partially useful. The Optimizer makes a decision based on inadequate statistics. Sometimes it feels like the flip of a coin.
If you use my inside-out reformulation, then I recommend the following two composite indexes. The Optimizer can make a semi-intelligent, semi-correct choice between them for the derived table:
INDEX(systemcreated, referrersiteid, __id),
INDEX(referrersiteid, systemcreated, __id)
It will continue to say "filesort", but don't worry; it's only sorting 16 rows.
And, remember, SELECT * is hurting performance. (Though maybe you can't fix that.)
I was busying myself with exploring GROUP BY optimizations on a classic "max salary per department" query, and suddenly I got weird results. The dump below comes straight from my console. NO COMMANDS were issued between these two EXPLAINs. Only some time had passed.
mysql> explain select name, t1.dep_id, salary
from emploee t1
JOIN ( select dep_id, max(salary) msal
from emploee
group by dep_id
) t2
ON t1.salary=t2.msal and t1.dep_id = t2.dep_id
order by salary desc;
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | t1 | ref | dep_id | dep_id | 8 | t2.dep_id,t2.msal | 1 | |
| 2 | DERIVED | emploee | index | NULL | dep_id | 8 | NULL | 84 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
3 rows in set (0.00 sec)
mysql> explain select name, t1.dep_id, salary
from emploee t1
JOIN ( select dep_id, max(salary) msal
from emploee
group by dep_id
) t2
ON t1.salary=t2.msal and t1.dep_id = t2.dep_id
order by salary desc;
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 1 | PRIMARY | t1 | ref | dep_id | dep_id | 8 | t2.dep_id,t2.msal | 3 | |
| 2 | DERIVED | emploee | range | NULL | dep_id | 4 | NULL | 9 | Using index for group-by |
+----+-------------+------------+-------+---------------+--------+---------+-------------------+------+---------------------------------+
3 rows in set (0.00 sec)
As you may notice, it examined ten times fewer rows in the second run. I assume it's because some inner counters got changed. But I don't want to depend on these counters. So, is there a way to hint MySQL to always use the "Using index for group-by" behavior?
Or, if my speculations are wrong, is there any other explanation for the behavior, and how do I fix it?
CREATE TABLE `emploee` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`dep_id` int(11) NOT NULL,
`salary` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `dep_id` (`dep_id`,`salary`)
) ENGINE=InnoDB AUTO_INCREMENT=85 DEFAULT CHARSET=latin1
+-----------+
| version() |
+-----------+
| 5.5.19 |
+-----------+
Hm, showing the cardinality of indexes may help, but keep in mind: ranges are usually slower than indexes there.
Because it thinks it can match the full index in the first one, it uses the full one. In the second one, it drops the index and goes for a range, but guesses the total number of rows satisfying that larger range to be wildly lower than for the smaller full index, because all cardinality has changed. Compare it to this: why would "AA" match 84 rows, but "A[any character]" match only 9 (note that it uses 8 bytes of the key in the first, 4 bytes in the second)? The second one will in reality not read fewer rows; EXPLAIN just guesses the number of rows differently after an update of its index metadata. Note also that EXPLAIN does not tell you what a query will do, but what it probably will do.
Updating the cardinality can or will occur when:
The cardinality (the number of different key values) in every index of a table is calculated when a table is opened, at SHOW TABLE STATUS and ANALYZE TABLE and on other circumstances (like when the table has changed too much). Note that all tables are opened, and the statistics are re-estimated, when the mysql client starts if the auto-rehash setting is set on (the default).
So, assume 'at any point' due to 'changed too much', and yes, connecting with the mysql client can alter the server's behavior in choosing indexes. Also: reconnecting of the mysql client after it lost its connection after a timeout counts as connecting with auto-rehash AFAIK. If you want to help MySQL find the proper method, run ANALYZE TABLE once in a while, especially after heavy updating. If you think the cardinality it guesses is often wrong, you can alter the number of pages it reads to guess some statistics, but keep in mind a higher number means a longer-running update of that cardinality, which is something you don't want to happen that often when 'data has changed too much' on a table with a lot of operations.
TL;DR: it guesses rows differently, but you'd actually prefer the first behavior if the data makes that possible.
Adding:
On this previously linked page, we can probably also find why especially dep_id might have this problem:
small values like 1 or 2 can result in very inaccurate estimates of cardinality
I could imagine the number of different dep_id's is typically quite small, and I've indeed observed a 'bouncing' cardinality on non-unique indexes with quite a small range compared to the number of rows in my own databases. It easily guesses a range of 1-10 in the hundreds and then down again the next time, just based on the specific sample pages it picks & some algorithm that tries to extrapolate that.
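To illustrate how a sample-based guess can bounce, here is a deliberately naive Python sketch. The estimator, the page size, and the data distribution are all assumptions made for the demo, and InnoDB's real algorithm is more involved. It counts the distinct keys on the sampled pages and extrapolates that to the whole index, so the answer depends heavily on which page happens to be sampled.

```python
def estimate_cardinality(pages, sample_page_numbers):
    """Naive estimate: average distinct keys per sampled page times page count."""
    sampled = [pages[p] for p in sample_page_numbers]
    distinct_per_page = sum(len(set(page)) for page in sampled) / len(sampled)
    return round(distinct_per_page * len(pages))

# Hypothetical index on dep_id: 84 entries in key order, 4 departments,
# stored 12 entries per page (7 pages in total).
keys = [1] * 60 + [2] * 12 + [3] * 6 + [4] * 6
pages = [keys[i:i + 12] for i in range(0, len(keys), 12)]

print(len(set(keys)))                      # 4  (true number of departments)
print(estimate_cardinality(pages, [0]))    # 7  (sampled a page with one key)
print(estimate_cardinality(pages, [6]))    # 14 (sampled a page with two keys)
```

Sampling a single page gives 7 one time and 14 the next, while the true value is 4, which is the kind of swing that can flip the optimizer's choice.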
Table structure:
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| total | int(11) | YES | | NULL | |
| thedatetime | datetime | YES | MUL | NULL | |
+-------------+----------+------+-----+---------+----------------+
Total rows: 137967
mysql> explain select * from out where thedatetime <= NOW();
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | out | ALL | thedatetime | NULL | NULL | NULL | 137967 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+--------+-------------+
The real query is much longer, with more table joins. The point is, I can't get the table to use the datetime index. This is going to be hard for me if I want to select all data up to a certain date. However, I noticed that I can get MySQL to use the index if I select a smaller subset of the data.
mysql> explain select * from out where thedatetime <= '2008-01-01';
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
| 1 | SIMPLE | out | range | thedatetime | thedatetime | 9 | NULL | 15826 | Using where |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+-------------+
mysql> select count(*) from out where thedatetime <= '2008-01-01';
+----------+
| count(*) |
+----------+
| 15990 |
+----------+
So, what can I do to make sure MySQL will use the index no matter what date that I put?
There are two things in play here -
Index is not selective enough - if the index covers more than approx. 30% of the rows, MySQL will decide a full table scan is more efficient. When you contract the range the index kicks in.
One index per table in a join
"The real query is much longer with more table joins, the point is ..."
The point is exactly because it has joins that it probably can't use that index. MySQL can use one index per table in a join (unless it qualifies for an index-merge optimization). If the primary key is already used for the join, thedatetime won't be used. In order to use it, you need to create a multi-column index on the join key + thedatetime index, in the correct order.
Check the EXPLAIN of the actual query to see which key MySQL uses for the join. Modify that index to include the thedatetime column as well, or create a new multi-column index from both (depending on what you use the join key for).
Everything works as it is supposed to. :)
Indexes are there to speed up retrieval. They do it using index lookups.
In your first query the index is not used because you are retrieving ALL rows, and in this case using the index is slower (lookup index, get row, lookup index, get row... times the number of rows is slower than getting all rows, i.e. a table scan).
In the second query you are retrieving only a portion of the data and in this case table scan is much slower.
The job of the optimizer is to use the statistics that the RDBMS keeps on the index to determine the best plan. In the first case the index was considered, but the planner (correctly) threw it away.
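The trade-off can be sketched with a toy cost model in Python. The cost constants are invented so that the crossover lands at roughly 30% of the table, the rule of thumb mentioned in this thread. They are not MySQL's real cost values, and the function name is hypothetical.

```python
def choose_access_path(matching_rows, total_rows,
                       cost_per_index_lookup=1.0,
                       cost_per_scanned_row=0.3):
    """Pick the cheaper plan under a made-up cost model.

    Index path: one (random) lookup per matching row.
    Table scan: one (cheap, sequential) read per row in the table.
    """
    index_cost = matching_rows * cost_per_index_lookup
    scan_cost = total_rows * cost_per_scanned_row
    return "range" if index_cost < scan_cost else "ALL"

# Row counts taken from the question above.
print(choose_access_path(15990, 137967))   # range (about 12% of the table)
print(choose_access_path(137967, 137967))  # ALL   (every row matches)
```

With these assumed costs the index wins while fewer than about 30% of the rows match, and the table scan wins beyond that, which matches the two EXPLAIN outputs in the question.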
EDIT
You might want to read something like this to get some concepts and keywords regarding mysql query planner.
Is there any tangible difference (speed/efficiency) between these statements? Assume the column is indexed.
SELECT MAX(someIntColumn) AS someIntColumn
or
SELECT someIntColumn ORDER BY someIntColumn DESC LIMIT 1
This depends largely on the query optimizer in your SQL implementation. At best, they will have the same performance. Typically, however, the first query is potentially much faster.
The first query essentially asks for the DBMS to inspect every value in someIntColumn and pick the largest one.
The second query asks the DBMS to sort all the values in someIntColumn from largest to smallest and pick the first one. Depending on the number of rows in the table and the existence (or lack thereof) of an index on the column, this could be significantly slower.
If the query optimizer is sophisticated enough to realize that the second query is equivalent to the first one, you are in luck. But if you retarget your app to another DBMS, you might get unexpectedly poor performance.
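The difference between the two formulations can be sketched in a few lines of Python. This is an analogy under simplifying assumptions, not a model of any particular DBMS: without an index, MAX is a single pass while a naive ORDER BY ... LIMIT 1 sorts everything first; with an index (a structure kept in sorted order), both reduce to reading the last entry.

```python
values = [17, 4, 42, 8, 23]  # hypothetical contents of someIntColumn

# No index, MAX(): one pass over the values, O(n).
max_by_scan = max(values)

# No index, naive ORDER BY ... DESC LIMIT 1: sort all values, O(n log n),
# then keep only the first one.
max_by_sort = sorted(values, reverse=True)[0]

# With an index: the values are already kept in sorted order,
# so either query only needs to read the last entry.
index = sorted(values)
max_by_index = index[-1]

print(max_by_scan, max_by_sort, max_by_index)  # 42 42 42
```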
EDIT based on explain plan:
The explain plan shows that max(column) is more efficient. The explain plan says, “Select tables optimized away”.
EXPLAIN SELECT version from schema_migrations order by version desc limit 1;
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| 1 | SIMPLE | schema_migrations | index | NULL | unique_schema_migrations | 767 | NULL | 1 | Using index |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT max(version) FROM schema_migrations ;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)