MySQL Order by Optimization - mysql

Below is the structure of a table:-
Article: ID, Title, Desc, PublishedDateTime, ViewsCount, Published
Primary Key: ID
Query Used:
Select Title FROM Article ORDER By ViewsCount DESC, PublishedDateTime ASC
As you can see, I am mixing ASC and DESC, and according to the MySQL ORDER BY optimization documentation, indexes will not be used in that case.
I have thought about using a composite index on ViewsCount and PublishedDateTime. Would you recommend using 2 separate keys instead of a composite index? I have read that a composite index is better than 2 separate keys (if both fields are going to be used).
Some more information shared:
The table contains more than 550K records, and I am also having a lot of trouble adding and deleting indexes for testing purposes. What do you recommend? Should I test on a small sample?
Below are some more insights:
Indexes Used:
1) ViewsCount
2) PublishedDateTime.
3) ViewsCount & PublishedDateTime (named as ViewsDate_Index )
A) EXPLAIN Query mixing ASC and DESC:
EXPLAIN SELECT title FROM `article` ORDER BY ViewsCount DESC , PublishedDateTime ASC LIMIT 0 , 20
+----+-------------+---------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+----+-------------+---------+------+---------------+------+---------+------+--------+----------------+
|  1 | SIMPLE      | article | ALL  | NULL          | NULL | NULL    | NULL | 550116 | Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+--------+----------------+
B) EXPLAIN Query using the same sorting order:
EXPLAIN SELECT title FROM `article` ORDER BY ViewsCount DESC , PublishedDateTime DESC LIMIT 0 , 20
+----+-------------+---------+-------+---------------+-----------------+---------+------+--------+-------+
| id | select_type | table   | type  | possible_keys | key             | key_len | ref  | rows   | Extra |
+----+-------------+---------+-------+---------------+-----------------+---------+------+--------+-------+
|  1 | SIMPLE      | article | index | NULL          | ViewsDate_Index | 16      | NULL | 550116 |       |
+----+-------------+---------+-------+---------------+-----------------+---------+------+--------+-------+
You can see that if ViewsCount and PublishedDateTime are sorted in the same direction, the query uses the ViewsDate_Index index. One thing I found strange is that possible_keys is NULL and yet an index is still selected. Can someone explain the reason for this?
Also, any tips on adding indexes to this table would help, because it takes a lot of time to add a new index. Any workaround or help in this regard will be appreciated.

First of all, run the whole query live, and see how it performs. When you've got some benchmarks down, plug the query into your MySQL console and prepend EXPLAIN to it. MySQL will not perform the query, but it will display its plan for executing the query, including which indexes it will use, how many rows it has to traverse, and how efficiently it will traverse each set of rows, among other things. The best way to gauge a performance problem is through benchmarking. Use it often.

In practice, indexes won't be used even for ORDER BY ViewsCount, PublishedDateTime here, since you SELECT all columns and apply no condition. Is it a real query? Because any conditions will spoil your optimizations.
If your table is so small that you are going to pull it as a whole, indexes will only slow down your query. (This relates to the original query: SELECT * FROM article ORDER BY ViewsCount DESC, PublishedDateTime;)
UPD
In the case where you have 500K+ rows, I think you are going to use a LIMIT clause. I would do the following:
add an index on (ViewsCount, PublishedDateTime)
rewrite the query as follows:
SELECT Title
FROM (
SELECT id
FROM article
ORDER BY ViewsCount DESC, PublishedDateTime
LIMIT 100, 100
) ids
JOIN article
USING (id);
The ordering would benefit from operating on a subset of data from the covering index. The join will just obtain Titles by ids.
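To see that the rewritten query returns the same page as the direct one, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, so it only checks the logic and says nothing about MySQL timings. The sample data is made up, and I added id as a tie-breaker plus an outer ORDER BY (neither is in the query above) so that both result sets have a well-defined order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article ("
    "id INTEGER PRIMARY KEY, Title TEXT, ViewsCount INTEGER, PublishedDateTime TEXT)"
)
# Composite index from the answer; the inner query only touches indexed columns.
conn.execute("CREATE INDEX ViewsDate_Index ON article (ViewsCount, PublishedDateTime)")

# Hypothetical sample data: 50 rows, ViewsCount cycles through 0..6.
rows = [(i, f"Title {i}", i % 7, f"2020-01-{(i % 28) + 1:02d}") for i in range(1, 51)]
conn.executemany("INSERT INTO article VALUES (?, ?, ?, ?)", rows)

# Direct form: sort and page over the full rows.
direct = conn.execute(
    "SELECT Title FROM article "
    "ORDER BY ViewsCount DESC, PublishedDateTime, id LIMIT 10 OFFSET 10"
).fetchall()

# Deferred-join form: page over ids only, then join back to fetch the titles.
deferred = conn.execute(
    "SELECT a.Title FROM ("
    "  SELECT id FROM article"
    "  ORDER BY ViewsCount DESC, PublishedDateTime, id LIMIT 10 OFFSET 10"
    ") ids JOIN article a ON a.id = ids.id "
    "ORDER BY a.ViewsCount DESC, a.PublishedDateTime, a.id"
).fetchall()

print(direct == deferred)
```

Whether the deferred form is actually faster on your table is something only a benchmark on the real data can tell.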
UPD2
Another query that might work much better when the cardinality of ViewsCount is rather small (though you should benchmark):
SELECT Title
FROM (
SELECT ViewsCount
FROM article
GROUP BY ViewsCount DESC) as groups
JOIN article USING (ViewsCount)
LIMIT 0, 100;
It also assumes you have a (ViewsCount, PublishedDateTime) index on the table.

Related

How does Index Scope work in Mysql?

In the MySQL manual there is a page on index hinting that mentions that you can specify the index hinting for specific parts of the query.
You can specify the scope of an index hint by adding a FOR clause to the hint. This provides more fine-grained control over the optimizer's selection of an execution plan for various phases of query processing. To affect only the indexes used when MySQL decides how to find rows in the table and how to process joins, use FOR JOIN. To influence index usage for sorting or grouping rows, use FOR ORDER BY or FOR GROUP BY.
However, there is little to no more information about how this works or what it actually does in the MySQL optimizer. As well in practice it appears to be negligible in actually improving anything.
Here is a test query and what explain says about the query:
SELECT
`property`.`primary_id` AS `id`
FROM `California` `property`
USE INDEX FOR JOIN (`Zipcode Bedrooms`)
USE INDEX FOR ORDER BY (`Zipcode Bathrooms`)
INNER JOIN `application_zipcodes` `az`
ON `az`.`application_id` = '18'
AND `az`.`zipcode` = `property`.`zipcode`
WHERE `property`.`city` = 'San Jose'
AND `property`.`zipcode` = '95133'
AND `property`.`property_type` = 'Residential'
AND `property`.`style` = 'Condominium'
AND `property`.`bedrooms` = '3'
ORDER BY `property`.`bathrooms` ASC
LIMIT 15
;
Explain:
EXPLAIN SELECT `property`.`primary_id` AS `id` FROM `California` `property` USE INDEX FOR JOIN (`Zipcode Bedrooms`) USE INDEX FOR ORDER BY (`Zipcode Bathrooms`) INNER JOIN `application_zipcodes` `az` ON `az`.`application_id` = '18' AND `az`.`zipcode` = `property`.`zipcode` WHERE `property`.`city` = 'San Jose' AND `property`.`zipcode` = '95133' AND `property`.`property_type` = 'Residential' AND `property`.`style` = 'Condominium' AND `property`.`bedrooms` = '3' ORDER BY `property`.`bathrooms` ASC LIMIT 15\g
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | Property | ref | Zip Bed | Zip Bed | 17 | const,const | 2364 | Using index condition; Using where; Using filesort |
| 1 | SIMPLE | az | eq_ref | PRIMARY | PRIMARY | 7 | const,Property.zipcode | 1 | Using where; Using index |
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
2 rows in set (0.01 sec)
So to summarize I am basically wondering how the index scope was meant to be used, as this doesn't seem to do anything when I add or remove the line USE INDEX FOR ORDER BY (Zipcode Bathrooms).
I have yet to figure out how multiple hints can be used. MySQL will almost never use more than one index per SELECT. The only exception I know of is with "index merge", which is not relevant in your example.
The Optimizer usually focuses on finding a good index for the WHERE clause. If it entirely covers the WHERE, without any "ranges", then it checks to see if there are GROUP BY and ORDER BY fields, in the right order, to use. If it can handle all of WHERE, GROUP BY, and ORDER BY, then it can actually optimize the LIMIT (but not OFFSET).
If the Optimizer can't consume all of the WHERE, it may reach into the ORDER BY in hopes of avoiding the "filesort" that ORDER BY otherwise requires.
None of this allows for different indexes for different clauses. A single hint may encourage the use of one of the cases above in preference to the other; I don't know.
Don't use utf8 for zipcode; it makes things bulkier than necessary (3 bytes per character). In general, shrinking the size of the table will help performance some. Or, if you have a huge dataset, it may help perf a lot. (Avoiding I/O is very important.)
Bathrooms is not very selective; there is not much to gain even if it would be possible.
az.application_id is the big monkey wrench in the query; what is it?

retrieving top-ranking rows from large tables using FULLTEXT is very slow

When we log into our database with mysql-client and launch these queries:
first test query:
select a.*
from ads a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"bmw serie 3"' in boolean mode)
order by a.ranking asc limit 0, 10;
The result is:
10 rows in set (1 min 5.37 sec)
second test query:
select a.*
from ads a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"ford mondeo"' in boolean mode)
order by a.ranking asc limit 0, 10;
The result is:
10 rows in set (2 min 13.88 sec)
These queries are too slow. Is there a way to improve this?
The 'ads' table contains 2 million rows; triggers are set to duplicate the data into searchs_titles. searchs_titles contains the id, title and label of each row in ads.
Table 'ads' uses InnoDB and 'searchs_titles' uses MyISAM, with a fulltext index on the label field.
Do we have too many columns? Too many indexes? Too many rows?
Is it a bad query?
Thanks a lot for the time you will spend helping us!
Edit: add explain
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | s | fulltext | id_ad,label | label | 0 | | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,id,id_2,id_3 | PRIMARY | 4 | XXXXXX.s.id_ad | 1 | |
Pro tip: Never use * in a SELECT statement in production software (unless you have a very good reason). By asking for all columns, you are denying the optimizer access to information about how best to exploit your indexes.
Observation: you're ordering by ads.ranking and taking ten results. But ads.ranking has very low cardinality -- according to that image in your question, it has 26 distinct values. Is your query working correctly?
Observation: You've said that the fulltext part of your search takes .77 seconds. I mean this part:
select s.id
from searchs_titles AS s
where match(s.label) against ('"ford mondeo"' in boolean mode)
That is good. It means we can focus on the rest of the query.
You also said you've been testing with the insertions to the table turned off. That's good because it rules out contention as a cause for the slow queries.
Suggestion: Create a suitable compound index for ads. For your present query, try an index on (id, ranking). This may allow your ORDER BY operation to avoid a full table scan.
Then, try this query to extract the set of ten a.id values you need, and then retrieve the data rows. This will exploit your compound index.
select z.*
from ads AS z
join ( select a.id, a.ranking
from ads AS a
inner join searchs_titles s on s.id_ad = a.id
where match(s.label) against ('"ford mondeo"' in boolean mode)
order by a.ranking asc
limit 0, 10
) AS b ON z.id = b.id
order by z.ranking
This uses a subquery to do the ORDER BY ... LIMIT ... data-shuffling operation on a small subset of the columns. This should make the retrieval of the appropriate id values much faster. Then the outer query fetches the appropriate rows.
The bottom line is this: ORDER BY ... LIMIT ... can be a very expensive operation if it's done on lots of data. But if you can arrange for it to be done on a minimal choice of columns, and those columns are indexed correctly, it can be very fast.

Why is MySQL slow when using LIMIT in my query?

I'm trying to figure out why one of my queries is slow and how I can fix it, but I'm a bit puzzled by my results.
I have an orders table with around 80 columns and 775179 rows and I'm running the following query:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200
which returns 38 rows in 4.5s
When removing the ORDER BY I'm getting a nice improvement :
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL LIMIT 200
38 rows in 0.30s
But when removing the LIMIT without touching the ORDER BY I'm getting an even better result :
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC
38 rows in 0.10s (??)
Why is my LIMIT so hungry ?
GOING FURTHER
I was trying a few things before sending my answer and after noticing that I had an index on creation_date (which is a datetime) I removed it and the first query now runs in 0.10s. Why is that ?
EDIT
Good guess, I have indexes on the others columns part of the where.
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200;
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| 1 | SIMPLE | orders | index | id_state_idx,id_mp_idx | creation_date | 5 | NULL | 1719 | Using where |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC;
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| 1 | SIMPLE | orders | range | id_state_idx,id_mp_idx | id_mp_idx | 3 | NULL | 87502 | Using index condition; Using where; Using filesort |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
Indexes do not necessarily improve performance. To better understand what is happening, it would help if you included the explain for the different queries.
My best guess would be that you have an index on id_state or even on (id_state, id_mp) that can be used to satisfy the where clause. If so, the first query without the order by would use this index. It should be pretty fast. Even without an index, this requires a sequential scan of the pages in the orders table, which can still be pretty fast.
Then when you add the index on creation_date, MySQL decides to use that index instead for the order by. This requires reading each row in the index, then fetching the corresponding data page to check the where conditions and return the columns (if there is a match). This reading is highly inefficient, because it is not in "page" order but rather as specified by the index. Random reads can be quite inefficient.
Worse, even though you have a limit, you still have to read the entire table: only 38 rows match, so the limit of 200 is never reached and the scan cannot stop early. Although you have saved a sort on 38 records, you have created a massively inefficient query.
By the way, this situation gets significantly worse if the orders table does not fit in available memory. Then you have a condition called "thrashing", where each new record tends to generate a new I/O read. So, if a page has 100 records on it, the page might have to be read 100 times.
You can make all these queries run faster by having an index on orders(id_state, id_mp, creation_date). The where clause will use the first two columns and the order by will use the last.
The same problem happened in my project.
I did some tests and found out that LIMIT is slow because of row lookups.
See:
MySQL ORDER BY / LIMIT performance: late row lookups
So, the solution is:
(A) When using LIMIT, do not select all columns, only the PK columns.
(B) Select all the columns you need, and then join with the result set of (A).
The SQL should look like this:
SELECT
O1.*
FROM
orders O1 -- this is what you want
JOIN
(
SELECT
ID -- fetch the PK column only; this should be fast
FROM
orders
WHERE
[your query condition] -- filter records by condition
ORDER BY
[your order by condition] -- control the record order
LIMIT 2000, 50 -- filter records by paging condition
) AS O2
ON
O1.ID = O2.ID
ORDER BY
[your order by condition] -- control the record order
In my DB,
the old SQL, which selects all columns using "LIMIT 21560, 20", takes about 4.484s.
The new SQL takes only 0.063s, so it is about 71 times faster.
I had a similar issue on a table of 2.5 million records. Without the limit part, the query took a few seconds. With the limit part, it hung forever.
I solved it with a subquery. In your case it would become:
SELECT *
FROM
(SELECT *
FROM orders
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC) tmp
LIMIT 200
I noticed that the original query was fast when the number of selected rows was greater than the limit parameter. So the query became extremely slow when the limit parameter was useless.
Another solution is to try forcing an index. In your case you can try:
SELECT *
FROM orders force index (id_mp_idx)
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC
LIMIT 200
The problem is that MySQL is forced to sort the data on the fly. My deep-offset query, like:
ORDER BY somecol LIMIT 99990, 10
took 2.5s.
I fixed it by creating a new table that holds only the ids, presorted by somecol; there the deep offset (with no need for ORDER BY) takes 0.09s.
0.1s is still not fast enough, though. 0.01s would be better.
I will end up creating a table that holds the page number as a special indexed column, so instead of doing LIMIT x, y I will query WHERE page = Z.
I just tried it and it takes as little as 0.0013s. The only problem is that the offsetting is based on static numbers (presorted into pages of 10 items, for example). It's not that big a problem, though: you can still get any data from any page.
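A rough sketch of the page-number idea, for anyone who wants to try it. It uses Python's sqlite3 module as a stand-in for MySQL, and the table names (items, items_paged), the pos column, and the sample data are all hypothetical. It only checks that the lookup table returns the same pages as ORDER BY ... LIMIT ... OFFSET; it does not reproduce the timings quoted above.

```python
import sqlite3

PAGE_SIZE = 10  # fixed page size, as in the answer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, somecol INTEGER)")
# Hypothetical data: 95 rows whose sort key is unrelated to id order.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(i, (i * 37) % 101) for i in range(1, 96)]
)

# Lookup table: one row per item, with its precomputed page and position.
conn.execute("CREATE TABLE items_paged (page INTEGER, pos INTEGER, id INTEGER)")
conn.execute("CREATE INDEX items_paged_page ON items_paged (page, pos)")

sorted_ids = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY somecol, id")]
conn.executemany(
    "INSERT INTO items_paged VALUES (?, ?, ?)",
    [(pos // PAGE_SIZE, pos, item_id) for pos, item_id in enumerate(sorted_ids)],
)

def fetch_page(page):
    """Return the ids on a zero-based page using an equality lookup, not OFFSET."""
    cur = conn.execute(
        "SELECT id FROM items_paged WHERE page = ? ORDER BY pos", (page,)
    )
    return [r[0] for r in cur]

def fetch_page_with_offset(page):
    """Reference implementation using ORDER BY ... LIMIT ... OFFSET."""
    cur = conn.execute(
        "SELECT id FROM items ORDER BY somecol, id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    )
    return [r[0] for r in cur]

print(fetch_page(9) == fetch_page_with_offset(9))
```

The obvious cost is that the lookup table is a snapshot: in this sketch it has to be rebuilt whenever rows are added, removed, or re-sorted.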

Why my mysql answer that "not using key" when I use rand in where

I have a table that has 4,000,000 records.
The table was created like this: (user_id int, partner_id int, PRIMARY KEY (user_id)) engine=InnoDB;
I want to test the performance of selecting 100 records.
So I tested the following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( 1 );
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | MY_TABLE | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set, 1 warning (0.00 sec)
This is OK.
But this query is cached by MySQL,
so the test is meaningless after the first run.
Then I thought of a query that selects by a random value.
I tested following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( select ceil( rand() ) );
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | MY_TABLE | index | NULL | PRIMARY | 4 | NULL | 3998727 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
But this is bad.
EXPLAIN shows that possible_keys is NULL.
So a full index scan is planned, and in fact it is much slower than the previous query.
So I would like to ask how to write a query that uses a random value and still does an index lookup.
Thanks
Using rand() in SQL is usually a sure-fire way to make the query slow. A common theme here is people using it in ORDER BY to get a random sequence. It's slow because not only does it throw away the indexes, but it also reads through the whole table.
However in your case, the fact that the function calls are in a sub-query ought to allow the outer query to still use its indexes. The fact that it isn't seems quite odd (so I've given the question a +1 vote).
My theory is that perhaps MySQL's optimiser is getting it wrong -- it's seeing the functions in the inner query, and deciding incorrectly that it can't use an index.
The only thing I can suggest to work around that is using force index to push MySQL into using the index you want.
See the definition of rand().
If i understand right, you are trying to get a random record from the database. If that is the case, again from the rand() definition:
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
SELECT * FROM table1, table2 WHERE a=b AND c<d ORDER BY RAND() LIMIT 1000;
It's a limitation of the MySQL optimizer, that it can't tell that the subquery returns exactly one value, it has to assume the subquery returns multiple rows with unpredictable values, potentially even all the values of user_id. Therefore it decides it's just going to do an index scan.
Here's a workaround:
mysql> explain select user_id from MY_TABLE use index (PRIMARY)
where user_id = ( select ceil( rand() ) );
Note that MySQL's RAND() function returns a value in the range 0 <= v < 1.0. If you CEIL() it, you'll likely get the value 1. Therefore you'll virtually always get the row where user_id=1. If you don't have such a row in your table, you'll get an empty set result. You certainly won't get a user chosen randomly among all your users.
To fix that problem, you'd have to multiply the rand() by the number of distinct user_id values. And that brings up the problem that you might have gaps, so a randomly chosen value won't match any existing user_id.
Re your comment:
You'll always see possible keys as NULL when you get an index scan (i.e., "type" is "index").
I tried your explain query on a similar table, and it appears that the optimizer can't figure out that the subquery is a constant expression. You can work around this limitation by calculating the random number in application code and then using the result as a constant value in your query:
select user_id from MY_TABLE use index (PRIMARY)
where user_id = $random;
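Here is a small sketch of both points: why CEIL(RAND()) is not a useful random id, and what picking the value in application code might look like. It uses Python with sqlite3 as a stand-in for MySQL; the helper name random_user_id and the sample data (ids 1..1000 with no gaps) are assumptions for illustration only.

```python
import math
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (user_id INTEGER PRIMARY KEY, partner_id INTEGER)")
# Hypothetical data: ids 1..1000 with no gaps.
conn.executemany("INSERT INTO MY_TABLE VALUES (?, ?)", [(i, i * 2) for i in range(1, 1001)])

# Analogue of CEIL(RAND()): random.random() is in [0, 1), so the ceiling is 1
# (or 0 in the rare case that the draw is exactly 0.0).
unscaled = {math.ceil(random.random()) for _ in range(10_000)}

def random_user_id(rng=random):
    """Pick an id in application code and pass it to the query as a constant."""
    max_id = conn.execute("SELECT MAX(user_id) FROM MY_TABLE").fetchone()[0]
    candidate = rng.randint(1, max_id)  # uniform over 1..max_id inclusive
    row = conn.execute(
        "SELECT user_id FROM MY_TABLE WHERE user_id = ?", (candidate,)
    ).fetchone()
    return row[0] if row else None  # None when the candidate falls in a gap

picked = {random_user_id() for _ in range(200)}
print(unscaled, len(picked))
```

Because the query only ever sees a constant, the optimizer has nothing to guess about. If the table has gaps, the None case needs handling, for example by drawing again.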

How does MySQL's ORDER BY RAND() work?

I've been doing some research and testing on how to do fast random selection in MySQL. In the process I've faced some unexpected results and now I am not fully sure I know how ORDER BY RAND() really works.
I always thought that when you do ORDER BY RAND() on a table, MySQL adds a new column to the table which is filled with random values, then sorts the data by that column, and then you take, for example, the top value, which got there randomly. I've done lots of googling and testing and finally found that the query Jay offers in his blog is indeed the fastest solution:
SELECT * FROM Table T JOIN (SELECT CEIL(MAX(ID)*RAND()) AS ID FROM Table) AS x ON T.ID >= x.ID LIMIT 1;
While common ORDER BY RAND() takes 30-40 seconds on my test table, his query does the work in 0.1 seconds. He explains how this functions in the blog so I'll just skip this and finally move to the odd thing.
My table is a common table with a PRIMARY KEY id and other non-indexed stuff like username, age, etc. Here's the thing I am struggling to explain
SELECT * FROM table ORDER BY RAND() LIMIT 1; /*30-40 seconds*/
SELECT id FROM table ORDER BY RAND() LIMIT 1; /*0.25 seconds*/
SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /*90 seconds*/
I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you have any ideas about this. I have a project where I need to do fast ORDER BY RAND() and personally I would prefer to use
SELECT id FROM table ORDER BY RAND() LIMIT 1;
SELECT * FROM table WHERE id=ID_FROM_PREVIOUS_QUERY LIMIT 1;
which, yes, is slower than Jay's method, however it is smaller and easier to understand. My queries are rather big ones with several JOINs and with WHERE clause and while Jay's method still works, the query grows really big and complex because I need to use all the JOINs and WHERE in the JOINed (called x in his query) sub request.
Thanks for your time!
While there's no such thing as a "fast order by rand()", there is a workaround for your specific task.
For getting any single random row, you can do what this German blogger does: http://web.archive.org/web/20200211210404/http://www.roberthartung.de/mysql-order-by-rand-a-case-study-of-alternatives/ (I couldn't see a hotlink url. If anyone sees one, feel free to edit the link.)
The text is in German, but the SQL code is a bit down the page and in big white boxes, so it's not hard to see.
Basically, what he does is make a procedure that does the job of getting a valid row. It generates a random number between 0 and max_id, tries fetching a row, and if that row doesn't exist, keeps going until it hits one that does. He allows for fetching x number of random rows by storing them in a temp table, so you can probably rewrite the procedure to be a bit faster when fetching only one row.
The downside of this is that if you delete A LOT of rows and there are huge gaps, the chances are high that it will miss many times, making it ineffective.
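The retry idea can be sketched in application code rather than a stored procedure. This is not the blogger's code, just a minimal illustration in Python with sqlite3 standing in for MySQL. The table t, the function random_row, the max_attempts cap, and the sample data (only even ids exist) are all assumptions.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, username TEXT)")
# Hypothetical data with gaps: only even ids exist.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"user{i}") for i in range(2, 201, 2)])

def random_row(max_attempts=1000, rng=random):
    """Probe random ids until one exists; give up after max_attempts misses."""
    max_id = conn.execute("SELECT MAX(id) FROM t").fetchone()[0]
    if max_id is None:  # empty table
        return None
    for _ in range(max_attempts):
        candidate = rng.randint(1, max_id)
        row = conn.execute(
            "SELECT id, username FROM t WHERE id = ?", (candidate,)
        ).fetchone()  # primary-key lookup, cheap per attempt
        if row is not None:
            return row
    return None  # table too sparse for this approach

print(random_row())
```

The cap on attempts is there for exactly the downside mentioned above: with huge gaps most probes miss, and an unbounded loop could spin for a long time.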
Update: Different execution times
SELECT * FROM table ORDER BY RAND() LIMIT 1; /*30-40 seconds*/
SELECT id FROM table ORDER BY RAND() LIMIT 1; /*0.25 seconds*/
SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /*90 seconds*/
I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you have any ideas about this.
It may have to do with indexing. id is indexed and quick to access, whereas adding username to the result, means it needs to read that from each row and put it in the memory table. With the * it also has to read everything into memory, but it doesn't need to jump around the data file, meaning there's no time lost seeking.
This makes a difference only if there are variable length columns (varchar/text), which means it has to check the length, then skip that length, as opposed to just skipping a set length (or 0) between each row.
"It may have to do with indexing. id is indexed and quick to access, whereas adding username to the result, means it needs to read that from each row and put it in the memory table. With the * it also has to read everything into memory, but it doesn't need to jump around the data file, meaning there's no time lost seeking. This makes a difference only if there are variable length columns, which means it has to check the length, then skip that length, as opposed to just skipping a set length (or 0) between each row"
Practice is better than all theories! Why not just check the plans? :)
mysql> explain select name from avatar order by RAND() limit 1;
+----+-------------+--------+-------+---------------+-----------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-----------------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | avatar | index | NULL | IDX_AVATAR_NAME | 302 | NULL | 30062 | Using index; Using temporary; Using filesort |
+----+-------------+--------+-------+---------------+-----------------+---------+------+-------+----------------------------------------------+
1 row in set (0.00 sec)
mysql> explain select * from avatar order by RAND() limit 1;
+----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | SIMPLE | avatar | ALL | NULL | NULL | NULL | NULL | 30062 | Using temporary; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
1 row in set (0.00 sec)
mysql> explain select name, experience from avatar order by RAND() limit 1;
+----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | SIMPLE | avatar | ALL | NULL | NULL | NULL | NULL | 30064 | Using temporary; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
I can tell you why SELECT id FROM ... is much faster than the other two, but I am not sure why SELECT id, username is 2-3 times slower than SELECT *.
When you have an index (the primary key in your case) and the result includes only columns from that index, the MySQL optimizer can use the data from the index alone and does not even look into the table itself. The more expensive each row is to read, the bigger the effect you will observe, since you replace filesystem I/O operations with pure in-memory operations. If you add an additional index on (id, username), you should see similar performance in the third case as well.
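The index-only effect can be made visible with a query plan. The sketch below uses Python's sqlite3 module, so the plan text is SQLite's, not MySQL's (in MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN). The users table, its columns, and the index name idx_username are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, username TEXT, age INTEGER, bio TEXT)"
)
# In SQLite an index on username implicitly carries the id (rowid) as well,
# so it plays the role of the (id, username) index suggested above.
conn.execute("CREATE INDEX idx_username ON users (username)")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(r[-1] for r in rows)

covered = plan("SELECT id, username FROM users")  # only indexed columns
not_covered = plan("SELECT * FROM users")         # needs the full rows

print(covered)
print(not_covered)
```

The first plan should mention a covering index while the second falls back to scanning the table, which is the same distinction the answer draws between SELECT id and SELECT *.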
Why don't you add an index on (id, username) to the table and see if that forces MySQL to use the index rather than just a filesort and temp table?
Primary keys are indexed, so those are "found" faster.
If you want a random entire row, but with the speed of using the primary key with the random function, you can try the code below:
You use the derived table to "find" the primary key of a single random row. Then you join on it to get the entire row.
Select * from my_thing mainTable
JOIN
(
Select my_thing_key from my_thing order by RAND() LIMIT 1
) derived1
on mainTable.my_thing_key = derived1.my_thing_key;
Using RAND() is slower. And * is slower, too.
What I cannot explain is why id, username is slower than *.
This is a strange phenomenon that I cannot replicate.
The fastest method would be to get MAX(id) and store it in memory. Then, using your software pull a random number with it as the ceiling and then in SQL
SELECT id, username FROM table WHERE id > ? LIMIT 1;
and if no row, fall back to
SELECT id, username FROM table LIMIT 1;
If your MySQL installation is not buggy, you should do
SELECT id, username FROM table ORDER BY RAND() LIMIT 1;
with a small-medium data set. Doing two selects cannot be faster. But software is buggy.