MySQL Index usage for ORDER BY LIMIT query - mysql

I'm using "users" table with over 2 millions records. The query is:
SELECT * FROM users WHERE 1 ORDER BY firstname LIMIT $start,30
"firstname" column is indexed. Getting first pages is very fast while getting last pages is very slow.
I used EXPLAIN and here are the results:
for
EXPLAIN SELECT * FROM `users` WHERE 1 ORDER BY `firstname` LIMIT 10000 , 30
I'm getting:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users index NULL firstname 194 NULL 10030
But for
EXPLAIN SELECT * FROM `users` WHERE 1 ORDER BY `firstname` LIMIT 100000 , 30
I'm getting
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users ALL NULL NULL NULL NULL 2292912 Using filesort
What's the issue?

You shouldn't use limit to page that far into your dataset.
You'll get much better results by using range queries.
SELECT * FROM users
WHERE firstname >= last_used_name
ORDER BY firstname
LIMIT 30
Where last_used_name is one that you already seen (I'm assuming that you do batch processing of some sort). You will get more accurate results if you do range queries on a column with unique index. This way you won't get the same record twice.
When you do
LIMIT 100000 , 30
MySQL essentially does the same as in
LIMIT 100030
Only it doesn't return first 100 thousands. But it sorts and reads them.

For * queries without WHERE condition MySQL often does not use any indexes.
The following query should be much faster and make use of the index:
SELECT *
FROM (
SELECT user_id
FROM users
WHERE 1
ORDER BY firstname
LIMIT $start, 30) ids
JOIN users
USING (user_id);
user_id is the primary key.

Related

Very slow query when using `id in (max(id))` in subquery

We recently moved our database from MariaDB to AWS Amazon Aurora RDS (MySQL). We observed something strange in a set of queries. We have two queries that are very quick, but when together as nested subquery it takes ages to finish.
Here id is the primary key of the table
SELECT * FROM users where id in(SELECT max(id) FROM users where id = 1);
execution time is ~350ms
SELECT * FROM users where id in(SELECT id FROM users where id = 1);
execution time is ~130ms
SELECT max(id) FROM users where id = 1;
execution time is ~130ms
SELECT id FROM users where id = 1;
execution time is ~130ms
We believe it has to do something with the type of value returned by max that is causing the indexing to be ignored when running the outer query from results of the sub query.
All the above queries are simplified for illustration of the problem. The original queries have more clauses as well as 100s of millions of rows. The issue did not exist prior to the migration and worked fine in MariaDB.
--- RESULTS FROM MariaDB ---
MySQL seems to optimize less efficient compared to MariaDB (int this case).
When doing this in MySQL (see: DBFIDDLE1), the execution plans look like:
For the query without MAX:
id select_type table partitions type
possible_keys
key key_len ref
rows
filtered Extra
1 SIMPLE integers null const
PRIMARY
PRIMARY 4 const
1
100.00 Using index
1 SIMPLE integers null const
PRIMARY
PRIMARY 4 const
1
100.00 Using index
For the query with MAX:
id select_type table partitions type
possible_keys
key key_len ref
rows
filtered Extra
1 PRIMARY integers null index null
PRIMARY
4 null
1000
100.00 Using where; Using index
2 DEPENDENT SUBQUERY null null null null
null
null null
null
null Select tables optimized away
While MariaDB (see: DBFIDDLE2 does have a better looking plan when using MAX:
id select_type table type
possible_keys
key key_len ref
rows
filtered Extra
1 PRIMARY system null
null
null null
1
100.00
1 PRIMARY integers const PRIMARY
PRIMARY
4 const
1
100.00 Using index
2 MATERIALIZED null null null
null
null null
null
null Select tables optimized away
EDIT: Because of time (some lack of it 😉) I now add some info
A suggestion to fix this:
SELECT *
FROM integers
WHERE i IN (select * from (SELECT MAX(i) FROM integers WHERE i=1)x);
When looking at the EXECUTION PLAN from MariaDB, which has 1 extra step, I tried to do the same in MySQL. Above query has an even bigger execution plan, but tests show that it performs better. (for explain plans, see: DBFIDDLE1a)
"the question is Mariadb that much faster? it uses a step more that mysql"
One step more does not mean that things get slower.
MySQL takes about 2-3 seconds on the query using the MAX, and MariaDB does execute the same in under 10 msecs. But this is performance, and time may vary on different systems.
SELECT max(id) FROM users where id = 1
Is strange. Since it is looking only at rows where id = 1, then "max" is obviously "1". So is the min. And the average.\
Perhaps you wanted:
SELECT max(id) FROM users
Is there an index on id? Perhaps the PRIMARY KEY? If not, then that might explain the sluggishness.
This can be done much faster (against assuming an index):
SELECT * FROM users
ORDER BY id DESC
LIMIT 1
Does that give you what you want?
To discuss this further, please provide SHOW CREATE TABLE users

How to reduce rows lookup when using LIMIT MySQL

I have the following table with Index on id and Foreign Key on activityID:
comment (id, activityID, text)
and the following query:
SELECT <cols> FROM `comment` WHERE `comment`.`activityID` = 1257 ORDER BY `id` DESC LIMIT 20;
I basically want to get only the first 20 comments for this activity that has 1165, however, this is the result of a describe:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comment ref activityID activityID 4 const 1165 NULL
Essentially, it is looking through all comments for this activity before deciding to limit it.
We tested this query under high load when an activity has 200,000 comments and the query takes 5+ seconds, whereas on the same load, an activity with 30 comments takes a couple of ms.
PS: If I remove the WHERE clause, an EXPLAIN says it will only lookup a single row (don't know if that's the case really):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE comment index NULL PRIMARY 4 NULL 1 NULL
Is it possible to optimize this kind of query in any way?
Thank you.
The ordering is causing the slowness.
The query uses the activityID index to find all the rows with that ID. Then it has to read all 200,000 comments and sort them by id to find the last 20.
Add a composite index so it can use an index for the ordering:
ALTER TABLE comment ADD INDEX (activityID, id);
Note that you will no longer need the index on activityID by itself, since it's a prefix of this new index.
Use offset
SELECT <cols> FROM `comment` WHERE `comment`.`activityID` = 1257 ORDER BY `id` DESC LIMIT 0,20;
In limit clause add 0 as an offset to get only the first 20 comments
Just add two separate indexes on activityID and id. That should help you in ORDER BY too. There is no hard and fast rules in optimizations, but you need to try various methods.
Do it this way:
ALTER TABLE comment ADD INDEX (id);
ALTER TABLE comment ADD INDEX (activityID);
I think this will help.

mysql index issue-explain says file sort

i have very simple mysql table with 5 columns but 5 million data. earlier when data was less my server load was very less but now the load is increasing as the data is more than 5 million and i expect it to reach 10 million by this year end so server will be more slow.i have used indexed wisely
structure is very simple with id as auto increment and primary key and i am filtering the data based on id only which is automatically indexed as it is primary key(i tried indexing it as well but no benefit)
table A
id pid title app get
my query is
EXPLAIN SELECT * FROM tableA ORDER BY id DESC LIMIT 4061280 , 10
and explain says
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tableA ALL NULL NULL NULL NULL 4700461 Using filesort
i dont want to go through all rows as it will slow down my server and create heavy load for file sorting as it will create temporary files either in buffer or in disc.
please advice any good idea to solve this issue.
when my id is indexed why it will go through all rows and reach to desired row.it can not jump directly to that row????
Assuming you don't have "gaps" (read deleted records) in your id..
SELECT * FROM tableA WHERE id > 4061279 and id <= 4061290 ORDER BY id DESC
Ok next
SELECT * FROM tableA WHERE id <= 4061290 ORDER BY id DESC LIMIT 10

optimizing less than greater than query of mysql

i am facing heavy load from my one query . i have 10 million data in my table and i have done indexing of both pid and fid columns then also mysql explain shows that it is not using any index
here is my query
select * from tableA where pid < '94898' and fid='37' order by id desc limit 1;
my mysql explain output says this
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tableA index fid PRIMARY 4 NULL 152 Using where
but mysql slow shows its scanning millions of data

Limited SQL query returns all rows instead of one

I tried the SQL code:
explain SELECT * FROM myTable LIMIT 1
As a result I got:
id select_type table type possible_keys key key_len ref **rows**
1 SIMPLE myTable ALL NULL NULL NULL NULL **32117**
Do you know why the query would run though all rows instead of simply picking the first row?
What can I change within the query (or in my table) to reduce the line amount for a similar result?
The rows count shown is only an estimate of the number of rows to examine. It is not always equal to the actual number of rows examined when you run the query.
In particular:
LIMIT is not taken into account while estimating number of rows Even if you have LIMIT which restricts how many rows will be examined MySQL will still print full number.
Source
When the query actually runs only one row will be examined.
Edited for use of subselect:
Assuming the primary key is "my_id" , use WHERE. For instance:
select * from mytable
where my_id = (
select max(my_id) from mytable
)
While this seems less efficient at first, the result is as such in explain, resulting in just one row returned and a read on the index to find max. I do not suggest doing this against partitioned tables in MySQL:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mytable const PRIMARY PRIMARY 4 const 1
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away