I wish to fetch the last 10 rows from the table of 1 M rows.
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`updated_date` datetime NOT NULL,
PRIMARY KEY (`id`)
)
One way of doing this is -
select * from test order by -id limit 10;
**10 rows in set (0.14 sec)**
Another way of doing this is -
select * from test order by id desc limit 10;
**10 rows in set (0.00 sec)**
So I did an 'EXPLAIN' on these queries -
Here is the result for the query where I use 'order by desc'
EXPLAIN select * from test order by id desc limit 10;
And here is the result for the query where I use 'order by -id'
EXPLAIN select * from test order by -id limit 10;
I thought this would be same but is seems there are differences in the execution plan.
RDBMS use heuristics to calculate the execution plan, they cannot always determine the semantic equivalence of two statements as it is a too difficult problem (in terms of theoretical and practical complexity).
So MySQL is not able to use the index, as you do not have an index on "-id", that is a custom function applied to the field "id". Seems trivial, but the RDBMSs must minimize the amount of time needed to compute the plans, so they get stuck with simple problems.
When an optimization cannot be found for a query (i.e. using an index) the system fall back to the implementation that works in any case: a scan of the full table.
As you can see in Explain results,
1 : order by id
MySQL is using indexing on id. So it need to iterate only 10 rows as it is already indexed. And also in this case MySQL don't need to use filesort algorithm as it is already indexed.
2 : order by -id
MySQL is not using indexing on id. So it needs to iterate all the rows.( e.g. 455952) to get your expected results. In this case MySQL needs to use filesort algorithm as id is not indexed. So it will obviously take more time :)
You use ORDER BY with an expression that includes terms other than the key column name:
SELECT * FROM t1 ORDER BY ABS(key);
SELECT * FROM t1 ORDER BY -key;
You index only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if you have a CHAR(20) column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a filesort will be needed.
The type of table index used does not store rows in order. For example, this is true for a HASH index in a MEMORY table.
Please follow this link: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
Related
I'm having trouble understanding my options for how to optimize this specific query. Looking online, I find various resources, but all for queries that don't match my particular one. From what I could gather, it's very hard to optimize a query when you have an order by combined with a limit.
My usecase is that i would like to have a paginated datatable that displayed the latest records first.
The query in question is the following (to fetch 10 latest records):
select
`xyz`.*
from
xyz
where
`xyz`.`fk_campaign_id` = 95870
and `xyz`.`voided` = 0
order by
`registration_id` desc
limit 10 offset 0
& table DDL:
CREATE TABLE `xyz` (
`registration_id` int NOT NULL AUTO_INCREMENT,
`fk_campaign_id` int DEFAULT NULL,
`fk_customer_id` int DEFAULT NULL,
... other fields ...
`voided` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`registration_id`),
.... ~12 other indexes ...
KEY `activityOverview` (`fk_campaign_id`,`voided`,`registration_id` DESC)
) ENGINE=InnoDB AUTO_INCREMENT=280614594 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;
The explain on the query mentioned gives me the following:
"id","select_type","table","partitions","type","possible_keys","key","key_len","ref","rows","filtered","Extra"
1,SIMPLE,db_campaign_registration,,index,"getTop5,winners,findByPage,foreignKeyExistingCheck,limitReachedIp,byCampaign,emailExistingCheck,getAll,getAllDated,activityOverview",PRIMARY,"4",,1626,0.65,Using where; Backward index scan
As you can see it says it only hits 1626 rows. But, when i execute it - then it takes 200+ seconds to run.
I'm doing this to fetch data for a datatable that is to display the latest 10 records. I also have pagination that allows one to navigate pages (only able to go to next page, not last or make any big jumps).
To further help with getting the full picture I've put together a dbfiddle. https://dbfiddle.uk/Jc_K68rj - this fiddle does not have the same results as my table. But i suspect this is because of the data size that I'm having with my table.
The table in question has 120GB data and 39.000.000 active records. I already have an index put in that should cover the query and allow it to fetch the data fast. Am i completely missing something here?
Another solution goes something like this:
SELECT b.*
FROM ( SELECT registration_id
FROM xyz
where `xyz`.`fk_campaign_id` = 95870
and `xyz`.`voided` = 0
order by `registration_id` desc
limit 10 offset 0 ) AS a
JOIN xyz AS b USING (registration_id)
order by `registration_id` desc;
Explanation:
The derived table (subquery) will use the 'best' query without any extra prompting -- since it is "covering".
That will deliver 10 ids
Then 10 JOINs to the table to get xyz.*
A derived table is unordered, so the ORDER BY does need repeating.
That's tricking the Optimizer into doing what it should have done anyway.
(Again, I encourage getting rid of any indexes that are prefixes of the the 3-column, optimal, index discussed.)
KEY `activityOverview` (`fk_campaign_id`,`voided`,`registration_id` DESC)
is optimal. (Nearly as good is the same index, but without the DESC).
Let's see the other indexes. I strongly suspect that there is at least one index that is a prefix of that index. Remove it/them. The Optimizer sometimes gets confused and picks the "smaller" index instead of the "better index.
Here's a technique for seeing whether it manages to read only 10 rows instead of most of the table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts
I am using mysql to query a DB. What I would like to do is to query NOT the full database, but only the last 1000 rows ordered by timestamp.
I have tried to use this query which doesn't work ad I would like. I know the limit is used to return a fixed number of selected elements. But that is not what I would like. I want to query a fixed number of element.
select * from mutable where name = 'myschema' ORDER BY start_time DESC LIMIT 1000;
Any help?
tl;dr Sort Less Data
I guess your mutable table has an autoincrementing primary key called mutable.mutable_id.
You can then do this:
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000;
It gives you a result set of the ids of all the relevant rows. The ORDER BY ... LIMIT work then only has to sort mutable_id and start_time values, not the whole table. So it takes less space and time in the MySql server.
Then you use that query to retrieve the details:
SELECT *
FROM mutable
WHERE mutable_id IN (
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000
)
ORDER BY start_time DESC;
This will fetch all the data you need without needing to scan and sort the whole table.
If you create an index on name and start_time the subquery will be faster: the query can random-access the index to the appropriate name, then scan the start_time entries one by one until it finds 1000. No need to sort; the index is presorted.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time);
If you're on MySQL 8 you can create a descending index and it's even faster.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time DESC);
This works only with auto_increment
The trick is to sort less data, like O. Jones mentioned. Problem is in telling MySQL how to do so.
MySQL can't know what "last 1000 records" are unless it sorts them based on the query. That's exactly what you want to avoid so you need to tell MySQL how to find "last 1000 records".
This trick consists of telling MySQL at which auto_increment to start looking for the data. The problem is that you're using timestamps so I'm not sure whether this fits your particular use case.
Here's the query:
SELECT * FROM mutable
WHERE `name` = 'myschema'
AND id > (SELECT MAX(id) - 1000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;
Problems:
auto_increments have gaps. These numbers aren't sequential, they're unique, calculated via sequential increment algorithm. To get better results, increase the subtraction number. You might get 1000 results, but you might get 500 results - depending on your dataset
if you don't have an auto_increment, this is useless
if the timestamps inserted are required to be sorted beforehand from larger to lower, this is useless
advantages:
- primary key is used to define the value range (where id > x), therefore dataset reduction will be the fastest possible.
I have to sort the query result on an alias column (total_reports) which is in group by condition with limit of having 50 number of records.
Please let me know where I am missing,
SELECT Count(world_name) AS total_reports,
name,
Max(last_update) AS report
FROM `world`
WHERE ( `id` = ''
AND `status` = 1 )
AND `time` >= '2017-07-16'
AND `name` LIKE '%%'
GROUP BY `name`
HAVING `total_reports` >= 2
ORDER BY `total_reports` DESC
LIMIT 50 offset 0
Query return what I need. However it runs on all records of table then return result and takes too many time which is not right way. I have thousands of records so its take time. I want to apply key index on alias which is total_reports in my situation.
Create an index on an column from an aggregated result? No, I'm sorry, but MySQL cannot do that natively.
What you need is probably a Materialized View that you could index. Not supported in MySQL (yet), unless you install extra plugins. See How to Create a Materialized View in MySQL.
The Long Answer
You cannot create an index on a column resulting from a GROUP BY statement. That column does not exist on the table, and cannot be derived at the row level (not a virtual column).
You query may be slow since it's probably reading the whole table. To only read the specific range of rows, add the index:
create index ix1 on `world` (`status`, `id`, `time`);
That should make the query use the filtering condition in a much better way and hopefully will speed up your query, by using and Index Range Scan.
Also, please change '%%' for '%'. Double % doesn't make too much sense. Actually, you should remove this condition altogether -- it's not filtering anything.
Finally, if the query is still slow, please post the execution plan, using:
explain <my_query_here>
I have a MySQL table:
CREATE TABLE mytable (
id INT NOT NULL AUTO_INCREMENT,
other_id INT NOT NULL,
expiration_datetime DATETIME,
score INT,
PRIMARY KEY (id)
)
I need to run query in the form of:
SELECT * FROM mytable
WHERE other_id=1 AND expiration_datetime > NOW()
ORDER BY score LIMIT 10
If I add this index to mytable:
CREATE INDEX order_by_index
ON mytable ( other_id, expiration_datetime, score);
Would MySQL be able to use the entire order_by_index in the query above?
It seems like it should be able to, but then according to MySQL's documentation: "The index can also be used even if the ORDER BY does not match the index exactly, as long as all of the unused portions of the index and all the extra ORDER BY columns are constants in the WHERE clause."
The above passage seems to suggest that index would only be used in a constant query while mine is a range query.
Can anyone clarify if index would be used in this case? If not, any way I could force the use of index?
Thanks.
MySQL will use the index to satisfy the where clause, and will use a filesort to order the results.
It can't use the index for the order by because you are not comparing expiration_datetime to a constant. Therefore, the rows being returned will not always all have a common prefix in the index, so the index can't be used for the sort.
For example, consider a sample set of 4 index records for your table:
a) [1,'2010-11-03 12:00',1]
b) [1,'2010-11-03 12:00',3]
c) [1,'2010-11-03 13:00',2]
d) [2,'2010-11-03 12:00',1]
If I run your query at 2010-11-03 11:00, it will return rows a,c,d which are not consecutive in the index. Thus MySQL needs to do the extra pass to sort the results and can't use an index in this case.
Can anyone clarify if index would be used in this case? If not, any way I could force the use of index?
You have a range in filtering condition and the ORDER BY not matching the range.
These conditions cannot be served with a single index.
To choose which index to create, you need to run these queries
SELECT COUNT(*)
FROM mytable
WHERE other_id = 1
AND (score, id) <
(
SELECT score, id
FROM mytable
WHERE other_id = 1
AND expiration_datetime > NOW()
ORDER BY
score, id
LIMIT 10
)
and
SELECT COUNT(*)
FROM mytable
WHERE other_id = 1
AND expiration_datetime >= NOW()
and compare their outputs.
If the second query yields about same or more values as the first one, then you should use an index on (other_id, score) (and let it filter on expiration_datetime).
If the second query yields significantly less values than the first one, you should use an index on (other_id, expiration_datetime) (and let it sort on score).
This article might be interesting to you:
Choosing index
Sounds like you've already checked the documentation and setup the index. Use EXPLAIN and see...
EXPLAIN SELECT * FROM mytable
WHERE other_id=1 AND expiration_datetime > NOW()
ORDER BY score LIMIT 10
I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
There's a blog post somewhere on the internet on how you should best make the selection of the rows to show should be as compact as possible, thus: just the ids; and producing the complete results should in turn fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statements, or it doesn't improve anything, against hope, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there is no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I have run into this problem recently. The problem was two parts to fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the from part of my query:
'titles.id',
'title_eisbns_concat.eisbns_concat',
'titles.pub_symbol',
'titles.title',
'titles.subtitle',
'titles.contributor1',
'titles.publisher',
'titles.epub_date',
'titles.ebook_price',
'publisher_licenses.id as pub_license_id',
'license_types.shortname',
$coversQuery
)
->from($subQuery)
->leftJoin('titles', 't.id', '=', 'titles.id')
->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
The first time I created this query I had used the OFFSET and LIMIT in MySql. This worked fine until I got past page 100 then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySql hasn't sped up OFFSET but between seems to reel it back in.