I'm having trouble understanding my options for optimizing this specific query. Looking online I find various resources, but they all cover queries that don't match my particular one. From what I could gather, it's very hard to optimize a query that combines ORDER BY with LIMIT.
My use case is a paginated datatable that displays the latest records first.
The query in question is the following (it fetches the 10 latest records):
select
`xyz`.*
from
xyz
where
`xyz`.`fk_campaign_id` = 95870
and `xyz`.`voided` = 0
order by
`registration_id` desc
limit 10 offset 0
And the table DDL:
CREATE TABLE `xyz` (
`registration_id` int NOT NULL AUTO_INCREMENT,
`fk_campaign_id` int DEFAULT NULL,
`fk_customer_id` int DEFAULT NULL,
... other fields ...
`voided` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`registration_id`),
.... ~12 other indexes ...
KEY `activityOverview` (`fk_campaign_id`,`voided`,`registration_id` DESC)
) ENGINE=InnoDB AUTO_INCREMENT=280614594 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;
The explain on the query mentioned gives me the following:
"id","select_type","table","partitions","type","possible_keys","key","key_len","ref","rows","filtered","Extra"
1,SIMPLE,db_campaign_registration,,index,"getTop5,winners,findByPage,foreignKeyExistingCheck,limitReachedIp,byCampaign,emailExistingCheck,getAll,getAllDated,activityOverview",PRIMARY,"4",,1626,0.65,Using where; Backward index scan
As you can see, it estimates that only 1626 rows are hit. But when I execute the query, it takes 200+ seconds to run.
I'm doing this to fetch data for a datatable that displays the latest 10 records. I also have pagination that lets users move to the next page only (no jumping to the last page or making big leaps).
To help give the full picture I've put together a dbfiddle: https://dbfiddle.uk/Jc_K68rj - this fiddle does not reproduce the results I see on my table, but I suspect that is because of the difference in data size.
The table in question holds 120 GB of data and 39,000,000 active records. I already have an index in place that should cover the query and let it fetch the data fast. Am I completely missing something here?
Another solution goes something like this:
SELECT b.*
FROM ( SELECT registration_id
FROM xyz
where `xyz`.`fk_campaign_id` = 95870
and `xyz`.`voided` = 0
order by `registration_id` desc
limit 10 offset 0 ) AS a
JOIN xyz AS b USING (registration_id)
order by `registration_id` desc;
Explanation:
The derived table (subquery) will use the 'best' index without any extra prompting -- since it is "covering".
That will deliver 10 ids.
Then 10 JOINs to the table fetch xyz.*.
A derived table is unordered, so the ORDER BY does need repeating.
That's tricking the Optimizer into doing what it should have done anyway.
(Again, I encourage getting rid of any indexes that are prefixes of the 3-column, optimal, index discussed.)
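For instance, byCampaign appears in your possible_keys; if it covers only fk_campaign_id, it is exactly such a prefix. A sketch of checking for and dropping prefix indexes (I'm assuming byCampaign's columns here; verify with SHOW INDEX first):

```sql
-- List the indexes and look for any whose leading columns duplicate
-- (fk_campaign_id, voided, registration_id):
SHOW INDEX FROM xyz;

-- Hypothetical example: drop a single-column index that is a prefix
-- of activityOverview:
ALTER TABLE xyz DROP INDEX `byCampaign`;
```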
KEY `activityOverview` (`fk_campaign_id`,`voided`,`registration_id` DESC)
is optimal. (Nearly as good is the same index, but without the DESC).
Let's see the other indexes. I strongly suspect that there is at least one index that is a prefix of that index. Remove it/them. The Optimizer sometimes gets confused and picks the "smaller" index instead of the "better" index.
Here's a technique for seeing whether it manages to read only 10 rows instead of most of the table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts
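Applied to your query, that technique would look like this (run all three statements in the same session):

```sql
FLUSH STATUS;

SELECT `xyz`.*
FROM xyz
WHERE `xyz`.`fk_campaign_id` = 95870
  AND `xyz`.`voided` = 0
ORDER BY `registration_id` DESC
LIMIT 10 OFFSET 0;

-- If the optimal index is used, the Handler_read% counters should add up
-- to roughly 10; values in the thousands mean far more rows are being scanned.
SHOW SESSION STATUS LIKE 'Handler%';
```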
Related
I have a pagination query which does range index scan on a large table:
create table t_dummy (
  id int not null auto_increment,
  field1 varchar(255) not null,
  updated_ts timestamp null default null,
  primary key (id),
  key idx_name (updated_ts)
);
The query looks like this:
select * from t_dummy a
where a.field1 = 'VALUE'
and (a.updated_ts > 'some time' or (a.updated_ts = 'some time' and a.id > x))
order by a.updated_ts, a.id
limit 100
The EXPLAIN plan shows a large cost, with a very high rows value; however, it is using all the right indexes and execution seems fast. Can someone please tell me whether this means the query is inefficient?
EXPLAIN can be misleading. It can report a high value for rows, despite the fact that MySQL optimizes LIMIT queries to stop once enough rows have been found that satisfy your requested LIMIT (100 in your case).
The problem is, at the time the query does the EXPLAIN, it doesn't necessarily know how many rows it will have to examine to find at least 100 rows that satisfy the conditions in your WHERE clause.
So you can usually ignore the rows field of the EXPLAIN when you have a LIMIT query. It probably won't really have to examine so many rows.
If the execution is fast enough, don't worry about it. If it is not, consider a (field1,updated_ts) index and/or changing your query to
and a.updated_ts >= 'some time' and (a.updated_ts > 'some time' or a.id > x)
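Put together, the suggested index and the rewritten query might look like this (the index name is mine; 'some time' and x are placeholders from the question):

```sql
-- Equality column first, then the two columns used for the range and the sort:
ALTER TABLE t_dummy ADD INDEX idx_field1_updated_id (field1, updated_ts, id);

SELECT *
FROM t_dummy a
WHERE a.field1 = 'VALUE'
  AND a.updated_ts >= 'some time'
  AND (a.updated_ts > 'some time' OR a.id > x)
ORDER BY a.updated_ts, a.id
LIMIT 100;
```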
As Bill says, Explain cannot be trusted to take LIMIT into account.
The following will confirm that the query is touching only 100 rows:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
The Handler_read% values will probably add up to about 100. There will probably be no Handler_write% values -- they would indicate the creation of a temp table.
A tip: If you use LIMIT 101, you get the 100 rows to show, plus an indication of whether there are more rows. This, with very low cost, avoids having a [Next] button that sometimes brings up a blank page.
My tips on the topic: http://mysql.rjweb.org/doc.php/pagination
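A sketch of that LIMIT 101 trick, applied to the query from the question:

```sql
-- Ask for one more row than the page size; the extra row is never displayed,
-- it only tells us whether a [Next] page exists:
SELECT *
FROM t_dummy a
WHERE a.field1 = 'VALUE'
  AND (a.updated_ts > 'some time'
       OR (a.updated_ts = 'some time' AND a.id > x))
ORDER BY a.updated_ts, a.id
LIMIT 101;
-- If 101 rows come back, show the first 100 and enable [Next];
-- if 100 or fewer come back, this is the last page.
```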
I have a problem with MySQL ORDER BY: it slows down the query and I really don't know why. My query was a little more complex, so I simplified it to a light query with no joins, but it still runs really slowly.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows and takes about 500 MB. The fields I'm filtering and sorting on are all indexed, as follows:
oid - INT PRIMARY KEY AUTO_INCREMENT
status - INT, indexed
sent_eRacun - TINYINT, indexed
Drzava - VARCHAR(3), indexed
First, a screenshot of the EXPLAIN for the query:
Next, the query as executed against the database:
And this is the speed after I remove the ORDER BY.
I have also tried sorting on a DATETIME field, which is also indexed, but I get the same slow query as when ordering by the primary key. This started today; it has always been fast and light before.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
Pro tip: Indexes on single columns are generally harmful to performance unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries. Edit: You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
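To verify the new index is being picked up, you could check the plan (a sketch; the exact output varies by MySQL version and data):

```sql
EXPLAIN SELECT W.`oid`
FROM `z_web_dok` AS W
WHERE W.`sent_eRacun` = 1 AND W.`status` IN (8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10;
-- Expect key = someName and "Using index" in Extra (the query is covered).
-- A filesort may still appear because of the IN(8, 9); see the next answer.
```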
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of handling the WHERE..ORDER BY..LIMIT completely efficiently. A possible solution is to treat the IN as OR, then convert to UNION and do some tricks with the LIMIT, especially if you might also be using OFFSET.
( SELECT ... WHERE .. = 8 AND ... ORDER BY oid LIMIT 10 )
UNION ALL
( SELECT ... WHERE .. = 9 AND ... ORDER BY oid LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or
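For example, with LIMIT 10 OFFSET 20, each subquery must fetch offset + limit rows, and the real offset is applied only in the outer query (a sketch using the columns from this question):

```sql
( SELECT `oid` FROM `z_web_dok`
  WHERE `sent_eRacun` = 1 AND `status` = 8 AND `Drzava` = 'BiH'
  ORDER BY `oid` LIMIT 30 )   -- 30 = offset 20 + limit 10
UNION ALL
( SELECT `oid` FROM `z_web_dok`
  WHERE `sent_eRacun` = 1 AND `status` = 9 AND `Drzava` = 'BiH'
  ORDER BY `oid` LIMIT 30 )
ORDER BY `oid`
LIMIT 10 OFFSET 20;           -- apply the real offset only here
```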
I'm looking for a reason and suggestions.
My table has about 1.4 million rows, and the following query takes over 3 minutes. I added COUNT just to show the result; my real query is without the COUNT.
MariaDB [ams]> SELECT count(asin) FROM asins where asins.is_active = 1
and asins.title is null and asins.updated < '2018-10-28' order by sortorder,id;
+-------------+
| count(asin) |
+-------------+
| 187930 |
+-------------+
1 row in set (3 min 34.34 sec)
Structure
id int(9) Primary
asin varchar(25) UNIQUE
is_active int(1) Index
sortorder int(9) Index
Please let me know if you need more information.
Thanks in advance.
EDIT
Query with EXPLAIN
MariaDB [ams]> EXPLAIN SELECT asin FROM asins where asins.is_active = 1 and asins.title is null and asins.updated < '2018-10-28' order by sortorder,id;
The database is scanning all the rows to answer the query. I imagine you have a really big table.
For this query, the ORDER BY is unnecessary (but it should have no impact on performance):
SELECT count(asin)
FROM asins
WHERE asins.is_active = 1 AND
asins.title is null AND
asins.updated < '2018-10-28' ;
Then you want an index on (is_active, title, updated).
It looks like you have an index on is_active and updated. That index is going to be scanned (like a table scan, with every record in the index read), but since title is not in the index, there is going to be a second operation that looks up title in the table. You can think of this as a join between the index and the table. If most of the records in the index match your conditions, the join is going to involve most of the data in the table, and large joins are slow.
You might be better off with a full table scan if the conditions against the index would match a large number of records.
See https://dba.stackexchange.com/questions/110707/how-can-i-force-mysql-to-ignore-all-indexes for a way to force the full table scan. Give it a try and see if your query is faster.
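In MySQL/MariaDB an empty USE INDEX () hint achieves the same thing; it tells the optimizer to use no indexes at all:

```sql
SELECT COUNT(asin)
FROM asins USE INDEX ()   -- empty index list = force a full table scan
WHERE asins.is_active = 1
  AND asins.title IS NULL
  AND asins.updated < '2018-10-28';
```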
Try these:
INDEX(is_active, updated),
INDEX(is_active, sortorder, id)
And please provide SHOW CREATE TABLE.
With the first of these indexes, some of the filtering will be done, but then it will still have to sort the results.
With the second index, the Optimizer may choose to filter on the only `=` column, then avoid the sort by launching straight into the ORDER BY. The risk is that it will still have to hit so many rows that avoiding the sort is not worth it.
What percentage of the table has is_active = 1? What percentage has a null title? What percentage is in that date range?
When you create a compound index and part of it is range based, you want the range-based part last: MySQL can use the equality columns at the front of the index, but cannot use any column that comes after the first range-based one.
So try the index (is_active, title, updated).
That way the equality tests on is_active and title (IS NULL counts as an equality test here) narrow the scan, and the trailing updated column serves the range condition.
I have to sort the query result on an alias column (total_reports), which is produced by the GROUP BY, filtered with HAVING, and limited to 50 records.
Please let me know what I am missing.
SELECT Count(world_name) AS total_reports,
name,
Max(last_update) AS report
FROM `world`
WHERE ( `id` = ''
AND `status` = 1 )
AND `time` >= '2017-07-16'
AND `name` LIKE '%%'
GROUP BY `name`
HAVING `total_reports` >= 2
ORDER BY `total_reports` DESC
LIMIT 50 offset 0
The query returns what I need. However, it runs over all the records in the table before returning the result, which is not right; I have thousands of records, so it takes a long time. I want to apply an index to the alias, which is total_reports in my situation.
Create an index on a column from an aggregated result? No, I'm sorry, but MySQL cannot do that natively.
What you need is probably a Materialized View that you could index. Not supported in MySQL (yet), unless you install extra plugins. See How to Create a Materialized View in MySQL.
The Long Answer
You cannot create an index on a column resulting from a GROUP BY statement. That column does not exist on the table and cannot be derived at the row level (it is not a virtual column).
Your query may be slow since it's probably reading the whole table. To read only the relevant range of rows, add the index:
create index ix1 on `world` (`status`, `id`, `time`);
That should make the query use the filtering condition in a much better way and hopefully speed it up, by using an Index Range Scan.
Also, please change '%%' to '%'. A double % doesn't make much sense. Actually, you should remove this condition altogether -- it's not filtering anything.
Finally, if the query is still slow, please post the execution plan, using:
explain <my_query_here>
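As for the materialized-view idea mentioned earlier: one common workaround in MySQL is to maintain a summary table by hand and index that instead. A sketch (table and index names are mine, and the table must be refreshed periodically, e.g. from a scheduled event, or kept current by the application):

```sql
CREATE TABLE world_report_counts (
  name          VARCHAR(255) NOT NULL PRIMARY KEY,
  total_reports INT NOT NULL,
  report        DATETIME NULL,
  KEY ix_total_reports (total_reports)
);

-- Periodic rebuild of the summary:
REPLACE INTO world_report_counts
SELECT `name`, COUNT(world_name), MAX(last_update)
FROM `world`
WHERE `status` = 1 AND `time` >= '2017-07-16'
GROUP BY `name`;
```

The ORDER BY total_reports DESC LIMIT 50 can then run against the indexed summary table instead of aggregating the whole base table on every request.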
I wish to fetch the last 10 rows from the table of 1 M rows.
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`updated_date` datetime NOT NULL,
PRIMARY KEY (`id`)
)
One way of doing this is -
select * from test order by -id limit 10;
**10 rows in set (0.14 sec)**
Another way of doing this is -
select * from test order by id desc limit 10;
**10 rows in set (0.00 sec)**
So I did an 'EXPLAIN' on these queries -
Here is the result for the query where I use 'order by desc'
EXPLAIN select * from test order by id desc limit 10;
And here is the result for the query where I use 'order by -id'
EXPLAIN select * from test order by -id limit 10;
I thought these would be the same, but it seems there are differences in the execution plans.
RDBMSs use heuristics to calculate the execution plan; they cannot always determine the semantic equivalence of two statements, as that is too difficult a problem (in terms of theoretical and practical complexity).
So MySQL is not able to use the index, because you do not have an index on "-id", which is an expression applied to the field "id". It seems trivial, but RDBMSs must minimize the time needed to compute plans, so they get stuck on simple problems.
When no optimization can be found for a query (i.e. no usable index), the system falls back to the implementation that works in any case: a scan of the full table.
As you can see in the EXPLAIN results:
1 : order by id
MySQL is using the index on id, so it needs to read only 10 rows. It also doesn't need the filesort algorithm, because the rows come back from the index already ordered.
2 : order by -id
MySQL is not using the index on id, so it needs to iterate over all the rows (e.g. 455952) to get your expected result. In this case MySQL needs the filesort algorithm, since -id is not indexed. So it obviously takes more time :)
You use ORDER BY with an expression that includes terms other than the key column name:
SELECT * FROM t1 ORDER BY ABS(key);
SELECT * FROM t1 ORDER BY -key;
You index only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if you have a CHAR(20) column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a filesort will be needed.
The type of table index used does not store rows in order. For example, this is true for a HASH index in a MEMORY table.
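The prefix-index case, for example, can be reproduced like this (a hypothetical table):

```sql
CREATE TABLE t1 (
  id   INT PRIMARY KEY,
  name CHAR(20),
  KEY idx_name_prefix (name(10))   -- only the first 10 characters are indexed
);

-- The index cannot distinguish values past the 10th character,
-- so this ORDER BY requires a filesort:
SELECT * FROM t1 ORDER BY name;
```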
Please follow this link: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html