I have a pagination query which does range index scan on a large table:
create table t_dummy (
id int not null auto_increment,
field1 varchar(255) not null,
updated_ts timestamp null default null,
primary key (id),
key idx_name (updated_ts)
);
The query looks like this:
select * from t_dummy a
where a.field1 = 'VALUE'
and (a.updated_ts > 'some time' or (a.updated_ts = 'some time' and a.id > x))
order by a.updated_ts, a.id
limit 100
The explain plan shows a large cost, with the rows value being very high; however, it is using all the right indexes and the execution seems fast. Can someone please tell me whether this means the query is inefficient?
EXPLAIN can be misleading. It can report a high value for rows, despite the fact that MySQL optimizes LIMIT queries to stop once enough rows have been found that satisfy your requested LIMIT (100 in your case).
The problem is that, at the time it produces the EXPLAIN, MySQL doesn't necessarily know how many rows it will have to examine to find at least 100 rows that satisfy the conditions in your WHERE clause.
So you can usually ignore the rows field of the EXPLAIN when you have a LIMIT query. It probably won't really have to examine that many rows.
If the execution is fast enough, don't worry about it. If it is not, consider a (field1, updated_ts) index and/or changing your query to:
and a.updated_ts >= 'some time' and (a.updated_ts > 'some time' or a.id > x)
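A minimal sketch of adding the suggested index (the index name is hypothetical; InnoDB implicitly appends the primary key id to every secondary index, so within each field1 value this index can also serve the ORDER BY a.updated_ts, a.id):
-- hypothetical name; equality column first, then the range/order column
ALTER TABLE t_dummy
  ADD INDEX idx_field1_updated (field1, updated_ts);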
As Bill says, EXPLAIN cannot be trusted to take LIMIT into account.
The following will confirm that the query is touching only 100 rows:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
The Handler_read% values will probably add up to about 100. There will probably be no Handler_write% values -- they would indicate the creation of a temp table.
A tip: If you use LIMIT 101, you get the 100 rows to show, plus an indication of whether there are more rows. This, with very low cost, avoids having a [Next] button that sometimes brings up a blank page.
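As a sketch of the pattern (the WHERE clause is elided, as in the question):
SELECT * FROM t_dummy
WHERE ...
ORDER BY updated_ts, id
LIMIT 101;  -- if 101 rows come back, render 100 and enable [Next]; otherwise this is the last page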
My tips on the topic: http://mysql.rjweb.org/doc.php/pagination
I'm having trouble understanding my options for how to optimize this specific query. Looking online, I find various resources, but all for queries that don't match my particular one. From what I could gather, it's very hard to optimize a query when you have an order by combined with a limit.
My use case is a paginated datatable that displays the latest records first.
The query in question is the following (to fetch 10 latest records):
select
`xyz`.*
from
xyz
where
`xyz`.`fk_campaign_id` = 95870
and `xyz`.`voided` = 0
order by
`registration_id` desc
limit 10 offset 0
And the table DDL:
CREATE TABLE `xyz` (
`registration_id` int NOT NULL AUTO_INCREMENT,
`fk_campaign_id` int DEFAULT NULL,
`fk_customer_id` int DEFAULT NULL,
... other fields ...
`voided` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`registration_id`),
.... ~12 other indexes ...
KEY `activityOverview` (`fk_campaign_id`,`voided`,`registration_id` DESC)
) ENGINE=InnoDB AUTO_INCREMENT=280614594 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;
The explain on the query mentioned gives me the following:
"id","select_type","table","partitions","type","possible_keys","key","key_len","ref","rows","filtered","Extra"
1,SIMPLE,db_campaign_registration,,index,"getTop5,winners,findByPage,foreignKeyExistingCheck,limitReachedIp,byCampaign,emailExistingCheck,getAll,getAllDated,activityOverview",PRIMARY,"4",,1626,0.65,Using where; Backward index scan
As you can see, it says it only hits 1626 rows, but when I execute it, it takes 200+ seconds to run.
I'm doing this to fetch data for a datatable that is to display the latest 10 records. I also have pagination that allows one to navigate pages (only able to go to next page, not last or make any big jumps).
To further help with getting the full picture, I've put together a dbfiddle: https://dbfiddle.uk/Jc_K68rj - this fiddle does not reproduce the same behavior as my table, but I suspect that is because of the difference in data size.
The table in question has 120 GB of data and 39,000,000 active records. I already have an index in place that should cover the query and allow it to fetch the data fast. Am I completely missing something here?
Another solution goes something like this:
SELECT b.*
FROM ( SELECT registration_id
FROM xyz
where `xyz`.`fk_campaign_id` = 95870
and `xyz`.`voided` = 0
order by `registration_id` desc
limit 10 offset 0 ) AS a
JOIN xyz AS b USING (registration_id)
order by `registration_id` desc;
Explanation:
The derived table (subquery) will use the 'best' index without any extra prompting -- since it is "covering".
That will deliver 10 ids
Then 10 JOINs to the table to get xyz.*
A derived table is unordered, so the ORDER BY does need repeating.
That's tricking the Optimizer into doing what it should have done anyway.
(Again, I encourage getting rid of any indexes that are prefixes of the 3-column, optimal, index discussed.)
KEY `activityOverview` (`fk_campaign_id`,`voided`,`registration_id` DESC)
is optimal. (Nearly as good is the same index, but without the DESC).
Let's see the other indexes. I strongly suspect that there is at least one index that is a prefix of that index. Remove it/them. The Optimizer sometimes gets confused and picks the "smaller" index instead of the "better" index.
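For example, assuming one of the indexes in possible_keys, say byCampaign, turns out to be just (fk_campaign_id) -- a prefix of activityOverview -- it is redundant and can be dropped:
-- assumption: byCampaign is (fk_campaign_id), a prefix of activityOverview
ALTER TABLE xyz DROP INDEX byCampaign;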
Here's a technique for seeing whether it manages to read only 10 rows instead of most of the table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts
I'm looking for a reason and suggestions.
My table has about 1.4 million rows, and when I run the following query it takes over 3 minutes. I added the count just to show the result; my real query is without the count.
MariaDB [ams]> SELECT count(asin) FROM asins where asins.is_active = 1
and asins.title is null and asins.updated < '2018-10-28' order by sortorder,id;
+-------------+
| count(asin) |
+-------------+
|      187930 |
+-------------+
1 row in set (3 min 34.34 sec)
Structure
id int(9) Primary
asin varchar(25) UNIQUE
is_active int(1) Index
sortorder int(9) Index
Please let me know if you need more information.
Thanks in advance.
EDIT
Query with EXPLAIN
MariaDB [ams]> EXPLAIN SELECT asin FROM asins where asins.is_active = 1 and asins.title is null and asins.updated < '2018-10-28' order by sortorder,id;
The database is scanning all the rows to answer the query. I imagine you have a really big table.
For this query, the ORDER BY is unnecessary (but it should have no impact on performance):
SELECT count(asin)
FROM asins
WHERE asins.is_active = 1 AND
asins.title is null AND
asins.updated < '2018-10-28' ;
Then you want an index on (is_active, title, updated).
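As a statement (the index name is hypothetical):
CREATE INDEX ix_active_title_updated ON asins (is_active, title, updated);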
Looks like you have an index on is_active and updated. So that index is going to be scanned (like a table scan, every record in the index read), but since title is not in the index, there is going to be a second operation which looks up title in the table. You can think of this as a join between the index and the table. If most of the records in the index match your conditions, then the join is going to involve most of the data in the table. Large joins are slow.
You might be better off with a full table scan if the conditions against the index are going to result in a large number of records returned.
See https://dba.stackexchange.com/questions/110707/how-can-i-force-mysql-to-ignore-all-indexes for a way to force the full table scan. Give it a try and see if your query is faster.
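One documented way to force the scan is USE INDEX () with an empty index list, which tells MySQL to use no indexes:
SELECT count(asin)
FROM asins USE INDEX ()   -- empty list = use no indexes, i.e. full table scan
WHERE is_active = 1 AND title IS NULL AND updated < '2018-10-28';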
Try these:
INDEX(is_active, updated),
INDEX(is_active, sortorder, id)
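As full statements (the index names are hypothetical):
ALTER TABLE asins
  ADD INDEX ix_active_updated (is_active, updated),     -- equality on is_active, then range on updated
  ADD INDEX ix_active_sort (is_active, sortorder, id);  -- equality on is_active, then rows already in ORDER BY order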
And please provide SHOW CREATE TABLE.
With the first of these indexes, some of the filtering will be done, but then it will still have to sort the results.
With the second index, the Optimizer may choose to filter on the only = column, then avoid the sort by launching into the ORDER BY. The risk is that it will still have to hit so many rows that avoiding the sort is not worth it.
What percentage of the table has is_active = 1? What percentage has a null title? What percentage is in that date range?
When you create a compound index and part of it is range based, you want the range based part last, after the equality columns; index columns that follow a range condition cannot be used for seeking.
So try the index (is_active, title, updated).
This way is_active and title form an equality prefix, and the range condition on updated can still use the index.
I have to sort the query result on an alias column (total_reports) that comes from a GROUP BY, with a LIMIT of 50 records.
Please let me know what I am missing.
SELECT Count(world_name) AS total_reports,
name,
Max(last_update) AS report
FROM `world`
WHERE ( `id` = ''
AND `status` = 1 )
AND `time` >= '2017-07-16'
AND `name` LIKE '%%'
GROUP BY `name`
HAVING `total_reports` >= 2
ORDER BY `total_reports` DESC
LIMIT 50 offset 0
The query returns what I need. However, it scans all records of the table before returning the result, which takes too much time; I have thousands of records, so it is slow. I want to apply an index to the alias, which is total_reports in my situation.
Create an index on a column from an aggregated result? No, I'm sorry, but MySQL cannot do that natively.
What you need is probably a Materialized View that you could index. Not supported in MySQL (yet), unless you install extra plugins. See How to Create a Materialized View in MySQL.
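A minimal sketch of emulating one with a summary table (all names and column sizes here are hypothetical, and which filters you bake in, plus the refresh strategy, are up to you):
-- hypothetical summary table with the aggregate column indexed:
CREATE TABLE world_totals (
  name VARCHAR(100) NOT NULL PRIMARY KEY,
  total_reports INT NOT NULL,
  report DATETIME NULL,
  KEY ix_total_reports (total_reports)
);
-- rebuild periodically; the expensive GROUP BY then runs once, not per page view:
TRUNCATE world_totals;
INSERT INTO world_totals (name, total_reports, report)
SELECT name, COUNT(world_name), MAX(last_update)
FROM `world`
WHERE `status` = 1 AND `time` >= '2017-07-16'
GROUP BY name;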
The Long Answer
You cannot create an index on a column resulting from a GROUP BY statement. That column does not exist on the table, and cannot be derived at the row level (not a virtual column).
Your query may be slow since it's probably reading the whole table. To read only the relevant range of rows, add the index:
create index ix1 on `world` (`status`, `id`, `time`);
That should make the query use the filtering condition in a much better way and hopefully speed it up, by using an Index Range Scan.
Also, please change '%%' to '%'; a double % doesn't make much sense. Actually, you should remove this condition altogether -- it's not filtering anything.
Finally, if the query is still slow, please post the execution plan, using:
explain <my_query_here>
I am wondering why the first query is so much faster than the second.
It is run on a table of around 500k records.
SELECT date FROM `log` WHERE `action` = 'SOMETHING' and token = 167 ORDER BY id DESC LIMIT 1;
-- 0.0003 sec
SELECT max(date) FROM `log` WHERE `action` = 'SOMETHING' and token = 167;
-- 0.0023 sec
My guess is that you don't have an index on date, but DO have an index on id (Primary key)?
If id is a primary key, MySQL can use that index to return results to you much more quickly. This is because even though your query only returns a date column, the rows are being ordered by id, which is indexed, allowing 500k rows to be ordered VERY quickly.
To return the maximum date, even though it's one row of a single column, if date isn't indexed the database needs to examine EVERY row to determine which is the largest.
Both queries will run very quickly with an index on log(action, token, id).
To understand the performance differences between those two queries, please give the explain plan for them. You can get this by preceding the queries with explain:
explain SELECT date FROM `log` WHERE `action` = 'SOMETHING' and token = 167 ORDER BY id DESC LIMIT 1;
explain SELECT max(date) FROM `log` WHERE `action` = 'SOMETHING' and token = 167;
The first query sorts the table and returns the first record. The second query, although it does no ordering, needs to read the entire table to find out which record is the highest. Also, your table probably does not have an index on the date field.
These two queries have very different logic; they possibly return the same value only because of a correlation between ids and dates that makes sense to you, but which the query optimiser clearly cannot know.
With an index on (action, token, date) the optimiser may be able to perform the second query faster, but if you're absolutely sure of that correlation then there's nothing wrong with using the first query.
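A sketch of that index (the name is hypothetical):
-- with this index, MAX(date) can be read from the end of the matching index range
ALTER TABLE `log` ADD INDEX ix_action_token_date (`action`, `token`, `date`);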
I have made MySQL explain the following query:
SELECT carid,pic0,bio,url,site,applet
FROM cronjob_reloaded
WHERE
carid LIKE '%bmw%'
OR
carid LIKE '%mer%'
OR
age BETWEEN '5' AND '10'
OR
category IN ('used')
ORDER BY CASE
WHEN carid LIKE '%bmw%' OR carid LIKE '%mer%' THEN 1
WHEN age BETWEEN '5' AND '10' THEN 2
ELSE 3
END
And here is the explain result:
EXPLAIN SELECT carid, pic0, bio, url, site, applet
FROM cronjob_reloaded
WHERE carid LIKE '%bmw%'
OR carid LIKE '%mer%'
OR carid IS NOT NULL
AND age BETWEEN '5' AND '10'
What I do not understand it this:
Why is the key NULL?
Can I make this query faster? It takes 0.0035 sec - is this slow or fast for a 1000-row table?
In my table, carid is the primary key.
MySQL did not find any indexes to use for the query.
The speed of the query depends on your CPU, and for so few rows, also on available RAM, system load, and disk speed. You can use BENCHMARK to run the query several times and time it with higher precision (e.g. you execute it 100,000 times and divide the total time by 100,000).
As for the indexing issue: your WHERE clause involves carid, age, category (and indirectly performerid). You ought to index on category first (since you ask a direct match on it), age, and finally carid.
CREATE INDEX test_index ON cronjob_reloaded ( category, age, carid );
This brings together most of the information that MySQL needs for the WHERE phase of the query in a single index operation.
Adding performerid may speed this up, or not, depending on several factors. I'd start without and maybe test it later on.
Update: the original query seems to have changed, and no performerid appears anymore.
Finally, 1000 rows usually require so little time that MySQL might even decide not to use the index at all, since it's faster to load everything and let the WHERE clause do the filtering.
As per the docs:
"If key is NULL, MySQL found no index to use for executing the query more efficiently."
Please refer to the official MySQL documentation on EXPLAIN output for details.
Edit: here are links on indexing:
How MySQL indexes work (SO)
How to create an index
Hope this helps!