When I execute the query below in MariaDB 10.1/MySQL 5.7, it returns 849 rows and executes in 0.016 seconds.
SELECT a.*
FROM bco_dav_dm.d_ttlz_cli a
WHERE FL_ATND_FNAL = 0
AND a.ID_ATND = 218
ORDER BY A.FL_CRTR, A.DT_PRMR_ATVC_LNHA;
But when I add a LIMIT clause to return only 1 row, the query executes in 9 seconds! Why?
I tested:
Using LIMIT 1 the query executes in 9 seconds.
Using LIMIT 2 the query executes in 9 seconds.
Using LIMIT 3 the query executes in 9 seconds.
Using LIMIT 4 and above (5, 6, 7, 8... 100, 200, 300, 400) the query executes in 0.016 seconds!
I've tested several times and I always have the same result.
I will use this query in a web app where I need only 1 record, so I don't understand why LIMIT <= 3 slows down the query!
Other posts say that a high OFFSET naturally slows down a query, but I don't use OFFSET.
My EXPLAIN:
select_type: SIMPLE
table: a
type: ref
possible_keys: ID_ATND, FL_ATND_FNAL
key: ID_ATND
key_len: 5
ref: const
rows: 1846
Extra: Using where; Using filesort
EDITED:
I noticed that when I use LIMIT 3 or below, my EXPLAIN changes:
select_type: SIMPLE
table: a
type: ref
possible_keys: ID_ATND, FL_ATND_FNAL
key: ORDER_BY_CRTR_DT
key_len: 6
ref: const
rows: 1764
Extra: Using where
The index ORDER_BY_CRTR_DT is a composite index that matches my ORDER BY:
INDEX ORDER_BY_CRTR_DT (FL_CRTR, DT_PRMR_ATVC_LNHA);
The cost-based optimizer views the situation a bit differently with different limits, and in this case it just gets it plain wrong. You will see this kind of weird behavior more often, and in pretty much all cost-based databases.
The place where you see the difference in this case is the chosen index: ORDER_BY_CRTR_DT in one plan, ID_ATND in the other. The database uses the chosen index to estimate the number of rows it has to read, and seeing a lower number of rows, the cost-based optimizer assumes the query is faster (a simplified viewpoint).
What sometimes helps is rebuilding the table and the indexes, so that the histograms describing the data contain the most recent information. Over time the plan can degrade again due to inserts, updates, and deletes, but a regular rebuild often stabilizes it.
Alternatively, you can force the index to be used, which effectively disables the cost-based optimizer for this plan. This can, however, backfire in the same way the cost-based optimizer fails you right now.
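For example, a minimal sketch that pins your query to the ID_ATND index from the fast plan (verify the index name against your schema before relying on this):

SELECT a.*
FROM bco_dav_dm.d_ttlz_cli a FORCE INDEX (ID_ATND)
WHERE FL_ATND_FNAL = 0
  AND a.ID_ATND = 218
ORDER BY a.FL_CRTR, a.DT_PRMR_ATVC_LNHA
LIMIT 1;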
A second alternative is to drop the index that gives you the 9-second plan (ORDER_BY_CRTR_DT here), which might be an option if it is not used elsewhere or has little impact on other queries.
Related
Consider a table Test having 1000 rows
Test Table
id name desc
1 Adi test1
2 Sam test2
3 Kal test3
.
.
1000 Jil test1000
If I need to fetch only, say, 100 rows (i.e. a small subset), I use the LIMIT clause in my query:
SELECT * FROM test LIMIT 100;
This query first fetches 1000 rows and then returns 100 of them.
Can this be optimized, so that the DB engine queries only 100 rows and returns them
(instead of fetching all 1000 rows first and then returning 100)?
The reason for the above supposition is that the logical order of processing is:
FROM
WHERE
SELECT
ORDER BY
LIMIT
You can combine LIMIT row_count with an ORDER BY. This causes MySQL to stop sorting as soon as it has found the first row_count rows of the sorted result.
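For example, on the Test table above (assuming id is indexed, e.g. as the primary key, so rows can be read in sorted order):

SELECT * FROM test ORDER BY id LIMIT 100;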
Hope this helps. If you need any clarification, just drop a comment.
The query you wrote will fetch only 100 rows, not 1000. But, if you change that query in any way, my statement may be wrong.
GROUP BY and ORDER BY are likely to incur a sort, which is arguably even slower than a full table scan. And that sort must be done before seeing the LIMIT.
Well, not always...
SELECT ... FROM t ORDER BY x LIMIT 100;
together with INDEX(x) -- This may use the index and fetch only 100 rows from the index. BUT... then it has to reach into the data 100 times to find the other columns that you ask for. UNLESS you only ask for x.
Etc, etc.
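To illustrate the UNLESS, here is a sketch (assuming INDEX(x) exists on t):

-- Covering: the index alone answers the query, no lookups into the data
SELECT x FROM t ORDER BY x LIMIT 100;
-- Not covering: 100 extra lookups into the data to fetch the other columns
SELECT * FROM t ORDER BY x LIMIT 100;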
And here's another wrinkle. A lot of questions on this forum are "Why isn't MySQL using my index?" Back to your query. If there are "only" 1000 rows in your table, my example with the ORDER BY x won't use the index, because it is faster to simply read through the table, tossing 90% of the rows. On the other hand, if there were 9999 rows, then it would use the index. (The transition is somewhere around 20%, but that figure is imprecise.)
Confused? Fine. Let's discuss one query at a time. I can [probably] discuss the what and why of each one you throw at me. Be sure to include SHOW CREATE TABLE, the full query, and EXPLAIN SELECT... That way, I can explain what EXPLAIN tells you (or does not).
Did you know that having both a GROUP BY and ORDER BY may cause the use of two sorts? EXPLAIN won't point that out. And sometimes there is a simple trick to get rid of one of the sorts.
There are a lot of tricks up MySQL's sleeve.
Given the following two queries:
Query #1
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623, 204072)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Query #2 - 4 IDs instead of 5
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Explain Plan
-- Query #1
id: 1, select_type: SIMPLE, table: log, type: range, possible_keys: idx_user_id_and_log_id, key: idx_user_id_and_log_id, key_len: 4, rows: 41280, Extra: Using index condition; Using where; Using filesort
-- Query #2
id: 1, select_type: SIMPLE, table: log, type: index, possible_keys: idx_user_id_and_log_id, key: PRIMARY, key_len: 4, rows: 53534, Extra: Using where
Why does the addition of a single ID make the execution plan so different? I'm talking about a difference in time from milliseconds to ~1 minute. I thought it could be related to the eq_range_index_dive_limit parameter, but the IN list is below 10 items anyway (10 being the default). I know that I can force the usage of the secondary index instead of the clustered index, but I wanted to know why MySQL decided that.
Should I try to understand that? Or is it sometimes not possible to understand query planner decisions?
Extra Details
Table Size: 11GB
Rows: 108 Million
MySQL: 5.6.7
It doesn't matter which ID is removed from the IN clause.
The index: idx_user_id_and_log_id(user_id, id)
As you have shown, MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
Read all qualifying rows, sort them, and pick the n top rows.
Read the rows in sorted order and stop when n qualifying rows have been found.
In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed or for columns where values are correlated. In your case, it probably has to read much more of the table in sorted order to find the first 25 qualifying rows than the optimizer expected.
There have been several improvements in how LIMIT queries are handled, both in later releases of 5.6 (you are running on a pre-GA release!), and in newer releases (5.7, 8.0). I suggest you try to upgrade to a later release, and see if this still is an issue.
In general, if you want to understand query planner decisions, you should look at the optimizer trace for the query.
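A minimal sketch of capturing the trace (the optimizer trace is available from MySQL 5.6 on):

-- Enable tracing for this session
SET optimizer_trace = 'enabled=on';
-- Run the query whose plan you want to inspect
SELECT log.id FROM log
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17)
ORDER BY log.id DESC LIMIT 25;
-- Read the trace, then turn tracing off again
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';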
A JOIN is much more efficient.
Create a temporary table with the values of the IN operator,
then JOIN the log table to that temporary table of values.
Refer to this answer
for more info.
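A minimal sketch of that approach (the temporary table name is made up for illustration):

-- Materialize the IN-list values
CREATE TEMPORARY TABLE tmp_user_ids (user_id INT NOT NULL PRIMARY KEY);
INSERT INTO tmp_user_ids VALUES (188858), (188886), (189854), (203623), (204072);

-- Join against the value table instead of using IN
SELECT log.id
FROM log
JOIN tmp_user_ids t ON t.user_id = log.user_id
WHERE log.type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25;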
Add
INDEX(user_id, type, id),
INDEX(type, user_id, id)
Each of these is a "covering" index. As such, the entire query can be performed by looking only in one index, without touching the 'data'.
This gives the optimizer two choices; hopefully it will be able to determine whether user_id IN (...) or type IN (...) is more selective, and pick the better index accordingly.
If, after adding those, you don't have any use for idx_user_id_and_log_id(user_id, id), DROP it.
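A sketch of those changes in one statement (the new index names are made up for illustration):

ALTER TABLE log
    ADD INDEX idx_user_type_id (user_id, type, id),
    ADD INDEX idx_type_user_id (type, user_id, id),
    DROP INDEX idx_user_id_and_log_id;  -- only if nothing else needs it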
(No, I can't explain why query 2 chose to do a table scan.)
Is there a way to optimize the following query?
SELECT count(*)>1000 FROM table_with_lot_of_rows WHERE condition_on_index;
Using this query, MySQL first performs the count(*) and then the comparison. This is fast when only a few rows satisfy the condition, but can take forever if a lot of rows satisfy it. Is there a way to stop counting as soon as 1000 items are found, instead of going through all the results?
In particular, I'm interested in MyISAM table with full-text condition, but any answer for InnoDB and/or basic WHERE clause will help.
SELECT 1
FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1000, 1;
Works this way:
Using the index (which is presumably faster than using the data)
Skip over 1000 rows, collecting nothing. (This is better than other answers.)
If you make it this far, fetch 1 row, containing only the literal 1 (in the SELECT).
Now you either have an empty result set (<= 1000 rows) or a row of 1 (at least 1001 rows).
Then, depending on your application language, it is easy to distinguish between the two cases.
Another note: If this is to be a subquery in a bigger query, then do
EXISTS ( SELECT 1
FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1000, 1 )
This returns TRUE/FALSE (which are synonymous with 1 or 0).
Face it, scanning 1001 rows, even of the index, will take some time. I think my formulation is the fastest possible.
Other things to check: Is this InnoDB? Does EXPLAIN say "Using index"? How much RAM? What is the setting of innodb_buffer_pool_size?
Note that InnoDB now has FULLTEXT, so there is no reason to stick with MyISAM.
If you are using MyISAM and the WHERE is MATCH..., then most of what I said likely does not apply. FULLTEXT probably fetches all results before giving the rest of the engine a chance to play these games with ORDER BY and LIMIT.
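For concreteness, such a query would look roughly like this (the column name and search string are made up for illustration):

SELECT 1
FROM table_with_lot_of_rows
WHERE MATCH(text_col) AGAINST('some search terms')
LIMIT 1000, 1;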
Please show us the actual query, its EXPLAIN, and SHOW CREATE TABLE. And what is the real goal? To see if a query will deliver "too many" results?
Possible improvement (depending on context)
Since my initial SELECT returns scalar 1 or NULL, it can be used in any boolean context such as WHERE. 1 is TRUE, NULL will be treated as FALSE. Hence EXISTS is probably redundant.
Also, 1/NULL can be turned into 1/0 thus. Note: the extra parens are required.
IFNULL( ( SELECT ... LIMIT 1000,1 ), 0)
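Putting it together with the subquery from above (a sketch; the column alias is made up):

SELECT IFNULL( ( SELECT 1
                 FROM table_with_lot_of_rows
                 WHERE condition_on_index
                 LIMIT 1000, 1 ),
               0) AS more_than_1000_rows;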
You can optimize the query using a subquery with a LIMIT:
SELECT count(*) > 1000 FROM (
    SELECT 0 FROM table_with_lot_of_rows
    WHERE condition_on_index
    LIMIT 1001
) AS truncated_count;
In that case, MySQL stops as soon as enough rows satisfy the condition.
I have a query that joins 7-8 tables, and the 4 major tables each have approximately 200,000 records. The query needs a conditional ORDER BY clause depending on the result. It returns results in 1.6 seconds, which I think is very slow; I expect results within 200 ms. So I checked the execution plan with the EXPLAIN command, and the first line shows 49,150 rows scanned with Using where; Using temporary; Using filesort.
Then I wrapped the query in a subquery (like SELECT * FROM ( my actual query ) a ORDER BY a.field) and checked it with EXPLAIN again. This time it shows 5,454,396 rows scanned using filesort, yet the result comes within 1 second. I don't understand which is the better way to achieve performance, because this plan scans many more rows than the first query plan.
If the first query is the right approach for me, how do I avoid "Using temporary" while keeping the ORDER BY I need?
EDIT :
I have updated my schema and query with execution plan here
I have a huge table with 170,000 records.
What is the difference between this query:
Showing rows 0 - 299 (1,422 total, Query took 1.9008 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1669
Explain
And this one:
Showing rows 0 - 299 (1,422 total, Query took 0.2625 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1670
Explain:
Both of these queries use the same table with the same data and have the same WHERE clause; only the LIMIT row count is different.
MySQL has a buffer for sorting. When the stuff to be sorted is too big, it sorts chunks, then merge-sorts them. This is called "filesort". Your 1670th row apparently just overflows the sort buffer.
Read more details here.
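If you want to test whether the sort buffer is the culprit, you can enlarge it for your session and re-run the slower query (a sketch; pick a size appropriate for your server):

-- Check the current setting
SHOW VARIABLES LIKE 'sort_buffer_size';
-- Enlarge it for this session only (4 MB here), then re-run the query
SET SESSION sort_buffer_size = 4194304;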
Now, why it picks another key for the in-memory sort... I am not too sure; but apparently its strategy is not quite good, since it ends up being slower.
recap: odd that the query returning more rows runs much faster
this is not related to buffer vs file sort, sorting 1400 records takes well under 1 second
the first explain shows the query optimizer doing a linear scan, the second explain shows it using an index. Even a partially helpful index is usually much better than none at all.
Internally, MySQL maintains stats about the size of indexes and tries to guess which index to use, or whether a linear scan would be faster. This estimate is data specific; I've seen MySQL use the right index 99 times out of 100, but every now and then it picks a different one and runs the query 50x slower.
You can override the built-in query optimizer and specify the index to use manually, with SELECT ... FROM ... FORCE INDEX (...)
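A sketch of that with the second query above (assuming id is the primary key, so PRIMARY serves the ORDER BY; the right index depends on your schema):

SELECT 1 FROM `p_apartmentbuy` p FORCE INDEX (PRIMARY)
WHERE p.price BETWEEN 500000000 AND 900000000
  AND p.yard = 1
  AND p.dateadd BETWEEN 1290000000 AND 1320000000
ORDER BY `p`.`id` DESC
LIMIT 1670;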