I have a huge table with 170,000 records.
What is the difference between this query
Showing rows 0 - 299 (1,422 total, Query took 1.9008 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1669
Explain
And this one:
Showing rows 0 - 299 (1,422 total, Query took 0.2625 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1670
Explain:
Both of these queries use the same table with the same data and have the same WHERE clause; only the LIMIT row count differs.
MySQL has a buffer for sorting. When the stuff to be sorted is too big, it sorts chunks, then merge-sorts them. This is called "filesort". Your 1670th row apparently just overflows the sort buffer.
Read more details here.
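If you want to check (and, per session, experiment with) the buffer involved, these are the usual knobs; whether raising it actually helps depends on your data:

SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 2 * 1024 * 1024;  -- e.g. 2 MB, affects only the current session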
Now why it picks another key for the in-memory sort... I am not too sure; but apparently its strategy is not quite good since it ends up being slower.
Recap: it is odd that the query returning more rows runs much faster.
This is not related to buffer vs. file sort; sorting 1,400 records takes well under 1 second.
The first EXPLAIN shows the query optimizer doing a linear scan; the second EXPLAIN shows it using an index. Even a partially helpful index is usually much better than none at all.
Internally, MySQL maintains statistics about the size of indexes and tries to guess which index to use, or whether a linear scan would be faster. This estimate is data-specific. I've seen MySQL use the right index 99 times out of 100, but every now and then pick a different one and run the query 50x slower.
You can override the built-in query optimizer and specify the index to use manually with SELECT ... FROM ... FORCE INDEX (...).
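For example, a sketch against the query above, assuming id is the primary key (check SHOW INDEX FROM p_apartmentbuy for the real index names):

SELECT 1 FROM `p_apartmentbuy` p FORCE INDEX (PRIMARY)
WHERE p.price BETWEEN 500000000 AND 900000000
  AND p.yard = 1
  AND p.dateadd BETWEEN 1290000000 AND 1320000000
ORDER BY p.id DESC
LIMIT 1669;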
Related
Consider a table Test having 1000 rows
Test Table
id name desc
1 Adi test1
2 Sam test2
3 Kal test3
.
.
1000 Jil test1000
If I need to fetch only, say, 100 rows (i.e. a small subset), then I use the LIMIT clause in my query:
SELECT * FROM test LIMIT 100;
This query first fetches 1000 rows and then returns 100 of them.
Can this be optimised, so that the DB engine queries only 100 rows and returns them
(instead of fetching all 1000 rows first and then returning 100)?
The reason for the above supposition is that the order of processing will be
FROM
WHERE
SELECT
ORDER BY
LIMIT
You can combine LIMIT row_count with an ORDER BY. This causes MySQL to stop sorting as soon as it has found the first row_count rows of the sorted result.
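For instance, with the Test table above and an index on id (typical when id is the primary key):

SELECT * FROM test ORDER BY id LIMIT 100;
-- MySQL can walk the index in order and stop after producing the first 100 rows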
Hope this helps. If you need any clarification, just drop a comment.
The query you wrote will fetch only 100 rows, not 1000. But, if you change that query in any way, my statement may be wrong.
GROUP BY and ORDER BY are likely to incur a sort, which is arguably even slower than a full table scan. And that sort must be done before seeing the LIMIT.
Well, not always...
SELECT ... FROM t ORDER BY x LIMIT 100;
together with INDEX(x) -- This may use the index and fetch only 100 rows from the index. BUT... then it has to reach into the data 100 times to find the other columns that you ask for. UNLESS you only ask for x.
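A minimal sketch of that difference (t and x are placeholders, as in the snippet above):

CREATE INDEX idx_x ON t (x);
-- covering case: only x is requested, so reading the first 100 index entries is enough
SELECT x FROM t ORDER BY x LIMIT 100;
-- non-covering case: each of those 100 index entries still needs a lookup into the row data
SELECT * FROM t ORDER BY x LIMIT 100;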
Etc, etc.
And here's another wrinkle. A lot of questions on this forum are "Why isn't MySQL using my index?" Back to your query: if there are "only" 1000 rows in your table, my example with the ORDER BY x won't use the index, because it is faster to simply read through the table, tossing 90% of the rows. On the other hand, if there were 9999 rows, it would use the index. (The transition is somewhere around 20%, but that is imprecise.)
Confused? Fine. Let's discuss one query at a time. I can [probably] discuss the what and why of each one you throw at me. Be sure to include SHOW CREATE TABLE, the full query, and EXPLAIN SELECT... That way, I can explain what EXPLAIN tells you (or does not).
Did you know that having both a GROUP BY and ORDER BY may cause the use of two sorts? EXPLAIN won't point that out. And sometimes there is a simple trick to get rid of one of the sorts.
There are a lot of tricks up MySQL's sleeve.
The following query works as expected and uses an index.
The query takes 0.0481 sec.
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(user.uid) AS useranzahl
FROM
user
LEFT JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='AT'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25
Explain
If only the country code within the query is changed from AT to DE,
the query takes about 2.5 sec and does not use the index.
SELECT
geodb_locations.name,
geodb_locations.name_url,
COUNT(user.uid) AS useranzahl
FROM
user
LEFT JOIN
geodb_locations ON geodb_locations.id=user.plz
WHERE
user.freigeben=1 AND
geodb_locations.adm0='DE'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25
Explain
Why is the index not used by the optimizer for the second query, and how can the query be improved?
2.5 sec is too long ...
If user.uid cannot be NULL, use COUNT(*) instead of COUNT(user.uid).
As already pointed out, remove LEFT.
Add these indexes:
user: (freigeben, plz)
geodb_locations: (adm0, name_url, name)
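In DDL form, and with the query reworked along those lines (the index names here are made up; dropping LEFT is safe because the WHERE on geodb_locations already discards unmatched rows):

ALTER TABLE user ADD INDEX idx_freigeben_plz (freigeben, plz);
ALTER TABLE geodb_locations ADD INDEX idx_adm0_nameurl_name (adm0, name_url, name);

SELECT
    geodb_locations.name,
    geodb_locations.name_url,
    COUNT(*) AS useranzahl
FROM
    user
JOIN
    geodb_locations ON geodb_locations.id = user.plz
WHERE
    user.freigeben = 1 AND
    geodb_locations.adm0 = 'DE'
GROUP BY user.plz
ORDER BY useranzahl DESC
LIMIT 25;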
As for why the EXPLAIN changed... It is quite normal (though somewhat rare) for the distribution of the constants to determine the order in which the tables are touched (Austria is less common than Germany?) or which index to use.
Regardless of optimizations, this query will have to scan a lot more rows for DE than for AT; this has to happen before the sort (ORDER BY) and LIMIT.
Two things prevent much optimization:
The WHERE references both tables.
The ORDER BY depends on a computed value.
When I execute the query below in MariaDB 10.1/MySQL 5.7, the result has 849 rows and executes in 0.016 seconds.
SELECT a.*
FROM bco_dav_dm.d_ttlz_cli a
WHERE FL_ATND_FNAL = 0
AND a.ID_ATND = 218
ORDER BY A.FL_CRTR, A.DT_PRMR_ATVC_LNHA;
But when I add a LIMIT clause to return only 1 row, the query executes in 9 seconds!!! Why?
I tested:
Using LIMIT 1 the query executes in 9 seconds.
Using LIMIT 2 the query executes in 9 seconds.
Using LIMIT 3 the query executes in 9 seconds.
Using LIMIT 4 and above (5, 6, 7, 8... 100, 200, 300, 400) the query executes in 0.016 seconds!!!
I've tested several times and I always have the same result.
I will use this query in a web app, and since I need only 1 record I don't know why LIMIT <= 3 slows down the query!
Other posts say that using a higher OFFSET naturally slows down the query, but I don't use OFFSET.
My Explain:
select_type: SIMPLE
table: a
type: ref
possible_keys: ID_ATND, FL_ATND_FNAL
key: ID_ATND
key_len: 5
ref: const
rows: 1846
Extra: Using where; Using filesort
EDITED:
I noticed that when I use LIMIT 3 or below, my EXPLAIN changes:
select_type: SIMPLE
table: a
type: ref
possible_keys: ID_ATND, FL_ATND_FNAL
key: ORDER_BY_CRTR_DT
key_len: 6
ref: const
rows: 1764
Extra: Using where
The index ORDER_BY_CRTR_DT is a composite index that I use in my ORDER BY:
INDEX ORDER_BY_CRTR_DT(FL_CRTR, DT_PRMR_ATVC_LNHA);
The cost-based optimizer views the situation a bit differently with different limits and just gets it plain wrong in this case. You will see this kind of weird behavior more often, and in pretty much all cost-based databases.
The place where you see the difference in this case is the chosen index: ORDER_BY_CRTR_DT in one plan and ID_ATND in the other, which the database then uses to estimate the number of rows. Seeing a lower estimated row count, the cost-based optimizer assumes the query will be faster (a simplified viewpoint).
What can sometimes help is rebuilding the table and the indexes, so that the histograms describing the data contain the most recent information. Over time the result might change again due to inserts, updates, and deletes, with the result that the plan can degrade again. Stabilization of the plan is, however, often achieved by a regular rebuild.
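For example, using the table from the question (ANALYZE refreshes the index statistics; OPTIMIZE rebuilds the table and its indexes):

ANALYZE TABLE bco_dav_dm.d_ttlz_cli;
OPTIMIZE TABLE bco_dav_dm.d_ttlz_cli;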
Alternatively, you can force the index to be used, with the result that the cost-based optimizer is disabled for this plan. This can, however, backfire in the same way the cost-based optimizer fails you right now.
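For example, forcing the index that the fast plan uses (ID_ATND, as shown in the first EXPLAIN):

SELECT a.*
FROM bco_dav_dm.d_ttlz_cli a FORCE INDEX (ID_ATND)
WHERE FL_ATND_FNAL = 0
AND a.ID_ATND = 218
ORDER BY a.FL_CRTR, a.DT_PRMR_ATVC_LNHA
LIMIT 1;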
A second alternative is to drop the index that gives you the 9-second result, which might be an option if it is not used elsewhere or has little impact on other queries.
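If you go that route, it would look like this (ORDER_BY_CRTR_DT being the index picked by the slow plan):

ALTER TABLE bco_dav_dm.d_ttlz_cli DROP INDEX ORDER_BY_CRTR_DT;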
Is there a way to optimize the following query?
SELECT count(*)>1000 FROM table_with_lot_of_rows WHERE condition_on_index;
Using this query, MySQL first performs the count(*) and then the comparison. This is fast when only a few rows satisfy the condition, but can take forever if a lot of rows satisfy it. Is there a way to stop counting as soon as 1000 items are found, instead of going through all the results?
In particular, I'm interested in a MyISAM table with a full-text condition, but any answer for InnoDB and/or a basic WHERE clause will help.
SELECT 1
FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1000, 1;
Works this way:
Using the index (which is presumably faster than using the data)
Skip over 1000 rows, collecting nothing. (This is better than other answers.)
If you make it this far, fetch 1 row, containing only the literal 1 (in the SELECT).
Now you either have an empty result set (<= 1000 rows) or a row of 1 (at least 1001 rows).
Then, depending on your application language, it is easy to distinguish between the two cases.
Another note: If this is to be a subquery in a bigger query, then do
EXISTS ( SELECT 1
FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1000, 1 )
Which returns TRUE/FALSE (which are synonymous with 1 or 0).
Face it, scanning 1001 rows, even of the index, will take some time. I think my formulation is the fastest possible.
Other things to check: Is this InnoDB? Does EXPLAIN say "Using index"? How much RAM? What is the setting of innodb_buffer_pool_size?
Note that InnoDB now has FULLTEXT, so there is no reason to stick with MyISAM.
If you are using MyISAM and the WHERE is MATCH..., then most of what I said is likely not applicable. FULLTEXT probably fetches all results before giving the rest of the engine a chance to play these games with ORDER BY and LIMIT.
Please show us the actual query, its EXPLAIN, and SHOW CREATE TABLE. And what is the real goal? To see if a query will deliver "too many" results?
Possible improvement (depending on context)
Since my initial SELECT returns scalar 1 or NULL, it can be used in any boolean context such as WHERE. 1 is TRUE, NULL will be treated as FALSE. Hence EXISTS is probably redundant.
Also, 1/NULL can be turned into 1/0 thus. Note: the extra parens are required.
IFNULL( ( SELECT ... LIMIT 1000,1 ), 0)
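Spelled out against the placeholder query from the question (the alias is just a label):

SELECT IFNULL( ( SELECT 1
                 FROM table_with_lot_of_rows
                 WHERE condition_on_index
                 LIMIT 1000, 1 ), 0 ) AS more_than_1000;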
You can optimize the query using a sub-query with a LIMIT:
SELECT count(*)>1000 FROM (
SELECT 0 FROM table_with_lot_of_rows
WHERE condition_on_index
LIMIT 1001
) as truncated_count;
In that case, MySQL stops as soon as enough rows satisfy the condition.
[site_list] ~100,000 rows... 10mb in size.
site_id
site_url
site_data_most_recent_record_id
[site_list_data] ~ 15+ million rows and growing... about 600mb in size.
record_id
site_id
site_connect_time
site_speed
date_checked
columns in bold are unique index keys.
I need to return 50 most recently updated sites AND the recent data that goes with it - connect time, speed, date...
This is my query:
SELECT SQL_CALC_FOUND_ROWS
site_list.site_url,
site_list_data.site_connect_time,
site_list_data.site_speed,
site_list_data.date_checked
FROM site_list
LEFT JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50
Without the ORDER BY and SQL_CALC_FOUND_ROWS (I need it for pagination), the query takes about 1.5 seconds; with them it takes 2 seconds or more, which is not good enough, because the page where this data will be shown gets 20K+ pageviews/day, and this query is apparently too heavy (the server almost dies when I put it live) and too slow.
Experts of MySQL, how would you do this? What if the table got to 100 million records? Caching this huge result into a temp table every 30 seconds is the only other solution I have.
You need to add a heuristic to the query. You need to gate the query to get reasonable performance. It is effectively sorting your site_list_data table by date descending -- the ENTIRE table.
So, if you know that the top 50 will be within the last day or week, add a "date_checked > <boundary_date>" condition to the query. Then it should reduce the overall result set first, and THEN sort it.
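A sketch of the gated version of the query above, keeping <boundary_date> as a placeholder for whatever cutoff fits your data:

SELECT SQL_CALC_FOUND_ROWS
    site_list.site_url,
    site_list_data.site_connect_time,
    site_list_data.site_speed,
    site_list_data.date_checked
FROM site_list
LEFT JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
WHERE site_list_data.date_checked > <boundary_date>
ORDER BY site_list_data.date_checked DESC
LIMIT 50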
SQL_CALC_FOUND_ROWS is slow; use COUNT instead. Take a look here.
A couple of observations.
Both ORDER BY and SQL_CALC_FOUND_ROWS add to the cost of the query. ORDER BY clauses can potentially be improved with appropriate indexing -- do you have an index on your date_checked column? This could help.
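For instance (the index name is illustrative):

ALTER TABLE site_list_data ADD INDEX idx_date_checked (date_checked);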
What is your exact need for SQL_CALC_FOUND_ROWS? Consider replacing this with a separate query that uses COUNT instead. This can be vastly better assuming your Query Cache is enabled.
And if you can use COUNT, consider replacing your LEFT JOIN with an INNER JOIN as this will help performance as well.
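A rough sketch of that split, with the count feeding the pagination and the page query kept separate:

SELECT COUNT(*)
FROM site_list
JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id;

SELECT
    site_list.site_url,
    site_list_data.site_connect_time,
    site_list_data.site_speed,
    site_list_data.date_checked
FROM site_list
JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50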
Good luck.