I have a query that joins 7-8 tables, and the four major tables have approximately 200,000 records each. The query needs a conditional ORDER BY clause depending on the result, and it returns in 1.6 seconds. I think this is very slow; I expect the result within 200 ms. So I checked the execution plan with EXPLAIN, and its first line shows 49,150 rows scanned with Using where; Using temporary; Using filesort.
Now I have wrapped the query in a subquery (like SELECT * FROM ( My Actual Query ) a ORDER BY a.field) and checked it with EXPLAIN again. This time it shows 5,454,396 rows scanned with Using filesort, and the result comes back within 1 second. I don't understand which way is better for performance, because this plan scans many more rows than the first one.
If the first query is the right approach for me, how do I avoid "Using temporary" while keeping the ORDER BY I need?
EDIT :
I have posted my updated schema and query with the execution plan here
Related
We switched our database from MySQL 8 to MariaDB 10 a week ago, and now we have massive performance problems. We figured out why: we frequently use subqueries in SELECT statements together with ORDER BY. Here is an example:
SELECT id, (SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = table.id) queryResult
FROM table
WHERE status = 5
ORDER BY column
LIMIT 10
Imagine there are 1,000,000 rows in table that match status = 5.
What happens in MySQL 8: the ORDER BY and LIMIT run first, and after that the subquery (10 rows affected).
What happens in MariaDB 10: the subquery runs first (1,000,000 rows affected), and after that the ORDER BY and LIMIT.
Both queries return 10 rows, but under MariaDB 10 this makes it incredibly slow. Why is this happening? And is there an option in MariaDB we should activate to avoid it? I know from MySQL 8 that SELECT-list subqueries are executed early when they are referenced in the ORDER BY; otherwise they are executed once the result set is there.
Info: if we do this instead, everything is fine:
SELECT *, (SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = outerTable.id)
FROM (
SELECT id
FROM table
WHERE status = 5
ORDER BY column
LIMIT 10
) outerTable
Thank you so much for any help.
This is because a table is by nature an unsorted bunch of rows.
A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, the SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).
(MariaDB manual)
So the optimizer removes and ignores the ORDER BY.
You have already found a method to circumvent this: using LIMIT together with ORDER BY in the subquery.
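A minimal sketch of that workaround, using the placeholder names from the question (the ORDER BY survives because the LIMIT makes it change the row set, not just the order):

```sql
-- ORDER BY alone in a derived table may be optimized away;
-- ORDER BY ... LIMIT changes which rows are returned, so it is kept
SELECT id
FROM (
    SELECT id
    FROM `table`
    WHERE status = 5
    ORDER BY `column`
    LIMIT 10
) t;
```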
After searching and searching, I finally found a solution that makes the MariaDB 10 database behave as I knew it from MySQL 8.
For those who have similar problems: set this each time you connect to the server, and everything works like it did in MySQL 8:
SET optimizer_use_condition_selectivity = 1
Long version: the problem I described at the top suddenly solved itself, and the subquery executed like it did in the past under MySQL 8. I had changed exactly nothing!
But soon there were new problems: we have a statistics page which was incredibly slow. I noticed an index was missing and added it. I executed the query and it worked: without the index, 100,000 rows were examined to find the result; after adding it, 38. Well done.
Then strange things started to happen: I executed the query again and the database didn't use the index. So I executed it again and again. This was the result:
1st query execution (I did it with ANALYZE): 100.000 rows affected
2nd query execution: 38 rows affected
3rd query execution: 38 rows affected
4th query execution: 100.000 rows affected
5th query execution: 100.000 rows affected
It was completely random, even in our SaaS solution! So I started to research how the optimizer determines an execution plan. I found this: optimizer_use_condition_selectivity.
The default for a MariaDB 10.4 server is 4, which means histograms are used to estimate the result set. I watched a few videos about it and realized this would not work in our case (although we stick to database normalization). Mode 1 works well:
use selectivity of index backed range conditions to calculate the cardinality of a partial join if the last joined table is accessed by full table scan or an index scan
I hope this will help some other folks who despair over this like I did.
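If the session-level SET above helps, the same value can also be applied server-wide; a sketch, assuming you want it for every connection (variable name as documented by MariaDB):

```sql
-- at runtime, for all new connections:
SET GLOBAL optimizer_use_condition_selectivity = 1;

-- or persist it in the [mysqld] section of the server config file:
--   optimizer_use_condition_selectivity = 1
```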
At 5.6, MariaDB and MySQL went off in different directions with the optimizer. MariaDB focused a lot on subqueries, though perhaps to the detriment of this particular query.
Do you have INDEX(status, column)? It would help most variants of this query.
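A sketch of adding that index, with the placeholder names from the question (idx_status_column is a hypothetical name):

```sql
-- filter on status, then read rows already ordered by `column`,
-- so ORDER BY `column` LIMIT 10 can stop after 10 rows without a sort
ALTER TABLE `table` ADD INDEX idx_status_column (status, `column`);
```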
Yes, the subquery has to be evaluated for every row before the ORDER BY. The subquery only seems to need id, so you can phrase this as:
SELECT id,
(SELECT id2 FROM table2 INNER JOIN [...] WHERE column.foreignkey = t.id) as queryResult
FROM (SELECT t.*
FROM table t
WHERE status = 5
ORDER BY column
LIMIT 10
) t
This evaluates the subquery only after the rows have been selected from the table.
Given the following two queries:
Query #1
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623, 204072)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Query #2 - 4 IDs instead of 5
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Explain Plan
-- Query #1
1 SIMPLE log range idx_user_id_and_log_id idx_user_id_and_log_id 4 41280 Using index condition; Using where; Using filesort
-- Query #2
1 SIMPLE log index idx_user_id_and_log_id PRIMARY 4 53534 Using where
Why does the addition of a single ID make the execution plan so different? I'm talking about a time difference from milliseconds to ~1 minute. I thought it could be related to the eq_range_index_dive_limit parameter, but my list is below 10 values anyway (the default). I know I can force the use of the secondary index instead of the clustered index, but I want to know why MySQL decided that.
Should I try to understand this? Or is it sometimes simply not possible to understand query planner decisions?
Extra Details
Table Size: 11GB
Rows: 108 Million
MySQL: 5.6.7
It doesn't matter which ID is removed from the IN clause.
The index: idx_user_id_and_log_id(user_id, id)
As you have shown, MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
Read all qualifying rows, sort them, and pick the n top rows.
Read the rows in sorted order and stop when n qualifying rows have been found.
In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed or for columns whose values are correlated. In your case, one probably has to read a lot more of the table in sorted order to find the first 25 qualifying rows than the optimizer expected.
There have been several improvements in how LIMIT queries are handled, both in later releases of 5.6 (you are running on a pre-GA release!), and in newer releases (5.7, 8.0). I suggest you try to upgrade to a later release, and see if this still is an issue.
In general, if you want to understand query planner decisions, you should look at the optimizer trace for the query.
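For example, the trace for the query in question can be captured like this (MySQL 5.6+; the exact trace content varies by version):

```sql
SET optimizer_trace = 'enabled=on';

SELECT log.id
FROM log
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

-- shows the costs the optimizer assigned to each considered plan,
-- including how it handled ORDER BY ... LIMIT
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```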
A JOIN is much more efficient.
Create a temporary table with the values from the IN operator, then JOIN the log table to that temporary table of values.
Refer to this answer for more info.
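A sketch of that approach against the query in question (tmp_user_ids is a hypothetical name):

```sql
-- materialize the IN list; the primary key gives the join an index
CREATE TEMPORARY TABLE tmp_user_ids (
    user_id INT NOT NULL,
    PRIMARY KEY (user_id)
);
INSERT INTO tmp_user_ids VALUES
    (188858), (188886), (189854), (203623), (204072);

SELECT log.id
FROM log
JOIN tmp_user_ids u ON u.user_id = log.user_id
WHERE log.type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25;
```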
Add
INDEX(user_id, type, id),
INDEX(type, user_id, id)
Each of these is a "covering" index. As such, the entire query can be performed by looking only in one index, without touching the 'data'.
This gives the optimizer two choices; hopefully it will be able to determine whether user_id IN (...) or type IN (...) is more selective and pick the better index.
If, after adding those, you don't have any use for idx_user_id_and_log_id(user_id, id), DROP it.
(No, I can't explain why query 2 chose to do a table scan.)
When I execute the query below in MariaDB 10.1/MySQL 5.7, it returns 849 rows in 0.016 seconds.
SELECT a.*
FROM bco_dav_dm.d_ttlz_cli a
WHERE FL_ATND_FNAL = 0
AND a.ID_ATND = 218
ORDER BY A.FL_CRTR, A.DT_PRMR_ATVC_LNHA;
But when I add a LIMIT clause to return only 1 row, the query takes 9 seconds! Why?
I tested:
Using LIMIT 1, the query executes in 9 seconds.
Using LIMIT 2, the query executes in 9 seconds.
Using LIMIT 3, the query executes in 9 seconds.
Using LIMIT 4 and above (5, 6, 7, 8, ... 100, 200, 300, 400), the query executes in 0.016 seconds!
I've tested several times and I always have the same result.
Since I will use this query in a web app and need only 1 record, I don't understand why LIMIT <= 3 slows the query down!
Other posts say that a high OFFSET naturally slows a query down, but I don't use OFFSET.
My Explain:
select_type: SIMPLE
table: a
type: ref
possible_keys: ID_ATND, FL_ATND_FNAL
key: ID_ATND
key_len: 5
ref: const
rows: 1846
Extra: Using where; Using filesort
EDITED:
I noticed that when I use LIMIT 3 or below, my EXPLAIN changes:
select_type: SIMPLE
table: a
type: ref
possible_keys: ID_ATND, FL_ATND_FNAL
key: ORDER_BY_CRTR_DT
key_len: 6
ref: const
rows: 1764
Extra: Using where
The index ORDER_BY_CRTR_DT is a composite index that I use for my ORDER BY:
INDEX ORDER_BY_CRTR_DT(FL_CRTR, DT_PRMR_ATVC_LNHA);
The cost-based optimizer views the situation a bit differently with different limits and simply gets it plain wrong in this case. You will see this kind of weird behavior now and then in pretty much all cost-based databases.
The place where you see the difference in this case is the chosen index: ORDER_BY_CRTR_DT in one plan and ID_ATND in the other, which the database uses to estimate the number of rows. Seeing a lower row estimate, the cost-based optimizer assumes the query is faster (a simplified view).
What sometimes helps is rebuilding the table and its indexes so that the histograms describing the data are up to date. Over time the result may change again due to inserts, updates and deletes, with the consequence that the plan can degrade again; a regular rebuild, however, often keeps the plan stable.
Alternatively, you can force the index to be used, which effectively disables the cost-based optimizer for this plan. This can, however, backfire in the same way the cost-based optimizer is failing you right now.
A second alternative is to drop the index that gives you the 9-second result, which might be an option if it is not used elsewhere or has little impact on other queries.
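A sketch of the force-index alternative, using the names from the question (whether ID_ATND is the right index to pin depends on your data):

```sql
-- pin the index the fast plan used, bypassing the cost estimate
SELECT a.*
FROM bco_dav_dm.d_ttlz_cli a FORCE INDEX (ID_ATND)
WHERE a.FL_ATND_FNAL = 0
  AND a.ID_ATND = 218
ORDER BY a.FL_CRTR, a.DT_PRMR_ATVC_LNHA
LIMIT 1;
```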
I have a query that returns 17,000 rows from a table called 'history'. It has about 15 LEFT JOINs to other tables, mainly on history.id = othertable.primary_id.
When I remove ORDER BY history.primary_id DESC the query is very fast. When the ORDER BY is present, MySQL takes about 26 seconds to write to a temporary table, so it's very slow.
However, when the ORDER BY is present but the joins are omitted, it is quick.
I do not understand why the data being written to the temporary table depends on the joins. Wouldn't MySQL already know what the 17,000 rows consist of by that point, since it is after the query executes?
According to EXPLAIN my indexes are set up pretty well, but for the life of me I cannot sort this query. I want to use a LIMIT of 1000, but it is useless when I cannot sort, and I can't seem to get the query to avoid the temporary table.
I have a huge table with 170,000 records.
What is the difference between this query:
Showing rows 0 - 299 (1,422 total, Query took 1.9008 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1669
Explain
And this one:
Showing rows 0 - 299 (1,422 total, Query took 0.2625 sec)
SELECT 1 FROM `p_apartmentbuy` p
where
p.price between 500000000 and 900000000
and p.yard = 1
and p.dateadd between 1290000000 and 1320000000
ORDER BY `p`.`id` desc
limit 1670
Explain:
Both of these queries use the same table and data and have the same WHERE clause; only the LIMIT row count differs.
MySQL has a buffer for sorting. When the data to be sorted is too big, it sorts chunks and then merge-sorts them. This is called "filesort". Your 1670th row apparently just overflows the sort buffer.
Read more details here.
Now, why it picks another key for the in-memory sort, I am not too sure; but apparently its strategy is not very good, since it ends up being slower.
Recap: it is odd that the query returning more rows runs much faster.
This is not related to buffer vs. file sort; sorting 1,400 records takes well under 1 second.
The first EXPLAIN shows the query optimizer doing a linear scan; the second shows it using an index. Even a partially helpful index is usually much better than none at all.
Internally, MySQL maintains statistics about the sizes of indexes and tries to guess which index, or whether a linear scan, would be faster. This estimate is data-specific; I've seen MySQL use the right index 99 times out of 100, but every now and then pick a different one and run the query 50x slower.
You can override the built-in query optimizer and specify the index to use manually, with SELECT ... FROM ... FORCE INDEX (...).
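A sketch against the table in the question (idx_price_dateadd is a hypothetical index name; substitute whatever index the fast plan actually chose):

```sql
SELECT 1
FROM `p_apartmentbuy` p FORCE INDEX (idx_price_dateadd)
WHERE p.price BETWEEN 500000000 AND 900000000
  AND p.yard = 1
  AND p.dateadd BETWEEN 1290000000 AND 1320000000
ORDER BY p.id DESC
LIMIT 1669;
```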