Please consider a table with queue_name, priority and message_timestamp columns.
I'm going to perform the following query:
SELECT message_timestamp
FROM queue_messages
WHERE queue_name = 'name'
  AND state = 0
ORDER BY message_timestamp DESC
LIMIT 1
Here is a compound index for that:
CREATE INDEX STATE_QUEUENAME_TIMESTAMP ON `queue_messages` (queue_name, state, message_timestamp);
EXPLAIN shows that the index matches the query pretty well (there is no filesort for the ORDER BY).
My problem is that without ORDER BY message_timestamp I get a throughput of ~200 prs for this query, but with it only ~50 rps!
And the more rows there are in the table, the slower the query with ORDER BY gets!
What am I doing wrong?
It is true that an ORDER BY clause can slow a query down, because the database may need to buffer the intermediate results before producing the final output.
Reason: a sort operation cannot be performed in a pipelined fashion. The input has to be read completely before any output can be produced.
The alternative to sorting at query time is indexing. An index keeps your data in order, which removes the sorting overhead of the ORDER BY clause when the index order matches the requested order.
(In addition to Harshita's answer:)
If you add INDEX(queue_name, state, message_timestamp), then the query will work faster with or without the ORDER BY.
Note that that composite index handles all of the WHERE filtering, and still has the ORDER BY column(s) for handling the ORDER BY and LIMIT.
More
The EXPLAIN shows the use of that index; furthermore it says "Using index". That means that the index is "covering", which means that the query is performed entirely in the index, and does not need to touch the data.
I would expect the EXPLAIN to be the same whether you have the ORDER BY or not. Is it?
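One way to check this is to run EXPLAIN on both variants and compare the key and Extra columns. The sketch below reuses only the table, columns, and query from the question; nothing else is assumed.

```sql
-- Variant from the question, with ORDER BY.
EXPLAIN
SELECT message_timestamp
FROM queue_messages
WHERE queue_name = 'name'
  AND state = 0
ORDER BY message_timestamp DESC
LIMIT 1;

-- Same query without the ORDER BY, for comparison.
EXPLAIN
SELECT message_timestamp
FROM queue_messages
WHERE queue_name = 'name'
  AND state = 0
LIMIT 1;
```

If both plans show the same key and neither mentions "Using filesort", the sort itself is probably not what separates the two throughput figures.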
What is "prs"? "rps"? Perhaps "requests per second"? Depending on what else is going on and on the caching of blocks in the buffer_pool, a variation of 50-200 seems reasonable. Are there multiple threads reaching for the next item? Is there an UPDATE or DELETE that you have not mentioned that 'deletes' the item after it is found? That will have more impact than the SELECT; we should really discuss it at the same time.
If you are using a table as a queue, you will eventually hit a situation where it performs poorly. My mantra on such: "Don't queue it, just do it".
Related
I'm not clear how to search for this, if there's a duplicate feel free to point me to it.
I'm wondering if there's a way to tell mysql that something will be sorted before filtering so that it can perform the filter with a binary search instead of a linear search.
For example, consider a table with columns id, value, and created_at
id is an auto-increment and created_at is a timestamp field with default of CURRENT_TIMESTAMP
then consider the following query:
SELECT *
FROM `table`
WHERE created_at BETWEEN '2022-10-05' AND '2022-10-06'
ORDER BY id
Because I have context on the data that mysql doesn't, namely that if id is sorted then created_at will also be sorted, I can conclude that we can binary search on created_at. However mysql does a full table scan for the filter as it's unaware of, or unwilling to assume this fact. The explain on the query on my test dataset shows that it's scanning all 50 rows to return the 24 that match the filter, when it's possible to do it by only scanning approximately log2(50) rows. This isn't a huge difference for my test dataset but on larger data it can have an impact.
I'll note that the obvious answer here is to add an index on created_at, but on more real life queries that's not always possible. For example if you were filtering on another indexed column it wouldn't be able to use that created_at index, but we might still be able to make assumptions about the ordering based on other order bys.
Anyway, after all that setup my question is: Is there a way that I can tell MySQL that I know that this data is already sorted so that it need not perform a table scan? Something similar to FORCE INDEX that can be used to override the behaviour of picking an index for a query.
The answer is no in general.
InnoDB queries read rows in order by the index used. So in your case if there's an index on created_at, it'll read rows in that order, ascending. There's no way to tell the optimizer that this matches the order of id too, and the optimizer won't make that assumption.
So the bottom line is that it'll have to perform a filesort on the matching rows to guarantee they're in order by id.
The comment above suggests ORDER BY created_at would solve the problem in the example you show. That is, if you know that the order of created_at is the same order as id, then just ORDER BY created_at, and the filesort can be skipped. That is, the optimizer knows the ORDER BY you requested is actually the order it read the rows from the index, so sorting is a no-op.
But I assume your example was only one case among many potential cases, and it might not be the right solution to use in other cases.
But why was the query doing a table-scan?
In the example you give of a table-scan of 50 rows, it's possible the optimizer decided not to use the index because the table-scan is so little work that the extra indirection of using the index isn't worth it. This is why we need to fill a table with at least a few hundred rows during testing, before we know if an index improves the query.
Another possible reason for the table-scan is that the range of dates covers a significantly large part of the table. In my experience, if the condition matches over 20% of the table, then the optimizer says, "meh, not worth the effort to use the index, so I'll just do a table-scan." That 20% figure is not an officially documented threshold, it's just my observation.
FORCE INDEX might help the latter case. FORCE INDEX really just tells the optimizer to assume a table-scan is infinitely costly, so if the named index is relevant at all to the search conditions, then use the index instead of a table-scan. It's possible though to use FORCE INDEX and name an index that is irrelevant to the search conditions. In that case, the optimizer still won't use the index, it'll just feel shame over having to do a table-scan.
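As a rough sketch of the FORCE INDEX hint described above, assuming an index on created_at exists. The name idx_created_at is hypothetical; the table and columns follow the question.

```sql
-- Hypothetical index; skip this statement if created_at is already indexed.
CREATE INDEX idx_created_at ON `table` (created_at);

-- Ask the optimizer to treat a table-scan as the last resort.
EXPLAIN
SELECT *
FROM `table` FORCE INDEX (idx_created_at)
WHERE created_at BETWEEN '2022-10-05' AND '2022-10-06'
ORDER BY created_at;
```

If the type column of the EXPLAIN output still says ALL, the named index was not usable for the search conditions.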
Because I have context on the data that mysql doesn't, namely that if id is sorted then created_at will also be sorted...
Give MySQL that information: order by created_at, id and make sure created_at is indexed. This will allow MySQL to use the created_at index to filter and also do most of the ordering. If this is still slow, try adding a composite index of (created_at, id).
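A minimal sketch of that suggestion, using the table and columns from the question. The index name is hypothetical, and whether the optimizer actually picks the index still depends on the data.

```sql
-- Hypothetical composite index: filter column first, then the tie-breaker.
CREATE INDEX idx_created_at_id ON `table` (created_at, id);

-- The ORDER BY now matches the index order, so rows read through the
-- index should not need a separate sort step.
EXPLAIN
SELECT *
FROM `table`
WHERE created_at BETWEEN '2022-10-05' AND '2022-10-06'
ORDER BY created_at, id;
```

The thing to look for is a range access on the new index without "Using filesort" in the Extra column.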
Also, upgrade MySQL. 5.6 reached the end of its life last year. MySQL 8 is considerably better at optimizing queries.
I'm just wondering, if I do SELECT COUNT(*) FROM ... WHERE ... ORDER BY ..., does MySQL sort the records before counting? Or does it understand that it makes no sense in this case and just ignores ORDER BY?
The ORDER BY clause is (in this query) the last one to be executed.
The ORDER BY clause only affects the result, never the data you are looking into, so in this case the system works this way:
1. Count the rows that meet your condition (maybe using an index or a full scan, depending on the data and the conditions).
2. Get the rows for your answer, in this case just 1 row.
3. Try to order the rows of your answer, in this case with no effect because you have only one row.
So the ORDER BY (in this case) has no effect, because the ordering applies only to the rows of the answer and you have only 1 row. And as someone said, it doesn't make any sense to try to order a result with just one row.
I will try to answer your question with what I know, but I'm not sure it will help you. If my answer is wrong, I hope you can point it out.
I am using MySQL's InnoDB engine. Unlike MyISAM, InnoDB does not maintain a row_count variable, so COUNT(*) has to scan rows rather than read a stored counter, although this consumes fewer system resources than a SELECT * would. After some calls from the relatively top-level sub_select function, all branches eventually end up in the row_search_mvcc function, which reads a row from the B+-tree structure stored by the InnoDB storage engine into an in-memory buf (uchar *) for subsequent processing. Locks, MVCC, and so on may be involved here, but row locks are not. This is how MySQL reads a row of data, but it does not need to read a whole row of valid data. At the code level, the rows that are read are evaluated in the evaluate_join_record function. So, in my opinion, whether you use ORDER BY or not has little effect on COUNT(*).
Responding to your specific question, it is not ignored nor optimized by the DB engine.
Performance is affected by # of rows not the # of keys.
Maybe you can find further info here: ORDER BY Optimization
I am trying to optimize a query that we use to generate reports.
I think I have managed to optimize it to some extent.
Below is the original query:
select trat.asset_name as group_name,trat.name as sub_group_name,
trat.asset_id as group_id,
if(trat.cause_task_type='AccessRequest',true,false) as is_request_task,
'' as grouped_on,
concat(trat.asset_name,' - {0} (',count(*),')') as table_heading
from t_remote_agent_tasks trat
where trat.status in ('completed','failedredundant')
and trat.name not in ('collect-data','update-conn-params')
group by trat.asset_name, trat.name
order by field(trat.name,'create-account','change-attr',
'add-member-to-group',
'grant-access','disable-account','revoke-access',
'remove-member-from-group','update-license')
When I look at the execution plan, the Extra column says Using where; Using temporary; Using filesort.
So I optimized the query like this:
select trat.asset_name as group_name,trat.name as sub_group_name,
trat.asset_id as group_id,
if(trat.cause_task_type='AccessRequest',true,false) as is_request_task,
'' as grouped_on,
concat(trat.asset_name,' - {0} (',count(*),')') as table_heading
from t_remote_agent_tasks trat
where trat.status in ('completed','failedredundant')
and trat.name not in ('collect-data','update-conn-params')
group by trat.asset_name,trat.name
order by null
This gives me an execution plan with Using where; Using temporary. So filesort is no longer used, and there is no extra overhead because the optimizer doesn't have to sort; the ordering is taken care of during the GROUP BY.
I then went further and created an index on the GROUP BY columns, in the same order as they appear in the GROUP BY (this is important, or the optimization won't happen), i.e. an index on (trat.asset_name, trat.name).
Now this optimization gives me only Using where in the Extra column. The query execution time was also reduced by almost half (earlier it was 0.568 sec and now it is 0.345 sec; not exact, as it varies every time, but more or less in this range).
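For reference, a sketch of the index described above and a way to re-check the plan. The index name and the task_count alias are hypothetical, and the select list is shortened for readability.

```sql
-- Hypothetical index name; the column order matches the GROUP BY.
CREATE INDEX idx_asset_name_name
    ON t_remote_agent_tasks (asset_name, name);

EXPLAIN
SELECT trat.asset_name AS group_name,
       trat.name       AS sub_group_name,
       COUNT(*)        AS task_count
FROM t_remote_agent_tasks trat
WHERE trat.status IN ('completed', 'failedredundant')
  AND trat.name NOT IN ('collect-data', 'update-conn-params')
GROUP BY trat.asset_name, trat.name
ORDER BY NULL;
```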
Now I want to optimize the range part of the query, shown below:
trat.status in ('completed','failedredundant')
and trat.name not in ('collect-data','update-conn-params')
I have been reading the MySQL reference guide on range optimization, which says that NOT IN is not a range condition, so I modified the query like this:
trat.status in ('completed','failedredundant')
and trat.name in ('add-member-to-group','change-attr','create-account',
'disable-account','grant-access', 'remove-member-from-group',
'update-license')
But it doesn't show any improvement in Extra (I mean that Using index should be there, but it is still showing Using where).
I also tried splitting the two range parts into unions (that changes the query result, but there was still no improvement in the execution plan).
I would like some help on how to optimize this query further, mostly the range part (the IN part).
Is there any other optimization I should make?
I appreciate your time. Thanks in advance.
EDIT 1: I forgot to mention that I also have an index on trat.status, so the indexes are:
(trat.asset_name,trat.name)
(trat.status)
In virtually all cases, only one index is used in a SELECT. So, one must have available the best.
Both of the first two queries will probably benefit most from the same 'composite' index:
INDEX(asset_name, name)
Normally, one would try to handle the WHERE conditions in the index, but they do not look amenable to an index. (More discussion below.) Second choice is the GROUP BY, which I am recommending. But, since (in the first case) the ORDER BY and the GROUP BY are different, there will necessarily be a tmp table created for the output of the GROUP BY so that it can be sorted according to the ORDER BY. (There may also be a tmp and sort for the GROUP BY; I cannot tell.)
"Using index" means that a "covering" index was used. A "covering" index is a composite index that includes all of the columns used anywhere in the SELECT. That would be about 5 columns, and probably not wise to attempt. (More below.)
Another thing to note is that even something this simple:
WHERE x IN (11,22)
GROUP BY y
cannot use any index to handle both the WHERE and GROUP BY. So, there is no way for your query to consume both (except by 'covering').
A covering index, when used, is only partially useful. It says that all the work is done just in the BTree of the index. But that could include a full index scan -- which is not that much faster than a full table scan. This is another argument against recommending 'covering'.
In some situations, IN or OR can be sped up by turning it into UNION:
( SELECT ... WHERE status in ('completed') )
UNION ALL
( SELECT ... WHERE status in ('failedredundant') )
but this will only cause you to stumble into the NOT IN(...) clause, which is worse than an IN.
The goal of finding the best index is to find one that has the rows (in the index and/or in the table) consecutively sitting in the BTree.
To make any further improvements on this query will probably require re-thinking the schema -- it seems to be forcing you to have IN, NOT IN, FIELD and other hard-to-optimize constructs.
When I add LIMIT 1 to a MySQL query, does it stop the search after it finds 1 result (thus making it faster) or does it still fetch all of the results and truncate at the end?
Depending on the query, adding a limit clause can have a huge effect on performance. If you want only one row (or know for a fact that only one row can satisfy the query), and are not sure about how the internal optimizer will execute it (for example, WHERE clause not hitting an index and so forth), then you should definitely add a LIMIT clause.
As for optimized queries (using indexes on small tables), it probably won't matter much for performance, but again: if you are only interested in one row, then add a LIMIT clause regardless.
LIMIT can affect the performance of the query (see the comments and the link below), and it also reduces the result set that MySQL outputs. For a query in which you expect a single result, there are benefits.
Moreover, limiting the result set can in fact reduce the total query time, as transferring large result sets uses memory and can create temporary tables on disk. I mention this because I recently saw an application that did not use LIMIT kill a server due to huge result sets; with LIMIT in place, the resource utilization dropped tremendously.
Check this page for more specifics: MySQL Documentation: LIMIT Optimization
The answer, in short, is yes. If you limit your result to 1, then even if you are "expecting" one result, the query will be faster because your database won't look through all your records. It will simply stop once it finds a record that matches your query.
If there is only 1 result coming back, then no, LIMIT will not make it any faster. If there are a lot of results, you only need the first one, and there is no GROUP BY or ORDER BY clause, then LIMIT will make it faster.
If you really only expect one single result, it makes sense to append the LIMIT to your query. I don't know the inner workings of MySQL, but I'm sure it won't gather a result set of 100,000+ records just to truncate it back to 1 at the end.
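The difference these answers describe can be sketched with two queries. The users table and its columns are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table and columns, for illustration only.
-- No ORDER BY or GROUP BY: the server can stop at the first matching row.
SELECT id
FROM users
WHERE email = 'someone@example.com'
LIMIT 1;

-- With an ORDER BY that no index can satisfy, every matching row has to be
-- found and run through the sort before the first one can be returned.
SELECT id
FROM users
WHERE country = 'PT'
ORDER BY last_login DESC
LIMIT 1;
```

In the second case the LIMIT still trims what is sent to the client, but it cannot cut the search short.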
How is it possible to have a good plan in EXPLAIN, like the one below, and still have a slow query? It shows few rows, Using index, and no filesort.
The query runs in 9s. The main table has around 500k rows.
When I had 250k rows in that table, the query ran in < 1s.
Any suggestions, please?
Query (1. the commented-out conditions can be enabled according to the user's choice. 2. Without FORCE INDEX it takes 14s. 3. I use SQL_NO_CACHE to prevent false results):
SELECT SQL_NO_CACHE
p.property_id
, lct.loc_city_name_pt
, lc.loc_community_name_pt
, lc.loc_community_image_num_default
, lc.loc_community_gmap_longitude
, lc.loc_community_gmap_latitude
FROM property as p FORCE INDEX (ix_order_by_perf)
INNER JOIN loc_community lc
ON lc.loc_community_id = p.property_loc_community_id
INNER JOIN loc_city lct FORCE INDEX (loc_city_id)
ON lct.loc_city_id = lc.loc_community_loc_city_id
INNER JOIN property_attribute pa
ON pa.property_attribute_property_id = p.property_id
WHERE p.property_published = 1
AND (p.property_property_type_id = '1' AND p.property_property_type_sale_id = '1')
AND p.property_property_housing_id = '1'
-- AND p.property_loc_community_id = '36'
-- AND p.property_bedroom_id = '2'
-- AND p.property_price >= '50000' AND p.property_price <= '150000'
-- AND lct.loc_city_id = '1'
-- AND p.property_loc_subcommunity_id IN(7,8,12)
ORDER BY
p.property_featured DESC
, p.property_ranking_date DESC
, p.property_ranking_total DESC
LIMIT 0, 15
The result set always contains 15 rows, but the property and property_attribute tables each have around 500k rows.
Thanks all,
Armando Miani
This really seems to be an oddity in EXPLAIN in this case. This doesn't occur on MySQL 4.x, but it does on MySQL 5.x.
What MySQL is really showing you is that MySQL is trying to use the forced index ix_order_by_perf for sorting the rows, and it's showing you 15 rows because you have LIMIT 15.
However, the WHERE clause is still scanning all 500K rows since it can't utilize an index for the criteria in your WHERE clause. If it were able to use the index for finding the required rows, you would see the forced index listed in the 'possible_keys' field.
You can prove this by keeping the FORCE INDEX clause and removing the ORDER BY clause. You'll see that MySQL now won't use any indexes, even the one you're forcing (because the index doesn't work for this purpose).
Try adding property_property_type_id, property_property_type_sale_id, property_property_housing_id, and any other columns that you refer to in your WHERE clause to the beginning of the index.
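A sketch of the index this suggests, assuming ix_order_by_perf currently holds only the three ORDER BY columns (its real definition is not shown in the question). The index name below is hypothetical.

```sql
-- Equality-filtered columns first, then the ORDER BY columns in the same
-- sequence as the ORDER BY clause.
CREATE INDEX ix_where_then_order_by ON property (
    property_published,
    property_property_type_id,
    property_property_type_sale_id,
    property_property_housing_id,
    property_featured,
    property_ranking_date,
    property_ranking_total
);
```

Because the four WHERE conditions are all equality tests and the three ORDER BY columns all sort in the same direction, one range of such an index can in principle both filter the rows and deliver them already ordered, which lets the LIMIT stop early. The optional filters that are commented out in the query are not covered by this sketch.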
There comes a moment when your query has been optimized around a model that may no longer be valid for a given need.
A plan can look great, but even if the filters in your WHERE clause match the index definitions, that doesn't mean the engine doesn't read many rows.
What you have to analyze is how selective your indexes are. For instance, if there's an index on "name, family name" in a "person" table, performance is going to be poor if everybody has the same name and family name. An index becomes a real trap that drags performance down when it is not selective enough to describe a given segment of your dataset.
Based on the EXPLAIN output for your query, here are my initial thoughts:
This portion of your query (rewritten to exclude the unneeded parentheses):
p.property_published = 1
AND p.property_property_type_id = '1'
AND p.property_property_type_sale_id = '1'
AND p.property_property_housing_id = '1'
puts so many conditions on the property table that it's unlikely any index you have can be used. Unless you have a single index that has all four of those attributes in it, you're forcing a full table scan just to find the rows that meet those conditions (though it's possible that, if you have an index on one of the attributes, it could use that).
First, I'd add the following index (have not checked this for syntax errors):
CREATE INDEX property_published_type_sale_housing_idx
ON property (property_published,
property_property_type_id,
property_property_type_sale_id,
property_property_housing_id );
Then I'd re-run your EXPLAIN to see if you hit the index now. (Take off the FORCE INDEX on that part of the query).
Also, even given this issue, it's possible the slow down may be memory related. That is, you may have enough memory to process the table with a smaller number of rows, but it may be that when the table gets larger MySQL can't process the entire query in memory and is forced to start using disk to get the entire query handled. This would explain why there's a sudden drop off in performance.
If that's the case, then two things might help:
Adding more memory (and tuning the MySQL config file to take advantage of it) so that the number of rows that can be processed at once is larger. This is at best a temporary solution.
Tuning the indexes (as described above) so that the number of rows MySQL needs to process is lower, that is, so that it can be more precise in picking the rows it selects for processing.
Besides a good plan, you need enough resources to run the query.
Check the buffer sizes and other critical parameters in your config.
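A few statements that can serve as a starting point for that check. Which settings matter depends on the storage engine, which the question does not state, so both the InnoDB and the MyISAM settings are listed.

```sql
-- Memory available for caching table and index data.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB tables
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM indexes

-- Limits that decide when sorts and temporary tables spill to disk.
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Counters that grow when work spills to disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
```

Counters that climb quickly while the slow query runs would hint that memory, rather than the plan, is the bottleneck.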
And your query is?