I'm trying to speed up a query which I currently have as:
SELECT *
FROM `events`
WHERE (field1 = 'some string' or field1 = 'some string')
and is_current = true
GROUP BY event_id
ORDER BY pub_date
This takes roughly 30 seconds.
field1 is a varchar(150)
I'm currently indexing
field1, is_current, event_id, pub_date
charity, pub_date, is_current
and all the fields individually...
I'm really not sure which fields should be indexed together. When I remove the ORDER BY, the query speeds up to around 8 seconds, and if I remove both the ORDER BY and the GROUP BY, it takes less than 1 second.
What exactly should be indexed in this case to speed up the query?
Edit:
I've run explain on the modified query (which no longer includes the group by):
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | events | range | is_current,field1_2,field1_3,field1_4,field1 | field1_3 | 153 | NULL | 204336 | Using where; Using filesort |
This indicates it's using the key field1_3, which covers field1 and is_current.
It's not using the key that includes those two fields plus pub_date (for the ordering?).
It's also using a filesort, which seems to be the main problem.
any ideas why it's using a filesort even though the pub_date field is also indexed (with the other fields)?
Put everything (field1, is_current, event_id, pub_date) in one index. MySQL can generally use only one index per joined table in a query.
Use EXPLAIN to see what happens when you do.
Also, an aside - as KoolKabin says, * is rarely a good idea. Sometimes MySQL will copy the rows in a temporary table; and then there are the communication costs. The less you ask from it the faster things will work.
UPDATE: I was actually wrong. Sorry. First off, you can't get full use of indexing if your grouping is different from your ordering. Second, do you have an index where your ordering key (pub_date) is first? If not, see whether adding one fixes the ordering problem.
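A quick way to check both points is to list the existing indexes and then re-run EXPLAIN. This is only a sketch against the table and columns from the question; which index the optimizer picks will depend on the data and the MySQL version.

```sql
-- List every index on the table. Seq_in_index = 1 marks the leading column
-- of each index, which is what matters for the ORDER BY question above.
SHOW INDEX FROM `events`;

-- Re-check the plan (here without the GROUP BY, as in the question's edit)
-- and compare the key and Extra columns before and after any index change.
EXPLAIN
SELECT *
FROM `events`
WHERE (field1 = 'some string' OR field1 = 'some string')
  AND is_current = true
ORDER BY pub_date;
```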
any ideas why it's using a filesort even though the pub_date field is also indexed (with the other fields)?
This is because the MySQL optimizer is trying to use the index "field1" while you want the data ordered by pub_date. If you are using MySQL 5.1 or later (the following query will give an error in earlier versions), you can force MySQL to use the pub_date index for the ORDER BY, something like this:
SELECT *
FROM `events`
force index for order by (pub_date)
WHERE (field1 = 'some string' or field1 = 'some string')
and is_current = true
GROUP BY event_id
ORDER BY pub_date
Related
I'm encountering a problem with the following query:
select *
from testtable
where user_id = 1 and color = 1 and size = 1
order by created_at desc, id desc;
I was using two indexes:
index1 (user_id, color, size)
index2 (created_at, id)
But I got "using where, using filesort" in the explain result.
Then I changed the index to use all five columns:
index1 (user_id ... id)
And the "using filesort" is gone, but I still get "using where".
What further steps should I take in order to fully use the indexes for this query?
Thank you
The first index is used for the where. If you also want to use the index for sorting, then you need an index on (user_id, color, size, created_at, id).
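As a sketch of that suggestion (the index name is made up for illustration; the table and columns are from the question):

```sql
-- Hypothetical index name. Equality columns from the WHERE come first,
-- followed by the ORDER BY columns in the same order as the ORDER BY clause.
CREATE INDEX idx_user_color_size_created_id
    ON testtable (user_id, color, size, created_at, id);

-- Check that "Using filesort" no longer appears in the Extra column.
EXPLAIN
SELECT *
FROM testtable
WHERE user_id = 1 AND color = 1 AND size = 1
ORDER BY created_at DESC, id DESC;
```

Because both ORDER BY columns sort in the same direction, the index can be read backwards to produce the requested order.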
Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where
From the MySQL docs
So in other words, this is absolutely normal and OK.
I am running a basic select on a table with 189,000 records. The table structure is:
items
id - primary key
ad_count - int, indexed
company_id - varchar, indexed
timestamps
the select query is:
select *
from `items`
where `company_id` is not null
and `ad_count` <= 100
order by `ad_count` desc, `items`.`id` asc
limit 50
On my production servers, just the MySQL portion of the execution takes 300-400 ms.
If I run an explain, I get:
select_type: SIMPLE
table: items
type: range
possible_keys: items_company_id_index,items_ad_count_index
key: items_company_id_index
key_len: 403
ref: NULL
rows: 94735
Extra: Using index condition; Using where; Using filesort
When fetching this data in our application, we paginate it in groups of 50, and the above query is "the first page".
I'm not too familiar with dissecting explain queries. Is there something I'm missing here?
An ORDER BY clause that mixes sort directions (ASC and DESC) can cause the creation of temporary tables and a filesort. MySQL up to and including v5.7 doesn't handle such scenarios well at all, and in that case there is actually no point in indexing the fields in the ORDER BY clause, as MySQL's optimizer will not use the index for the sort.
Therefore, if the application's requirements allow, it's best to use the same order for all columns in the ORDER BY clause.
So in this case:
order by `ad_count` desc, `items`.`id` asc
Will become:
order by `ad_count` desc, `items`.`id` desc
P.S. As a small tip for further reading: it seems that MySQL 8.0 is going to change things, and these use cases might perform significantly better when it's released.
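MySQL 8.0 added support for descending indexes, so on that version a mixed-direction ORDER BY can in principle be matched by an index. The following is a sketch only: it assumes MySQL 8.0 or later, the index name is hypothetical, and the optimizer may still prefer a different index depending on the data.

```sql
-- Assumes MySQL 8.0+. Hypothetical index name.
-- The column directions mirror the ORDER BY in the question.
CREATE INDEX items_ad_count_desc_id_asc_index
    ON items (ad_count DESC, id ASC);

-- Check whether the new index is chosen and whether "Using filesort" is gone.
EXPLAIN
SELECT *
FROM `items`
WHERE `company_id` IS NOT NULL
  AND `ad_count` <= 100
ORDER BY `ad_count` DESC, `items`.`id` ASC
LIMIT 50;
```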
Try replacing items_company_id_index with a multi-column index on (company_id, ad_count).
DROP INDEX items_company_id_index ON items;
CREATE INDEX items_company_id_ad_count_index ON items (company_id, ad_count);
This will allow it to use the index to test both conditions in the WHERE clause. Currently, it's using the index just to find non-null company_id, and then doing a full scan of those records to test ad_count. If most records have non-null company_id, it's scanning most of the table.
You don't need to retain the old index on just the company_id column, because a multi-column index is also an index on any prefix columns, because of the way B-trees work.
I could be wrong here (depending on your SQL version this could be faster), but try an INNER JOIN with your company table.
Like:
Select *
From items
INNER JOIN companies ON companies.id = items.company_id
and items.ad_count <= 100
LIMIT 50;
Because of your high index count, maintaining the B-trees will slow down the database each time a new entry is inserted. Maybe remove the index on ad_count? (This depends on how often you use that column in queries.)
I need some help in avoiding filesort for this query.
SELECT id
FROM articles USE INDEX(`group`)
WHERE type = '4'
AND category = '161'
AND did < '10016'
AND id < '9869788'
ORDER BY id DESC
LIMIT 10
INDEX(group) is a covering index of (type, category, did, id)
Because of ORDER BY id DESC, filesort is performed. Is there a way to avoid filesort for such query?
Change the index column order. The index is useless for the sort because id is its 4th column, so the index isn't ready to be used as-is.
Of course, this affects the usefulness of the index for this WHERE, because you would then have an inequality column before an equality one.
In the MySQL docs, you break "You use ORDER BY on nonconsecutive parts of a key" and "The key used to fetch the rows is not the same as the one used in the ORDER BY"
Edit: as per my link above, you can't have an index that satisfies both WHERE and ORDER BY. They are mutually exclusive because of the 2 conditions I posted above
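One possible reading of "change the index column order" is sketched below. The index name is hypothetical, and this is a trade-off rather than a guaranteed fix: with id placed right after the equality columns, the index can supply the ORDER BY id DESC order, but the did condition then only filters index entries instead of narrowing the range.

```sql
-- Hypothetical index name. Equality columns (type, category) first,
-- then the sort column (id); did is kept last so the index still covers the query.
CREATE INDEX type_category_id_did ON articles (type, category, id, did);

-- Run without the index hint and check the Extra column for "Using filesort".
EXPLAIN
SELECT id
FROM articles
WHERE type = '4'
  AND category = '161'
  AND did < '10016'
  AND id < '9869788'
ORDER BY id DESC
LIMIT 10;
```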
Another suggestion:
a single column index on id
go back to the original index too
remove the index hint
hope that the optimiser works out that both indexes can be used ("index intersection")
Create an index on id and type; it should work for you.
If id is unique, you can also make it the primary key.
Your query is using a filesort possibly because the optimizer is using the first column of the index you created.
Let's say we have a common join like the one below:
EXPLAIN SELECT *
FROM visited_links vl
JOIN device_tracker dt ON ( dt.Client_id = vl.Client_id
AND dt.Device_id = vl.Device_id )
GROUP BY dt.id
if we execute an explain, it says:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | vl | index | NULL | vl_id | 273 | NULL | 1977 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | dt | ref | Device_id,Device_id_2 | Device_id | 257 | datumprotect.vl.device_id | 4 | Using where |
I know that it's sometimes difficult to choose the right indexes when you are using GROUP BY, but what indexes could I set to avoid 'Using temporary; Using filesort' in this query? Why is this happening? And especially, why does it happen after an index is used?
One point to mention is that the fields returned by the select (* in this case) should either be in the GROUP BY clause or be wrapped in aggregate functions such as SUM() or MAX(). Otherwise unexpected results can occur. This is because if the database is not told how to choose fields that are not in the GROUP BY clause, you may get any member of the group, pretty much at random.
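For illustration, a version of the join that follows this rule might look like the sketch below. The visit_count alias is made up; every selected column is either in the GROUP BY or an aggregate.

```sql
-- dt.id is the grouping column; COUNT(*) is an aggregate,
-- so the result for each group is well defined.
SELECT dt.id,
       COUNT(*) AS visit_count   -- hypothetical alias
FROM visited_links vl
JOIN device_tracker dt
  ON dt.Client_id = vl.Client_id
 AND dt.Device_id = vl.Device_id
GROUP BY dt.id;
```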
The way I look at it is to break the query down into bits.
You have a join on (dt.Client_id = vl.Client_id and dt.Device_id = vl.Device_id), so all of those fields should be indexed in their respective tables.
You are using GROUP BY dt.id so you need an index that includes dt.id
BUT...
an index on (dt.client_id,dt.device_id,dt.id) will not work for the GROUP BY
and
an index on (dt.id, dt.client_id, dt.device_id) will not work for the join.
Sometimes you end up with a query which just can't use an index.
See also:
http://ntsrikanth.blogspot.com/2007/11/sql-query-order-of-execution.html
You didn't post your indices, but first of all, you'll want to have an index for (client_id, device_id) on visited_links, and (client_id, device_id, id) on device_tracker to make sure that query is fully indexed.
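Those two indexes could be created roughly as follows. The index names are hypothetical, and this assumes no equivalent indexes exist yet.

```sql
-- Hypothetical index names; the columns are the ones named in the answer.
CREATE INDEX idx_vl_client_device
    ON visited_links (Client_id, Device_id);

CREATE INDEX idx_dt_client_device_id
    ON device_tracker (Client_id, Device_id, id);

-- Re-run EXPLAIN on the original join afterwards to see which keys are chosen.
EXPLAIN
SELECT *
FROM visited_links vl
JOIN device_tracker dt
  ON dt.Client_id = vl.Client_id
 AND dt.Device_id = vl.Device_id
GROUP BY dt.id;
```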
From page 191 of the excellent High Performance MySQL, 2nd Ed.:
MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient depending on the query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.
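The hints mentioned in the quote go directly after SELECT. A sketch using the tables from the question follows; the aggregate and its alias are added for illustration, and which hint helps (if either) has to be measured.

```sql
-- SQL_BIG_RESULT tells the optimizer the result set is large,
-- which makes it prefer sorting over an in-memory temporary table.
SELECT SQL_BIG_RESULT dt.id, COUNT(*) AS visit_count
FROM visited_links vl
JOIN device_tracker dt
  ON dt.Client_id = vl.Client_id
 AND dt.Device_id = vl.Device_id
GROUP BY dt.id;

-- SQL_SMALL_RESULT tells it the result set is small,
-- which makes it prefer a fast temporary table over sorting.
SELECT SQL_SMALL_RESULT dt.id, COUNT(*) AS visit_count
FROM visited_links vl
JOIN device_tracker dt
  ON dt.Client_id = vl.Client_id
 AND dt.Device_id = vl.Device_id
GROUP BY dt.id;
```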
In your case, I think the issue stems from joining on multiple columns and using GROUP BY together, even after the suggested indices are in place. If you remove either (a) one of the join conditions or (b) the GROUP BY, this shouldn't need a filesort.
However, keep in mind that a filesort doesn't always use actual files, it can also happen entirely within a memory buffer if the result set is small enough, so the performance penalty may be minimal. Consider the wall-clock time for the query too.
HTH!
How is
SELECT t.id
FROM `table` t
JOIN (SELECT FLOOR(MAX(id) * RAND()) AS maxid FROM `table`)
AS tt
ON t.id >= tt.maxid
LIMIT 1
faster than
SELECT * FROM `table` ORDER BY RAND() LIMIT 1
I am actually having trouble understanding the first. Maybe if I knew why one is faster than the other I would have a better understanding.
*original post # Difficult MySQL self-join please explain
You can use EXPLAIN on the queries, but basically:
In the first, you're getting a random number (which isn't very slow) based on the maximum of an (I presume) indexed field. This is quite quick, I'd say maybe even near-constant time (depends on the implementation of the index hash?).
Then you're joining on that number and returning only the first row whose id is greater than or equal to it, and because you're using an index again, this is lightning quick.
The second is ordering by a random function. To do that it has to do a FULL TABLE scan (look at the EXPLAIN to confirm), sort the result, and then return the first row. This is of course VERY expensive. You're not using any indexes because of that RAND().
(the explain will look like this, showing that you're not using keys)
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 14 | Using temporary; Using filesort |
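To make the first query easier to follow, it can be read as two separate steps, as in the sketch below. The literal 12345 is only a placeholder for whatever step 1 returns, and the ORDER BY t.id is an addition here to make "the first row at or above the threshold" explicit.

```sql
-- Step 1: pick a random threshold between 0 and MAX(id).
-- MAX(id) on an indexed column is read straight from the end of the index.
SELECT FLOOR(MAX(id) * RAND()) AS maxid
FROM `table`;

-- Step 2: fetch one row at or above that threshold using the index on id.
SELECT t.id
FROM `table` t
WHERE t.id >= 12345   -- placeholder for the maxid value from step 1
ORDER BY t.id
LIMIT 1;
```

Neither step has to touch every row, whereas ORDER BY RAND() must generate a random value for each row and sort them all before LIMIT 1 applies.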