How to remove this filesort? - mysql

I have problem, my slow query is still using filesort.
I can not get rid of this extra flag. Could you please help me?
My query looks like this:
select `User`,
`Friend`
from `friends`
WHERE
(`User`='3053741' || `Friend`='3053741')
AND `Status`='1'
ORDER by `Recent` DESC;
Explain says this:
http://blindr.eu/stack/1.jpg
My table structure and indexed (it is in slovak language, but it should be the same):
http://blindr.eu/stack/2.jpg
A little help is needed, how to get rid of this filesort?

USING FILESORT does not mean, that MySQL is actually using a file. The naming is a bit "unlucky". The sorting is done in memory, as long as the data fits into memory. That said, have a look at the variable sort_buffer_size. But keep in mind, that this is a per session variable. Every connection to your server allocates this memory.
The other option is of course, to do the sorting on application level. You have no LIMIT clause anyway.
Oh, and you have too many indexes. Combined indexes can also be used when just the left-most column is used. So some of your indexes are redundant.
If performance is really that much of an issue, try another approach, cause the OR in the WHERE makes it hard, if not impossible, to get rid of using filesort via indexes. Try this query:
SELECT * FROM (
select "this is a user" AS friend_or_user, `User`, Recent
from `friends`
WHERE
`User`='3053741'
AND `Status`='1'
UNION ALL
select "this is a friend",
`Friend`, Recent
from `friends`
WHERE
`Friend`='3053741'
AND `Status`='1'
) here_needs_to_be_an_alias
ORDER by `Recent` DESC

Related

Should i rather use a subquery or a combined WHERE?

This specific situation may seem a bit silly, but i just want to know how i should solve it: there is a table (schools) and in this table you find all students with their school-id. The order is completely random, but with a SELECT statement you can sort it.
CREATE TABLE schools (school_id int, name varchar(32), age ...);
Now i want to search for a student by his name (with LIKE '%name%'), but only if he's in a certain school.
I already tried this:
SELECT * FROM `schools` WHERE `school_id` = 33 and `name` LIKE '%max%';
But then i realized, that i could also use subqueries like:
SELECT * FROM (SELECT * FROM `schools` WHERE `school_id` = 33) AS a
WHERE a.name LIKE '%max%';
Which way is more efficient/has a higher performance?
You can use the EXPLAIN keyword to see exactly how each query is executed.
I'd say it's almost a definite that these two will execute identically.
The query optimizer will probably choose the same plan for both queries. If you want to know for sure, look at the execution plan when you execute each query.
The query without the subquery is probably more efficient in MySQL:
SELECT *
FROM `schools`
WHERE `school_id` = 33 and `name` LIKE '%max%';
MySQL has this nasty tendency to materialize subqueries -- that is, to actually run the subquery and save it as a temporary table (it is getting better, though). Most other databases do not do this. So, in other databases, the two should be equivalent.
MySQL is smart enough to use an index, if available, for school_id, even though there are other comparisons. If no indexes are available, it will be doing a full table scan, which will probably dominate the performance.

MySQL: where exists VS where id in [performance]

This question also exist here: Poor whereHas performance in Laravel
... but without answer.
A similar situation happened to me as it happened to the author of that question:
replays table has 4M rows
players table has 40M rows
This query uses where exists and it takes a lot of time (70s) to finish:
select * from `replays`
where exists (
select * from `players`
where `replays`.`id` = `players`.`replay_id`
and `battletag_name` = 'test')
order by `id` asc
limit 100;
but when it's changed to use where id in instead of where exists - it's much faster (0.4s):
select * from `replays`
where id in (
select replay_id from `players`
where `battletag_name` = 'test')
order by `id` asc
limit 100;
MySQL (InnoDB) is being used.
I would like to understand why there is such a big difference in performance between where exists VS where id in - is it because of the way how MySQL works? I expected that the "exists" variant would be faster because MySQL would just check whether relevant rows exist... but I was wrong (I probably don't understand how "exists" works in this case).
You should show the execution plans.
To optimize the exists, you want an index on players(replay_id, battletag_name). An index on replays(id) should also help -- but if id is a primary key there is already an index.
Gordon has a good answer. The fact is that performance depends on a lot of different factors including database design/schema and volume of data.
As a rough guide, the exists sub-query is going to execute once for every row in replays and the in sub-query is going to execute once to get the results of the sub-query and then those results will be searched for every row in replays.
So with the exists, the better the indexing/access path the faster it will run. Without relevant index(es) it will just read through all rows until it finds a match. For every single row in replays. For the rows with no matches it would end up reading the entire players table each time. Even the rows with matches could read through a significant number of players before finding a match.
With the in the smaller the resultset from the sub-query the faster it will run. For those without a match it only needs to quickly check the small sub query rows to reach that answer. That said you don't get the benefit of indexes (if it works this way) so for a large result set from the sub query it has to read every row in the sub select before deciding that when there is no match.
That said, database optimisers are pretty clever, and don't always evaluate queries exactly the way you ask them to, hence why checking execution plans and testing yourself is important to figure out the best approach. Its not unusual to expect a certain execution path only to find that optimiser has chosen a different method of execution based on how it expects the data to look.

MySQL order by makes query slow solved, but not sure why

I have the following query:
select *
from
`twitter_posts`
where
`main_handle_id` in (
select
`twitter`.`main_handle_id`
from
`users`
inner join `twitter` on `twitter`.`user_id` = `user`.`id`
where
`users` LIKE 'foo'
)
order by created_at
This query runs dramatically slow when the order by is used. The twitter_posts table has indexes on the created_at timestamp and on the main_handle_id column.
I used "explain" to see what the engine was planning on doing and noticed a filesort... which I was not expecting to see as an index exists on the twitter_posts table. After skimming through the various posts covering this on Stackoverflow and blogs, none of the examples worked for me. I had a hunch, that maybe, the sql optimizer got confused somehow with the nested select. So I wrapped the query and ordered the result of that by created_at:
select * (select *
from
`twitter_posts`
where
`main_handle_id` in (
select
`twitter`.`main_handle_id`
from
`users`
inner join `twitter` on `twitter`.`user_id` = `user`.`id`
where
`users` LIKE 'foo'
)
)
order by created_at
Using explain on this one, DOES use the index to sort and returns the response in a fraction of what it takes with filesort. So I've "solved" my problem, but I don't understand why SQL would make a different choice based on the solution. As far as I can see, I'm just wrapping the result and ask everything of that result in a seemingly redundant statement...
Edit: I'm still unaware of why the optimizer started using the index on the created_at when using a redundant subquery, but in some cases it wasn't faster. I've now gone with the solution to add the statement "FORCE INDEX FOR ORDER BY created_at_index_name".
Don't use IN ( SELECT ... ); convert that to a JOIN. Or maybe converting to EXISTS ( SELECT ... ) may optimize better.
Please provide the EXPLAINs. Even if they are both the same. Provide EXPLAIN FORMAT=JSON SELECT ... if your version has such.
If the Optimizer is confused by a nested query, why add another nesting??
Which table is users in?
You mention user.id, yet there is no user table in FROM or JOIN.
What version of MySQL (or MariaDB) are you using? 5.6 and especially 5.7 have significant differences in the optimizer, with respect to MariaDB.
Please provide SHOW CREATE TABLE for each table mentioned in the query.
There are hints in your comments that the queries you presented are not the ones giving you trouble. Please do not ask about a query that does not itself demonstrate the problem.
Oh, is it users.name? Need to see the CREATE TABLEs to see if you have a suitable index.
Since MariaDB was mentioned, I am adding that tag. (This is probably the only way to get someone with specific MariaDB expertise involved.)

Removing 'using filesort' from query

I have the following query:
SELECT *
FROM shop_user_member_spots
WHERE delete_flag = 0
ORDER BY date_spotted desc
LIMIT 10
When run, it takes a few minutes. The table is around 2.5 million rows.
Here is the table (not designed by me but I am able to make some changes):
And finally, here are the indexes (again, not made by me!):
I've been attempting to get this query running fast for hours now, to no avail.
Here is the output of EXPLAIN:
Any help / articles are much appreciated.
Based on your query, it seems the index you would want would be on (delete_flag, date_spotted). You have an index that has the two columns, but the id column is in between them, which would make the index unhelpful in sorting based on date_spotted. Now whether mysql will use the index based on Zohaib's answer I can't say (sorry, I work most often with SQL Server).
The problem that I see in the explain plan is that the index on spotted date is not being used, insted filesort mechanism is being used to sort (as far as index on delete flag is concerned, we actually gain performance benefit of index if the column on which index is being created contains unique values)
the mysql documentation says
Index will not used for order by clause if
The key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
I guess same is the case here. Although you can try using Force Index

SQL optimisation

I have this query on my database, it's basically pulling the ones that is within 10 miles radius from the coordinate.
SELECT id
FROM business
WHERE ( coordinates!=''
AND getDistance('-2.1032155,49.1801863', coordinates)<10 )
AND id NOT IN ('6', '4118') ORDER BY rand() LIMIT 0,5
when I profile this query, I get this:
JOIN SIZE: 3956 (BAD)
USING TEMPORARY (BAD)
USINGI FILESORT (BAD)
Can you guys help me optimising this query?
Thanks in advance
You will always need FILESORT (sorting the resulting data in temporary memory) as you "order by" something that is not indexed (rand()).
Depending on the optimizing capabilities of your DBMS, you might be better of using "AND id != '6' AND id != '4118'" instead of a "NOT IN" clause.
You should always state the fixed information first in queries, although this depends on the capabilities of the query optimizer as well. Also, the criteria should align with an index, meaning that the order of appearance of criteria should be the same as in the index you want your DMBS to use. There's usually an option to state which index to use (the keyword is "Index hint"), but most DMBS know best what index to use in most cases anway (guessing from the query itself).
And you will never get around USING TEMPORARY while you are using a criterium that is generated at query runtime (your function).