Why does order by primary index make this query slow? - mysql

This query gets the newest videos uploaded by the user's subscriptions. It was running very slowly, so I rewrote it to use joins, but that didn't make a difference. After tinkering with it I found that removing ORDER BY makes it run fast (however, that defeats the purpose of the query).
Query:
SELECT vid.*
FROM video AS vid
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
AND vid.privacy = 0 AND vid.blocked <> 1 AND vid.converted = 1
ORDER BY vid.id DESC
LIMIT 8
Running EXPLAIN shows "Using temporary; Using filesort" on the subscriptions table, and the query is slow (0.0900 seconds).
Without ORDER BY vid.id DESC it doesn't show "Using temporary; Using filesort", so it's fast (0.0004 seconds), but I don't understand how the other table can affect it like this.
All the fields are indexed (the privacy, blocked and converted fields don't affect performance by more than 10%).
I would paste the full EXPLAIN output, but I can't seem to make it fit nicely in the layout of this site.

You're limiting the query to 8 results. When you run it without an order by, it can grab the first 8 rows it comes across that pass the condition, and then hand them back. Boom, it's done.
When you use the order by, you're not asking for any 8 records. You're asking for the first 8 records in terms of vid.id. So it has to figure out which those are, and unless it can read the rows in vid.id order from an index, the only way to do that is to find every matching row and compare vid.id values. That's a lot more work.
Is there actually an index on the column? If so, its statistics may be out of date. You could try refreshing them with ANALYZE TABLE, or rebuilding the index.

Fixed it by suggesting that MySQL use the primary index with USE INDEX (PRIMARY):
SELECT vid.*
FROM video AS vid USE INDEX ( PRIMARY )
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
AND vid.privacy =0
AND vid.blocked <>1
AND vid.converted =1
ORDER BY vid.id DESC
LIMIT 8
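One way to check whether the hint really changes the plan is to run EXPLAIN on both variants and compare the Extra column. The sketch below reuses the query from this thread. FORCE INDEX is shown only as a stronger alternative to USE INDEX; it is an assumption for illustration, not something the original poster reported using.

```sql
-- Plan without a hint: look for "Using temporary; Using filesort" in Extra.
EXPLAIN
SELECT vid.*
FROM video AS vid
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
  AND vid.privacy = 0 AND vid.blocked <> 1 AND vid.converted = 1
ORDER BY vid.id DESC
LIMIT 8;

-- Plan with a hint: the video table should now be read through PRIMARY,
-- so rows can come back already ordered by vid.id.
-- FORCE INDEX behaves like USE INDEX but also treats a table scan as very expensive.
EXPLAIN
SELECT vid.*
FROM video AS vid FORCE INDEX (PRIMARY)
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
  AND vid.privacy = 0 AND vid.blocked <> 1 AND vid.converted = 1
ORDER BY vid.id DESC
LIMIT 8;
```

An index hint pins the plan regardless of how the data changes later, so it is worth re-checking the EXPLAIN output as the tables grow.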

Related

Joining three tables, then order, how to use index in MySQL?

I have these three tables:
season(id, season);
game_in_season(id, id_season, game);
player_in_game(id, id_game, full_name, pts);
I want to select all players of season with index 5 and order them by pts. Which index should I use? I have an index on pg.pts column, but it is not used when I join the table with s and gs tables. It is only used when I make "select * from pg order by pts desc".
EXPLAIN SELECT pg.* FROM season s, game_in_season gs, player_in_game pg
WHERE s.id = gs.id_season AND gs.id = pg.id_game
AND s.id = 5
ORDER BY pg.pts DESC
In the row with table = 's' there is Extra = 'Using temporary; Using filesort'. Which index should I use to avoid the filesort? Is it even possible to run this query without a filesort?
Please use JOIN..ON instead of a comma join:
SELECT pg.*
FROM season s
JOIN game_in_season gs ON s.id = gs.id_season
JOIN player_in_game pg ON gs.id = pg.id_game
WHERE s.id = 5
ORDER BY pg.pts DESC
The Optimizer has two ways to execute:
Start by filtering s using s.id = 5, or equivalently filtering gs on gs.id_season = 5.
Start by using INDEX(pts) in pg (if you have such).
The former must sort the data after it gets to the ORDER BY. The Optimizer is likely to pick this approach.
The latter is inefficient because it will have to read lots of unnecessary rows, since the test for 5 comes too late.
Using temporary; Using filesort seems to always be in the first line of EXPLAIN; this is confusing because
Often the sort comes later in the processing; and
The sort often happens in RAM, not in a "file" or with a real "temporary" table. That is, you (and many other MySQL users) should not be scared by that 'Extra'.
I assume you have these indexes (or PRIMARY KEYs):
season: (id)
game_in_season: (id_season) -- Better would be (id_season, id)
player_in_game: (id_game)
Bottom line: The query is probably running as fast as it can.
When asking performance questions, please provide SHOW CREATE TABLE and the EXPLAIN.
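The indexes listed above could be created roughly as follows. This is a sketch: the index names are made up for illustration, and it assumes the id columns are already PRIMARY KEYs, as the answer supposes.

```sql
-- Composite index so the filter on id_season and the join column id
-- are both available from the same index (name is hypothetical).
ALTER TABLE game_in_season ADD INDEX idx_season_id (id_season, id);

-- Index for the join from game_in_season to player_in_game (name is hypothetical).
ALTER TABLE player_in_game ADD INDEX idx_game (id_game);

-- Re-check the plan afterwards; a filesort for ORDER BY pg.pts is still expected.
EXPLAIN
SELECT pg.*
FROM season s
JOIN game_in_season gs ON s.id = gs.id_season
JOIN player_in_game pg ON gs.id = pg.id_game
WHERE s.id = 5
ORDER BY pg.pts DESC;
```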

Why MySql doesn't use my index and how to avoid the "Using temporary; Using filesort"?

I have a MySql query that looks like the following:
SELECT trace_access.Employe_Code, trace_access.Employe_Prenom,
trace_access.Employe_Nom, trace_access.Evenement_Date, trace_access.Evenement_Heure
FROM trace_access
INNER JOIN emp
ON trace_access.Employe_Code = emp.Employe_CodeEmploye
LEFT JOIN user
ON emp.Employe_ID = user.Employe_ID
LEFT JOIN role
ON role.User_ID = user.User_ID
WHERE trace_access.Employe_Nom Not Like "TEST%NU"
ORDER BY trace_access.Evenement_Date DESC , trace_access.Evenement_Heure DESC
The table "trace_access" contains almost 20 million entries.
When I explain the query:
[screenshot of the EXPLAIN output]
My question is: why didn't MySQL use the key for the emp table, and how can I avoid "Using temporary; Using filesort"?
I have tried to force it to use my index, but that didn't work.
The query runs for an hour or more and writes a file in the /tmp folder that exceeds 8 GB!
Any help?
Thanks a lot.
With your current execution plan (which starts with emp), a full table scan is required and there is no useful index on emp. This can make sense when you only get a relatively small number of rows after your join to trace_access, which probably isn't the case here.
To prevent the filesort, you need an index that supports your ORDER BY. So add, if it doesn't exist yet, the index trace_access(Evenement_Date, Evenement_Heure).
This might already be enough to make MySQL start with trace_access. If not, replace INNER JOIN emp with STRAIGHT_JOIN emp. This forces MySQL to read trace_access first.
Also add an index for emp(Employe_CodeEmploye).
Depending on your data, you can try to add Employe_Code and/or Employe_Nom as 3rd and/or 4th column to the index on trace_access, though it will probably not have much effect.
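Put together, the suggestions above might look like the sketch below. The index names and the ta alias are hypothetical, and the select list is taken from the question.

```sql
-- Index that matches the ORDER BY columns (name is hypothetical).
ALTER TABLE trace_access ADD INDEX idx_evt_date_heure (Evenement_Date, Evenement_Heure);

-- Index for the join lookup into emp (name is hypothetical).
ALTER TABLE emp ADD INDEX idx_emp_code (Employe_CodeEmploye);

-- STRAIGHT_JOIN makes MySQL read trace_access before emp.
SELECT ta.Employe_Code, ta.Employe_Prenom, ta.Employe_Nom,
       ta.Evenement_Date, ta.Evenement_Heure
FROM trace_access AS ta
STRAIGHT_JOIN emp ON ta.Employe_Code = emp.Employe_CodeEmploye
LEFT JOIN `user` ON emp.Employe_ID = `user`.Employe_ID
LEFT JOIN role ON role.User_ID = `user`.User_ID
WHERE ta.Employe_Nom NOT LIKE 'TEST%NU'
ORDER BY ta.Evenement_Date DESC, ta.Evenement_Heure DESC;
```

Running EXPLAIN on the rewritten query shows whether "Using temporary; Using filesort" is actually gone for your data.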

MySQL: Grouped/Ordered/Left Join query very slow

I have a problem with a query which takes far too long (over two seconds just for this simple query).
At first look it appears to be an indexing issue. All joined fields are indexed, but I cannot find what else I may need to index to speed this up. As soon as I add the fields I need to the query, it gets even slower.
SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Table row counts (~): tabledef_Jobs (108k), tabledef_JobCatLink (109k), tabledef_Companies (100), tabledef_Applications (50k)
Here you can see the Describe. 'Using temporary' appears to be what is slowing down the query:
table index screenshots:
Any help would be greatly appreciated
EDIT WITH ANSWER
Final improved query with thanks to @Steve (marked answer). Ultimately, the final query was reduced from ~22s to ~0.3s:
SELECT `jobs`.`job_id` AS `job_id` FROM
(
SELECT * FROM tabledef_Jobs as jobs ORDER BY `jobs`.`date_posted` ASC LIMIT 0 , 50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Right, I’ll have a stab at this.
It would appear that the Query Optimiser cannot use an index to fulfil the query upon the tabledef_Jobs table.
You've got an offset limit, and in combination with your ORDER BY it cannot limit the amount of data before joining. MySQL therefore has to group by job_id (which is a PK and fast), then order that data (temporary table and a filesort), and only then apply the limit and throw away the vast majority of this data, before finally joining everything else to it.
I would suggest adding a composite index to jobs on (job_id, date_posted).
So firstly optimise the base query:
SELECT * FROM tabledef_Jobs
GROUP BY job_id
ORDER BY date_posted
LIMIT 0,50
Then you can combine the joins and the final structure together to make a more efficient query.
I cannot let it go by without suggesting you rethink your limit offset. This is fine for small initial offsets, but when the offset starts to get large it can be a major cause of performance issues. Say, for example, you're using this for pagination: what happens if they page far enough to reach offset 3,000? You will use
LIMIT 3000, 50
This will then collect 3050 rows, manipulate the data and then throw away the first 3000.
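One common alternative to a growing offset is to page by the last row already seen (often called keyset or "seek" pagination). This is a swapped-in technique, not something proposed in the thread. The sketch below assumes an index on (date_posted, job_id); the index name, the user variables and their values are placeholders for illustration.

```sql
-- Assumed index so the ORDER BY can be read straight from the index.
ALTER TABLE tabledef_Jobs ADD INDEX idx_posted_job (date_posted, job_id);

-- First page.
SELECT job_id, date_posted
FROM tabledef_Jobs
ORDER BY date_posted ASC, job_id ASC
LIMIT 50;

-- Next page: @last_date and @last_id hold the values from the last row
-- of the previous page (placeholder values shown here).
SET @last_date = '2012-01-31 00:00:00';
SET @last_id = 12345;

SELECT job_id, date_posted
FROM tabledef_Jobs
WHERE date_posted > @last_date
   OR (date_posted = @last_date AND job_id > @last_id)
ORDER BY date_posted ASC, job_id ASC
LIMIT 50;
```

The trade-off is that this only supports "next page" style navigation; jumping straight to an arbitrary page number still needs an offset.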
[edit 1 - In response to comments below]
I will expand with some more information that might point you in the right direction. Unfortunately there isn’t a simple fix that will resolve it; you must understand why this is happening to be able to address it. Simply removing the LIMIT or ORDER BY may not work, and after all you don’t want to remove them: they are part of your query, which means they are there for a purpose.
Optimise the simple base query first; that is usually a lot easier than working with multi-joined datasets.
Despite all the bashing it receives, there is nothing wrong with filesort. Sometimes this is the only way to execute the query. Agreed, it can be the cause of many performance issues (especially on larger data sets), but that’s usually not the fault of filesort but of the underlying query / indexing strategy.
Within MySQL you cannot satisfy one ORDER BY from several different indexes, and (before MySQL 8.0 introduced descending indexes) you cannot mix ASC and DESC across the columns of the same index; either case results in a filesort.
How about, as I suggested, creating an index on date_posted and then using:
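SELECT jobs.job_id, jobs.date_posted, jobcats.*, apps.*, company.* FROM
(
-- job_id is the primary key, so DISTINCT is not needed here.
-- date_posted and company_id are selected because the outer query refers to them.
SELECT job_id, date_posted, company_id FROM tabledef_Jobs
ORDER BY date_posted
LIMIT 0,50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
-- The order of the derived table is not guaranteed to survive the joins.
ORDER BY jobs.date_posted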

Slow MySQL query with AS and subquery

I have a problem with this slow query that runs for 10+ seconds:
SELECT DISTINCT siteid,
storyid,
added,
title,
subscore1,
subscore2,
subscore3,
( 1 * subscore1 + 0.8 * subscore2 + 0.1 * subscore3 ) AS score
FROM articles
WHERE added > '2011-10-23 09:10:19'
AND ( articles.feedid IN (SELECT userfeeds.siteid
FROM userfeeds
WHERE userfeeds.userid = '1234')
OR ( articles.title REGEXP '[[:<:]]keyword1[[:>:]]' = 1
OR articles.title REGEXP '[[:<:]]keyword2[[:>:]]' = 1 ) )
ORDER BY score DESC
LIMIT 0, 25
This outputs a list of stories based on the sites that a user added to his account. The ranking is determined by score, which is made up out of the subscore columns.
The query uses filesort and uses indices on PRIMARY and feedid.
Results of an EXPLAIN:
id | select_type        | table     | type           | possible_keys                 | key     | ref  | rows   | Extra
1  | PRIMARY            | articles  | range          | PRIMARY,added,storyid         | PRIMARY |      | 729263 | Using where; Using filesort
2  | DEPENDENT SUBQUERY | userfeeds | index_subquery | storyid,userid,siteid_storyid | siteid  | func | 1      | Using where
Any suggestions to improve this query? Thank you.
I would move the calculation logic to the client and only load fields from the database. This makes your query and the calculation itself faster. It's not a good style to do such things in SQL code.
Also, the regex is very slow; maybe another search method such as LIKE is faster.
Looking at your EXPLAIN, it doesn't appear your query is using an index for the sort (thus the filesort). This is being caused by the sort on the calculated column (score).
Another barrier is the size of the table (729263 rows). You don't want to create an index that is too wide, as it will take much more space and impact the performance of your CUD operations. What we want to do is target the columns that are being selected; however, in this situation we can't, since score is a calculated column. You can try creating a VIEW, or either remove the sort or do it at the application layer.
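On MySQL 5.7.6 or later there is another option that the answers above do not mention: a stored generated column can hold the calculated score, and a stored column can be indexed. The sketch below is an assumption-heavy illustration: the DOUBLE type, the score and idx_score names, and the simplified query (most of the original filters are left out) are not from the thread.

```sql
-- Store the calculated score in the row and index it.
-- The DOUBLE type is a guess; pick whatever matches the subscore columns.
ALTER TABLE articles
  ADD COLUMN score DOUBLE
    GENERATED ALWAYS AS (1 * subscore1 + 0.8 * subscore2 + 0.1 * subscore3) STORED,
  ADD INDEX idx_score (score);

-- The query can then sort on the stored column instead of an expression.
-- The feed and keyword filters from the original query are omitted for brevity.
SELECT siteid, storyid, added, title, score
FROM articles
WHERE added > '2011-10-23 09:10:19'
ORDER BY score DESC
LIMIT 0, 25;
```

Whether the optimizer actually walks idx_score for the ORDER BY depends on how selective the other conditions are, so check EXPLAIN before relying on it.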

mysql fastest way to get max value under specific threshold

I've got two tables and a slow query in mysql.
The tables:
Table clips with fields channel,start_time,end_time
Table shows with fields channel,start_time,end_time
Both tables have indexes on the start_time field.
I am trying to find the show that started just before the clip for many clips.
So far I've got this query:
SELECT (
SELECT shows.id
FROM shows
WHERE shows.starttime<=clips.starttime AND shows.channel=clips.channel
ORDER BY shows.starttime DESC
LIMIT 1) as show_id,
clips.*
FROM clips
For a small number of clips this works great but for large number of clips it gets too slow.
My understanding would be that the dependent subquery should be extra fast since there is an index on start_time and all that needs to be done is an index lookup. Nevertheless it is slow and explaining the query states "using where" instead of "using index".
Here is the output of explain
+--+------------------+-----+-----+------------+---------+------+----+------+-----------------------+
|id|select_type       |table|type |possibleKeys|key      |keylen|ref |rows  |Extra                  |
+--+------------------+-----+-----+------------+---------+------+----+------+-----------------------+
| 1|PRIMARY           |clips|range|startDate   |startDate|8     |NULL|9095  |Using where;Using index|
| 2|DEPENDENT SUBQUERY|shows|index|startDate   |startDate|8     |NULL|287896|Using where;Using index|
+--+------------------+-----+-----+------------+---------+------+----+------+-----------------------+
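Any suggestions on how to improve performance for this task would be greatly appreciated.
Try to rewrite the query as a join with GROUP BY (note that shows.id is not aggregated here, so MySQL is free to return the id of any matching show rather than the one with the latest starttime, and the query is rejected when ONLY_FULL_GROUP_BY is enabled):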
SELECT max(shows.starttime) as show_start, shows.id as show_id, clips.*
FROM shows
INNER JOIN clips ON (clips.channel = shows.channel AND shows.starttime<=clips.starttime)
GROUP BY clips.id
Because clips are part of a show, you would expect them to be close together in time, so you can limit the number of hits further by doing something like:
SELECT max(shows.starttime) as show_start, shows.id as show_id, clips.*
FROM shows
INNER JOIN clips ON (clips.channel = shows.channel
AND clips.starttime BETWEEN shows.starttime AND DATE_ADD(shows.starttime, INTERVAL 1 DAY) )
GROUP BY clips.id
This will prevent MySQL from running a full subquery with sort on every row of clips.
I think adding an index that uses both start_time and channel columns may improve the query performance to an acceptable value.
Johan's answer is great, but given your filters I think the index may improve the performance in any case.
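The composite index suggested above could be sketched like this. The index name is hypothetical, and the column names follow the query in the question (starttime) rather than the table description (start_time).

```sql
-- channel first (equality test), starttime second (range test and sort).
ALTER TABLE shows ADD INDEX idx_channel_starttime (channel, starttime);

-- With that index, the original correlated subquery can find the latest
-- matching show by reading one index range backwards instead of scanning shows.
SELECT (
    SELECT shows.id
    FROM shows
    WHERE shows.channel = clips.channel
      AND shows.starttime <= clips.starttime
    ORDER BY shows.starttime DESC
    LIMIT 1) AS show_id,
  clips.*
FROM clips;
```

Putting channel before starttime matters: the equality column narrows the index to one channel, and within that channel the entries are already sorted by starttime.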