Joining three tables, then ordering: how to use an index in MySQL?

I have these three tables:
season(id, season);
game_in_season(id, id_season, game);
player_in_game(id, id_game, full_name, pts);
I want to select all players of the season with id 5 and order them by pts. Which index should I use? I have an index on the pg.pts column, but it is not used when I join the table with the s and gs tables. It is only used when I run "select * from pg order by pts desc".
EXPLAIN SELECT pg.* FROM season s, game_in_season gs, player_in_game pg
WHERE s.id = gs.id_season AND gs.id = pg.id_game
AND s.id = 5
ORDER BY pg.pts DESC
In the row with table = 's', the Extra column shows 'Using temporary; Using filesort'. Which index should I use to avoid the filesort? Is it even possible to run this query without a filesort?

Please use JOIN..ON instead of the comma join:
SELECT pg.*
FROM season s
JOIN game_in_season gs ON s.id = gs.id_season
JOIN player_in_game pg ON gs.id = pg.id_game
WHERE s.id = 5
ORDER BY pg.pts DESC
The Optimizer has two ways to execute this:
Start by filtering s using s.id = 5, or equivalently filtering gs on gs.id_season = 5.
Start by using INDEX(pts) in pg (if you have such an index).
The former must sort the data after it gets to the ORDER BY. The Optimizer is likely to pick this approach.
The latter is inefficient because it has to read lots of unnecessary rows, since the test for 5 comes too late.
'Using temporary; Using filesort' always seems to appear on the first row of EXPLAIN; this is confusing because:
Often the sort comes later in the processing; and
The sort often happens in RAM, not in a "file" or in a real "temporary" table. That is, you (and many other MySQL users) should not be scared by that 'Extra'.
I assume you have these indexes (or PRIMARY KEYs):
season: (id)
game_in_season: (id_season) -- Better would be (id_season, id)
player_in_game: (id_game)
Bottom line: The query is probably running as fast as it can.
When asking performance questions, please provide SHOW CREATE TABLE and the EXPLAIN.
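To check that the JOIN..ON rewrite returns the same rows as the original comma join, and to set up the indexes listed in this answer, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up rows and hypothetical index names; it verifies results only and does not reproduce MySQL's EXPLAIN output or optimizer choices.

```python
import sqlite3

# SQLite stands in for MySQL here: this checks the query's results only,
# not how MySQL's optimizer would execute it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE season (id INTEGER PRIMARY KEY, season TEXT);
CREATE TABLE game_in_season (id INTEGER PRIMARY KEY, id_season INTEGER, game TEXT);
CREATE TABLE player_in_game (
    id INTEGER PRIMARY KEY, id_game INTEGER, full_name TEXT, pts INTEGER
);
-- Hypothetical index names; columns follow the answer's list,
-- using the suggested (id_season, id) composite for game_in_season.
CREATE INDEX gs_season_id ON game_in_season (id_season, id);
CREATE INDEX pg_game ON player_in_game (id_game);
""")

# Made-up sample rows: games 1 and 2 belong to season 5, game 3 to season 4.
conn.executemany("INSERT INTO season VALUES (?, ?)", [(4, "2022"), (5, "2023")])
conn.executemany("INSERT INTO game_in_season VALUES (?, ?, ?)",
                 [(1, 5, "g1"), (2, 5, "g2"), (3, 4, "g3")])
conn.executemany("INSERT INTO player_in_game VALUES (?, ?, ?, ?)",
                 [(1, 1, "Ann", 12), (2, 2, "Bob", 30), (3, 3, "Cid", 40), (4, 1, "Dee", 21)])

join_on = conn.execute("""
SELECT pg.full_name, pg.pts
FROM season s
JOIN game_in_season gs ON s.id = gs.id_season
JOIN player_in_game pg ON gs.id = pg.id_game
WHERE s.id = 5
ORDER BY pg.pts DESC
""").fetchall()

comma_join = conn.execute("""
SELECT pg.full_name, pg.pts
FROM season s, game_in_season gs, player_in_game pg
WHERE s.id = gs.id_season AND gs.id = pg.id_game
  AND s.id = 5
ORDER BY pg.pts DESC
""").fetchall()

print(join_on)  # [('Bob', 30), ('Dee', 21), ('Ann', 12)]
print(comma_join == join_on)  # True
```

Cid (40 pts) is excluded because his game belongs to season 4; both join styles express the same inner join.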

Related

MySQL SELECT INNER JOIN with ORDER BY very slow

I'm trying to speed up a MySQL query. The listings table has several million rows. Without the ORDER BY I get the result in 0.1 seconds, but with it the query takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id
IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date
DESC LIMIT 0,10;
EXPLAIN shows these indexes being used: l = date, loc = PRIMARY, lc = PRIMARY
Such performance questions are really difficult to answer and depend on the setup, indexes, etc. So there will likely not be one single solution, and not even clearly correct or incorrect attempts to improve the speed. This is a lot of trial and error. Anyway, some points I noted which often cause performance issues are:
Avoid conditions within joins that should be placed in the WHERE instead. A join's ON clause should contain only the columns being joined, no further conditions. So lc.cat_id='2058' should be put in the WHERE clause.
Using IN is often slow. You could try replacing it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...).
Open the query execution plan and check whether there is something useful for you.
Try to find out whether there are special cases/values within the affected columns which slow down your query.
Change "ORDER BY date" to "ORDER BY tablealias.date" and check whether this makes a difference in performance. Even if not, it is easier to read.
If you can rename the column "date", do so, because using SQL keywords as table or column names is not a good idea. I'm unsure whether this influences performance, but it should be avoided if possible.
Good luck!
You can try additional indexes to speed up the query, but there is a tradeoff when creating or modifying data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, there would be no need to read the record to check location_id when using the new index; the same goes for the join with listings_categories, where an index read would be enough.
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before reaching into the table itself. Your version is probably shoveling around the whole table before sorting, and then delivering the 10.
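A quick way to confirm that the deferred-join rewrite returns the same 10 rows as the original query, sketched with Python's built-in sqlite3 as a stand-in for MySQL. The data, the index names, and the shortened IN list (7841, 7842, 7843) are made up; this checks results only, not the speedup, which depends on MySQL's optimizer and the real data.

```python
import sqlite3

# SQLite stands in for MySQL; only the results of the two query shapes are compared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (id INTEGER PRIMARY KEY, location_id INTEGER, date TEXT, title TEXT);
CREATE TABLE listings_categories (list_id INTEGER, cat_id TEXT, PRIMARY KEY (list_id, cat_id));
CREATE TABLE locations (id INTEGER PRIMARY KEY);
-- The composite indexes suggested in the answer (index names are hypothetical).
CREATE INDEX l_loc_date_id ON listings (location_id, date, id);
CREATE INDEX lc_cat_list ON listings_categories (cat_id, list_id);
""")

# Made-up data: 200 listings with unique, increasing timestamps over 10 locations.
conn.executemany("INSERT INTO locations VALUES (?)", [(loc,) for loc in range(7841, 7851)])
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?)",
    [(i, 7841 + i % 10, f"2021-01-01 00:{i // 60:02d}:{i % 60:02d}", f"item {i}")
     for i in range(200)],
)
conn.executemany(
    "INSERT INTO listings_categories VALUES (?, ?)",
    [(i, "2058" if i % 3 == 0 else "1") for i in range(200)],
)

# Original shape: join everything, then sort and take 10.
direct = conn.execute("""
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc ON l.id = lc.list_id AND lc.cat_id = '2058'
INNER JOIN locations loc ON l.location_id = loc.id
WHERE l.location_id IN (7841, 7842, 7843)
ORDER BY l.date DESC
LIMIT 0, 10
""").fetchall()

# Deferred join: find the 10 ids first, then fetch only those full rows.
deferred = conn.execute("""
SELECT l2.*
FROM (
    SELECT l1.id
    FROM listings AS l1
    JOIN listings_categories AS lc ON lc.list_id = l1.id
    JOIN locations AS loc ON loc.id = l1.location_id
    WHERE lc.cat_id = '2058'
      AND l1.location_id IN (7841, 7842, 7843)
    ORDER BY l1.date DESC
    LIMIT 0, 10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC
""").fetchall()

print(deferred == direct, len(direct))  # True 10
```

The outer query repeats ORDER BY because the order of rows coming out of a derived table is not guaranteed once it is joined again.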

What type of index is ideal for this query?

I have an example query such as:
SELECT
rest.name, rest.shortname
FROM
restaurant AS rest
INNER JOIN specials ON rest.id=specials.restaurantid
WHERE
specials.dateend >= CURDATE()
AND
rest.state='VIC'
AND
rest.status = 1
AND
specials.status = 1
ORDER BY
rest.name ASC;
Which of the two indexes below would be best on the restaurant table?
id,state,status,name
state,status,name
I'm not sure whether the column used in the join should be included.
Funnily enough, I have created both for testing, and both times MySQL chooses the primary index, which is just id. Why is that?
EXPLAIN output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | specials | index | NewIndex1,NewIndex2,NewIndex3,NewIndex4 | NewIndex4 | 11 | NULL | 82 | Using where; Using index; Using temporary; Using filesort
1 | SIMPLE | rest | eq_ref | PRIMARY,search,status,state,NewIndex1,NewIndex2,id-suburb,NewIndex3,id-status-name | PRIMARY | 4 | db_name.specials.restaurantid | 1 | Using where
Not many rows at the moment so perhaps that's why it's choosing PRIMARY!?
For optimum performance, you need at least 2 indexes:
The most important index is the one on the foreign key:
CREATE INDEX specials_rest_fk ON specials(restaurantid);
Without this, your queries will perform poorly, because every row in rest that matches the WHERE conditions will require a full table scan of specials.
The next index to define is the one that helps look up the fewest rows of rest given your conditions. Usually only one index per table is used, so you want that index to find as few rows from rest as possible.
My guess: state and status.
CREATE INDEX rest_index_1 ON restaurant(state, status);
Your index suggestion of (id, ...) is pointless, because id is unique: adding more columns won't help, and in fact would worsen performance if it were used, because the index entries would be larger and you'd get fewer entries per I/O page read.
But you can gain performance by writing the query better too; if you move the conditions on specials into the join ON condition, you'll gain significant performance, because join conditions are evaluated as the join is made, but where conditions are evaluated on all joined rows, meaning the temporary result set that is filtered by the WHERE clause is much larger and therefore slower.
Change your query to this:
SELECT rest.name, rest.shortname
FROM restaurant AS rest
INNER JOIN specials
ON rest.id=specials.restaurantid
AND specials.dateend >= CURDATE()
AND specials.status = 1
WHERE rest.state='VIC'
AND rest.status = 1
ORDER BY rest.name;
Note how the conditions on specials are now in the ON clause.
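For an INNER JOIN, moving the specials conditions between WHERE and ON does not change which rows come back, so a rewrite like this can be checked by running both forms against the same data. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, replaces CURDATE() with a fixed, hypothetical date so the output is deterministic, and uses made-up rows. It checks results only; it says nothing about the performance claim in the answer.

```python
import sqlite3

# SQLite stands in for MySQL; CURDATE() is replaced by a fixed, hypothetical date.
TODAY = "2024-06-01"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT, shortname TEXT,
                         state TEXT, status INTEGER);
CREATE TABLE specials (id INTEGER PRIMARY KEY, restaurantid INTEGER,
                       dateend TEXT, status INTEGER);
-- The two indexes recommended in the answer.
CREATE INDEX specials_rest_fk ON specials (restaurantid);
CREATE INDEX rest_index_1 ON restaurant (state, status);
""")
conn.executemany("INSERT INTO restaurant VALUES (?, ?, ?, ?, ?)", [
    (1, "Bistro", "bis", "VIC", 1),
    (2, "Cafe", "caf", "VIC", 1),      # only has an expired special
    (3, "Diner", "din", "NSW", 1),     # wrong state
    (4, "Alehouse", "ale", "VIC", 0),  # inactive restaurant
    (5, "Alpha Bar", "alp", "VIC", 1),
])
conn.executemany("INSERT INTO specials VALUES (?, ?, ?, ?)", [
    (1, 1, "2024-07-01", 1),
    (2, 2, "2024-05-01", 1),
    (3, 3, "2024-07-01", 1),
    (4, 4, "2024-07-01", 1),
    (5, 5, "2024-06-01", 1),  # ends "today", so it still counts
])

where_version = conn.execute("""
SELECT rest.name, rest.shortname
FROM restaurant AS rest
INNER JOIN specials ON rest.id = specials.restaurantid
WHERE specials.dateend >= ?
  AND rest.state = 'VIC'
  AND rest.status = 1
  AND specials.status = 1
ORDER BY rest.name
""", (TODAY,)).fetchall()

on_version = conn.execute("""
SELECT rest.name, rest.shortname
FROM restaurant AS rest
INNER JOIN specials
    ON rest.id = specials.restaurantid
    AND specials.dateend >= ?
    AND specials.status = 1
WHERE rest.state = 'VIC'
  AND rest.status = 1
ORDER BY rest.name
""", (TODAY,)).fetchall()

print(on_version)  # [('Alpha Bar', 'alp'), ('Bistro', 'bis')]
print(on_version == where_version)  # True
```

The equivalence holds for INNER JOIN only; with a LEFT JOIN, a condition in ON and the same condition in WHERE generally produce different results.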

Why does order by primary index make this query slow?

This query gets the newest videos uploaded by the user's subscriptions. It was running very slowly, so I rewrote it to use joins, but it didn't make a difference. After tinkering with it I found out that removing ORDER BY makes it run fast (however, that defeats the purpose of the query).
Query:
SELECT vid.*
FROM video AS vid
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
AND vid.privacy = 0 AND vid.blocked <> 1 AND vid.converted = 1
ORDER BY vid.id DESC
LIMIT 8
Running EXPLAIN shows "Using temporary; Using filesort" on the subscriptions table, and it's slow (0.0900 seconds).
Without ORDER BY vid.id DESC it doesn't show "Using temporary; Using filesort", so it's fast (0.0004 seconds), but I don't understand how the other table can affect it like this.
All the fields are indexed (privacy blocked and converted fields don't affect performance by more than 10%).
I would paste the full explain information but I can't seem to make it fit nice in the layout of this site.
You're limiting the query to 8 results. When you run it without an order by, it can grab the first 8 rows it comes across that pass the condition, and then hand them back. Boom, it's done.
When you use the order by, you're not asking for any 8 records. You're asking for the first 8 records in terms of vid.id. So it has to figure out which those are, and the only way to do that is to look through the entire table and compare vid.id values. That's a lot more work.
Is there actually an index on the column? If so, it may be out of date. You could try rebuilding it.
I fixed it by hinting that MySQL should use the primary index with USE INDEX (PRIMARY):
SELECT vid.*
FROM video AS vid USE INDEX ( PRIMARY )
INNER JOIN subscriptions AS sub ON vid.uploader = sub.subscription_id
WHERE sub.subscriber_id = '1'
AND vid.privacy =0
AND vid.blocked <>1
AND vid.converted =1
ORDER BY vid.id DESC
LIMIT 8
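A toy model in plain Python (not MySQL's executor) of why ORDER BY changed the cost, and presumably why hinting the PRIMARY index helped: without ORDER BY the scan can stop after 8 matches; with ORDER BY and no usable order, every row has to be examined; reading rows in descending id order, which walking the primary key backwards allows, can stop early again. The row layout, the subscription set, and the row counts are all invented for illustration.

```python
import heapq

# Toy model (not MySQL's executor): each row is (video_id, uploader_id),
# stored in ascending id order. Data and subscriptions are invented.
ROWS = [(vid, vid % 50) for vid in range(1, 10_001)]
SUBSCRIBED = {1, 2, 3}  # hypothetical uploaders followed by subscriber 1
LIMIT = 8


def matches(row):
    """Stand-in for the join plus WHERE filter."""
    return row[1] in SUBSCRIBED


def scan_until_limit(rows):
    """Return the first LIMIT matching ids in the given order, stopping early."""
    found, examined = [], 0
    for row in rows:
        examined += 1
        if matches(row):
            found.append(row[0])
            if len(found) == LIMIT:
                break
    return found, examined


def order_by_with_full_scan(rows):
    """ORDER BY id DESC LIMIT 8 with no usable order: check every row, keep the top 8."""
    candidates = [row for row in rows if matches(row)]
    top = heapq.nlargest(LIMIT, candidates, key=lambda row: row[0])
    return [row[0] for row in top], len(rows)


no_order = scan_until_limit(ROWS)                # no ORDER BY: any 8 matches will do
full_scan = order_by_with_full_scan(ROWS)        # ORDER BY, order of rows not exploited
pk_backwards = scan_until_limit(reversed(ROWS))  # ORDER BY id DESC via descending id order

print(no_order)      # ([1, 2, 3, 51, 52, 53, 101, 102], 102)
print(full_scan)     # ([9953, 9952, 9951, 9903, 9902, 9901, 9853, 9852], 10000)
print(pk_backwards)  # ([9953, 9952, 9951, 9903, 9902, 9901, 9853, 9852], 149)
```

The last two calls return the same 8 ids; the difference is only how many rows had to be looked at before the answer was known.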

How to set indexes for JOIN and GROUP BY queries

Let's say we have a common join like the one below:
EXPLAIN SELECT *
FROM visited_links vl
JOIN device_tracker dt ON ( dt.Client_id = vl.Client_id
AND dt.Device_id = vl.Device_id )
GROUP BY dt.id
If we run EXPLAIN, it says:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE vl index NULL vl_id 273 NULL 1977 Using index; Using temporary; Using filesort
1 SIMPLE dt ref Device_id,Device_id_2 Device_id 257 datumprotect.vl.device_id 4 Using where
I know that it's sometimes difficult to choose the right indexes when using GROUP BY, but what indexes could I set to avoid 'Using temporary; Using filesort' in this query? Why is this happening? And especially, why does it happen even though an index is used?
One point to mention is that the fields returned by the SELECT (* in this case) should either be in the GROUP BY clause or use aggregate functions such as SUM() or MAX(). Otherwise unexpected results can occur: if the database is not told how to choose fields that are not in the GROUP BY clause, you may get any member of the group, pretty much at random.
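A sketch of the GROUP BY dt.id query rewritten so that every selected column is either the grouping column or an aggregate, using Python's built-in sqlite3 as a stand-in for MySQL. The visited_at column, the index names, and the rows are hypothetical; the indexes cover the (Client_id, Device_id) join columns discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device_tracker (id INTEGER PRIMARY KEY, Client_id TEXT, Device_id TEXT);
CREATE TABLE visited_links (id INTEGER PRIMARY KEY, Client_id TEXT, Device_id TEXT,
                            visited_at TEXT);  -- visited_at is a hypothetical column
-- Indexes on the join columns (index names are hypothetical).
CREATE INDEX vl_client_device ON visited_links (Client_id, Device_id);
CREATE INDEX dt_client_device_id ON device_tracker (Client_id, Device_id, id);
""")
conn.executemany("INSERT INTO device_tracker VALUES (?, ?, ?)",
                 [(1, "c1", "d1"), (2, "c1", "d2"), (3, "c2", "d1")])
conn.executemany("INSERT INTO visited_links VALUES (?, ?, ?, ?)", [
    (1, "c1", "d1", "2024-01-01"),
    (2, "c1", "d1", "2024-01-03"),
    (3, "c1", "d2", "2024-01-02"),
    (4, "c2", "d9", "2024-01-05"),  # matches no tracked device
])

# Every selected column is either the GROUP BY column or an aggregate,
# so each group's output is well defined instead of an arbitrary member.
per_device = conn.execute("""
SELECT dt.id, COUNT(*) AS visits, MAX(vl.visited_at) AS last_visit
FROM visited_links vl
JOIN device_tracker dt ON (dt.Client_id = vl.Client_id
                           AND dt.Device_id = vl.Device_id)
GROUP BY dt.id
ORDER BY dt.id
""").fetchall()

print(per_device)  # [(1, 2, '2024-01-03'), (2, 1, '2024-01-02')]
```

Device 3 does not appear because no visited link has the same (Client_id, Device_id) pair.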
The way I look at it is to break the query down into bits.
You have a join on (dt.Client_id = vl.Client_id AND dt.Device_id = vl.Device_id), so all of those fields should be indexed in their respective tables.
You are using GROUP BY dt.id so you need an index that includes dt.id
BUT...
an index on (dt.client_id,dt.device_id,dt.id) will not work for the GROUP BY
and
an index on (dt.id, dt.client_id, dt.device_id) will not work for the join.
Sometimes you end up with a query which just can't use an index.
See also:
http://ntsrikanth.blogspot.com/2007/11/sql-query-order-of-execution.html
You didn't post your indices, but first of all, you'll want to have an index for (client_id, device_id) on visited_links, and (client_id, device_id, id) on device_tracker to make sure that query is fully indexed.
From page 191 of the excellent High Performance MySQL, 2nd Ed.:
MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient depending on the query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.
In your case, I think the issue stems from joining on multiple columns and using GROUP BY together, even after the suggested indices are in place. If you remove either (a) one of the join conditions or (b) the GROUP BY, this shouldn't need a filesort.
However, keep in mind that a filesort doesn't always use actual files; it can also happen entirely within a memory buffer if the result set is small enough, so the performance penalty may be minimal. Consider the wall-clock time for the query too.
HTH!

MySQL performance, inner join, how to avoid Using temporary and filesort

I have two tables, table 1 and table 2.
Table 1
PARTNUM - ID_BRAND
partnum is the primary key
id_brand is "indexed"
Table 2
ID_BRAND - BRAND_NAME
id_brand is the primary key
brand_name is "indexed"
Table 1 contains 1 million records and table 2 contains 1,000 records.
I'm trying to optimize a query using EXPLAIN, and after many attempts I have reached a dead end.
EXPLAIN
SELECT pm.partnum, pb.brand_name
FROM products_main AS pm
LEFT JOIN products_brands AS pb ON pm.id_brand=pb.id_brand
ORDER BY pb.brand_name ASC
LIMIT 0, 10
The query returns this execution plan:
ID, SELECT_TYPE, TABLE, TYPE, POSSIBLE_KEYS, KEY, KEY_LEN , REF, ROWS, EXTRA
1, SIMPLE, pm, range, PRIMARY, PRIMARY, 1, , 1000000, Using where; Using temporary; Using filesort
1, SIMPLE, pb, ref, PRIMARY, PRIMARY, 4, demo.pm.id_pbrand, 1,
The MySQL query optimizer shows a temporary + filesort in the execution plan.
How can I avoid this?
The "EVIL" is in the ORDER BY pb.brand_name ASC. Ordering by that field from the other table seems to be the bottleneck.
First of all, I question the use of an outer join, seeing as the ORDER BY operates on the right-hand side, and the NULLs injected by the left join are likely to play havoc with it.
Regardless, the simplest approach to speeding up this query would be a covering index on pb.id_brand and pb.brand_name. This will allow the ORDER BY to be evaluated 'using index' along with the join condition. The alternative is to find some way to reduce the size of the intermediate result passed to the ORDER BY.
Still, the combination of outer-join, order-by, and limit, leaves me wondering what exactly you are querying for, and if there might not be a better way of expressing the query itself.
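To see how the NULLs from the LEFT JOIN interact with the ORDER BY: both SQLite and MySQL sort NULL before any non-NULL value in ascending order, so parts without a matching brand fill the first page. The sketch uses Python's built-in sqlite3 as a stand-in for MySQL with made-up rows; the pm.partnum tie-breaker is added only to make the output deterministic.

```python
import sqlite3

# SQLite stands in for MySQL; both sort NULL first in ascending order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products_brands (id_brand INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE products_main (partnum TEXT PRIMARY KEY, id_brand INTEGER);
""")
conn.executemany("INSERT INTO products_brands VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
conn.executemany("INSERT INTO products_main VALUES (?, ?)",
                 [("P1", 1), ("P2", 2), ("P3", 99), ("P4", None)])  # P3, P4 have no brand

left = conn.execute("""
SELECT pm.partnum, pb.brand_name
FROM products_main AS pm
LEFT JOIN products_brands AS pb ON pm.id_brand = pb.id_brand
ORDER BY pb.brand_name ASC, pm.partnum
LIMIT 0, 10
""").fetchall()

inner = conn.execute("""
SELECT pm.partnum, pb.brand_name
FROM products_main AS pm
JOIN products_brands AS pb ON pm.id_brand = pb.id_brand
ORDER BY pb.brand_name ASC, pm.partnum
LIMIT 0, 10
""").fetchall()

print(left)   # [('P3', None), ('P4', None), ('P1', 'Acme'), ('P2', 'Bolt')]
print(inner)  # [('P1', 'Acme'), ('P2', 'Bolt')]
```

With a million products and many unbranded ones, the first page of the LEFT JOIN version could consist entirely of NULL brands, which is probably not what the query is meant to show.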
Try replacing the join with a subquery. MySQL's optimizer kind of sucks; subqueries often give better performance than joins.
First, try changing your index on the products_brands table. Delete the existing one on brand_name, and create a new one:
ALTER TABLE products_brands ADD INDEX newIdx (brand_name, id_brand)
Then the table will already have an "orderedByBrandName" index with the ids you need for the join, and you can try:
EXPLAIN
SELECT pb.brand_name, pm.partnum
FROM products_brands AS pb
LEFT JOIN products_main AS pm ON pb.id_brand = pm.id_brand
LIMIT 0, 10
Note that I also changed the order of the tables in the query, so you start with the small one.
This question is somewhat outdated, but I did find it, and so will other people.
MySQL uses a temporary table if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
So you just need to reverse the join order by using STRAIGHT_JOIN, bypassing the order chosen by the optimizer:
SELECT STRAIGHT_JOIN pm.partnum, pb.brand_name
FROM products_brands AS pb
RIGHT JOIN products_main AS pm ON pm.id_brand=pb.id_brand
ORDER BY pb.brand_name ASC
LIMIT 0, 10
Also make sure that the max_heap_table_size and tmp_table_size variables are set to values big enough to store the results:
SET global tmp_table_size=100000000;
SET global max_heap_table_size=100000000;
-- 100 megabytes in this example. These can be set in the my.cnf config file, too.