I'm a bit of a db noob and have a nasty query that is taking over 30 seconds to run. I'm trying to learn a bit more about EXPLAIN and optimize the query but am at a loss. Here is the query:
SELECT
feed.*, users.username, smf_attachments.id_attach AS avatar,
games.name AS item_name, games.image, feed.item_id, u2.username AS follow_name
FROM feed
INNER JOIN following ON following.follow_id = feed.user_id AND following.user_id = 1
LEFT JOIN users ON users.id = feed.user_id
LEFT JOIN smf_members ON smf_members.member_name = users.username
LEFT JOIN smf_attachments ON smf_attachments.id_member = smf_members.id_member
LEFT JOIN games ON games.id = feed.item_id
LEFT JOIN users u2 ON u2.id = feed.item_id
ORDER BY feed.timestamp DESC
LIMIT 25
Explain results:
The result you want to avoid in your execution plan (the output of an EXPLAIN statement) is a full table scan, which MySQL reports as ALL in the type column of the EXPLAIN output. To avoid it, you need to create the right indexes on your tables.
If you have a table scan, it means the query engine reads each row of the table sequentially. With index access, the query engine goes more directly to the relevant rows.
More explanation here: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
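To see the scan-versus-index difference on something small, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, so the plan wording is SQLite's, not MySQL's. The following table and its two columns come from the question; the column types, the sample data and the index name idx_following_user are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; only the scan-vs-index idea carries over,
# not the exact EXPLAIN output format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE following (user_id INTEGER, follow_id INTEGER)")
conn.executemany(
    "INSERT INTO following VALUES (?, ?)",
    [(u, f) for u in range(1, 51) for f in range(1, 21)],
)

def plan(sql):
    """Return the detail strings reported by SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT follow_id FROM following WHERE user_id = 1"

# No index yet: the only option is to read the whole table.
plan_without_index = plan(query)

# Hypothetical index covering the filter column and the selected column.
conn.execute("CREATE INDEX idx_following_user ON following (user_id, follow_id)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

In MySQL the equivalent check would be to run EXPLAIN before and after adding the index and watch the type column change away from ALL.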
Related
I have a simple MySQL InnoDB database with two tables: users and dialogues. I am trying to make a LEFT JOIN query, however, I've run into a performance problem.
When I execute the following statement,
EXPLAIN SELECT u.id FROM users u
LEFT JOIN dialogues d ON u.id = d.creator_id
I get a response that the DB uses the index and ref access types (the type column of EXPLAIN), which is totally fine.
However, when I add an additional clause:
EXPLAIN SELECT u.id FROM users u
LEFT JOIN dialogues d ON (u.id = d.creator_id OR u.id = d.target_id)
suddenly EXPLAIN shows the ALL access type for the join, which in turn makes the actual query many times slower.
Is there something that could be done to make the DB use a more efficient access type in the second example?
d.creator_id and d.target_id columns have foreign keys connected to u.id.
It is usually faster to do two left joins and coalesce() in the select:
SELECT d.*,
COALESCE(uc.name, ut.name) as name
FROM dialogues d LEFT JOIN
users uc
ON uc.id = d.creator_id LEFT JOIN
users ut
ON ut.id = d.target_id
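A small runnable sketch of the two-join-plus-COALESCE pattern, using SQLite via Python's sqlite3 as a stand-in for MySQL. The name column and the sample rows are assumptions; only the table names and the creator_id / target_id columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dialogues (id INTEGER PRIMARY KEY, creator_id INTEGER, target_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    -- dialogue 3 has no creator row in users, so its name falls back to the target
    INSERT INTO dialogues VALUES (1, 1, 2), (2, 2, 1), (3, 99, 2);
""")

# Each join has a single equality condition, so each one can use an index on users.id.
rows = conn.execute("""
    SELECT d.id, COALESCE(uc.name, ut.name) AS name
    FROM dialogues d
    LEFT JOIN users uc ON uc.id = d.creator_id
    LEFT JOIN users ut ON ut.id = d.target_id
    ORDER BY d.id
""").fetchall()

print(rows)
```

Note that this returns one row per dialogue, which is not the same shape as the OR join in the question (that one can return a row per matching user), so whether it is a drop-in replacement depends on what the result is used for.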
I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this query is absolutely ravaging it.
What I really need is to do the first join, and then apply the where clause. This will whittle down my result from something like 100k rows to less than 10k. Then I want to do the other join, in order to whittle down the result again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_location have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably adverts(updated_at, id).
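To make the row multiplication concrete, here is a minimal sketch using SQLite via Python's sqlite3 as a stand-in for MySQL. The table and column names follow the query above; the sample data is made up, and the real tables will have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    INSERT INTO adverts VALUES (1, '2020-01-02'), (2, '2020-01-01');
    INSERT INTO advert_category VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    INSERT INTO advert_location VALUES (1, 7), (1, 8), (2, 9);
""")

# Unfiltered double join: advert 1 yields 3 categories x 2 locations = 6 rows.
joined_rows_for_advert_1 = conn.execute("""
    SELECT COUNT(*)
    FROM adverts a
    JOIN advert_category ac ON ac.advert_id = a.id
    JOIN advert_location al ON al.advert_id = a.id
    WHERE a.id = 1
""").fetchone()[0]

# EXISTS only tests for a match, so each advert appears at most once.
exists_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM adverts a
    WHERE EXISTS (SELECT 1 FROM advert_location al
                  WHERE al.advert_id = a.id AND al.location_id = ?)
      AND EXISTS (SELECT 1 FROM advert_category ac
                  WHERE ac.advert_id = a.id AND ac.category_id = ?)
    ORDER BY a.updated_at DESC
""", (7, 10))]

print(joined_rows_for_advert_1, exists_ids)
```

If each (advert_id, category_id) and (advert_id, location_id) pair is unique, the filtered join returns the same adverts as the EXISTS version. The difference is that EXISTS cannot multiply rows whatever the link tables contain.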
You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what it thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested-loop join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
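The claim that the written order of inner joins does not change the result can be checked with a small sketch. It uses SQLite via Python's sqlite3 as a stand-in for MySQL; the sample data and the :loc / :cat parameter names are assumptions, and it says nothing about which order MySQL's optimizer would actually pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    INSERT INTO adverts VALUES (1, '2020-01-03'), (2, '2020-01-02'), (3, '2020-01-01');
    INSERT INTO advert_category VALUES (1, 10), (2, 10), (3, 11);
    INSERT INTO advert_location VALUES (1, 7), (2, 8), (3, 7);
""")

params = {"loc": 7, "cat": 10}

# Written starting from adverts, as in the question.
adverts_first = conn.execute("""
    SELECT a.id FROM adverts a
    JOIN advert_category ac ON ac.advert_id = a.id
    JOIN advert_location al ON al.advert_id = a.id
    WHERE al.location_id = :loc AND ac.category_id = :cat
    ORDER BY a.updated_at DESC
""", params).fetchall()

# Same joins and filters, written starting from advert_location.
location_first = conn.execute("""
    SELECT a.id FROM advert_location al
    JOIN adverts a ON a.id = al.advert_id
    JOIN advert_category ac ON ac.advert_id = a.id
    WHERE al.location_id = :loc AND ac.category_id = :cat
    ORDER BY a.updated_at DESC
""", params).fetchall()

print(adverts_first, location_first)
```

Since both forms mean the same thing, what matters for speed is the order the optimizer chooses and whether an index covers the join column together with the filter column.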
PS: When asking a query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table, and include the result. Then we don't have to guess at the columns and indexes in your table.
I wanted to know the difference between the 2 queries. I have 2 tables: Users and Emails.
User schema - id, name, email, is_subscribed, created, modified.
Email schema - id, user_id, sent_at, subject.
So I need to find the count of those users who have received more than 20 emails in total.
The User table has roughly 100K records, and the Emails table has nearly 4 million records.
1st Query
SELECT u.id, u.email, count(u.id)
FROM emails as e
LEFT JOIN users as u
ON e.user_id = u.id
WHERE u.is_subscribed = 1
GROUP BY e.user_id HAVING count(u.id) > 20
2nd Query
SELECT u.id, u.email, count(u.id)
FROM users as u
INNER JOIN emails as e
ON e.user_id = u.id
WHERE u.is_subscribed = 1
GROUP BY e.user_id HAVING count(u.id) > 20
What I have tried:
1) On production, these queries take forever to execute, so locally I created sample tables with dummy records, i.e.
User table - around 5 records and Emails table around 100 records.
When I execute the two queries above I get the same result set for both, and when I profile them I get the same execution time for both (which may differ on production), so it is hard to know which is the better one. (This may not be the best way to find the answer.)
2) I used EXPLAIN on the queries, and it shows a scan of all 100 rows of the emails table in both cases.
Please let me know if I have missed any specifics. I will update the question.
Read about MySQL LEFT JOIN optimization. The DBMS can tell that the WHERE clause of your LEFT JOIN query filters out all the NULL-extended rows that a LEFT JOIN adds beyond what an INNER JOIN returns, so it just does an INNER JOIN.
MySQL 5.7 Reference Manual
9.2.1.9 LEFT JOIN and RIGHT JOIN Optimization
For a LEFT JOIN, if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join.
(Since you don't want NULL-extended rows, why would you use LEFT JOIN?)
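Here is a minimal sketch of why the two queries return the same rows, using SQLite via Python's sqlite3 as a stand-in for MySQL. The tables are trimmed to the columns that matter and the data is made up; the email with user_id 99 is a deliberately orphaned row so that the LEFT JOIN has a NULL-extended row to produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_subscribed INTEGER);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'a@example.com', 1), (2, 'b@example.com', 0);
    -- user_id 99 has no users row, so LEFT JOIN produces a NULL-extended row for it
    INSERT INTO emails (user_id) VALUES (1), (1), (2), (99);
""")

# Without a WHERE clause the NULL-extended row is visible.
left_unfiltered = conn.execute(
    "SELECT e.id, u.id FROM emails e LEFT JOIN users u ON e.user_id = u.id ORDER BY e.id"
).fetchall()

# u.is_subscribed = 1 is never true when u.* is NULL, so that row is dropped.
left_filtered = conn.execute("""
    SELECT e.id, u.id FROM emails e
    LEFT JOIN users u ON e.user_id = u.id
    WHERE u.is_subscribed = 1 ORDER BY e.id
""").fetchall()

inner_filtered = conn.execute("""
    SELECT e.id, u.id FROM users u
    INNER JOIN emails e ON e.user_id = u.id
    WHERE u.is_subscribed = 1 ORDER BY e.id
""").fetchall()

print(left_unfiltered, left_filtered, inner_filtered)
```

This only shows that the results match; the performance question on the 4-million-row table still comes down to the EXPLAIN output and the indexes on emails.user_id and users.is_subscribed.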
Please try the query below:
SELECT u.id, u.email, count(u.id)
FROM users as u
INNER JOIN emails as e ON e.user_id = u.id
WHERE u.is_subscribed = 1
GROUP BY u.id
HAVING count(u.id) > 20
I have to run this query and it is pretty slow (4.86 seconds):
SELECT DISTINCT (users.id), users . *
FROM users
LEFT JOIN user_stages ON users.id = user_stages.user_id
LEFT JOIN user_tags ON users.id = user_tags.user_id
LEFT JOIN log ON log.user_id = users.id
ORDER BY last_activity DESC
When I do profiling it looks like Copying to tmp table takes 91% of the time (3.710409 seconds).
The size of the tables: users - almost 100,000 records, log - 1,443,000 records, user_stages - 66,000 records, user_tags - 260,000 records.
The indexes are in place; if you want, I can list them all. How can I rewrite the query or modify the MySQL settings to make this query faster?
Assuming last_activity is in the users table, you can change the query to the following:
SELECT users.*
FROM users
ORDER BY last_activity DESC
Your query is selecting only columns from the users table. The left join ensures that all rows from the table appear at least once. The distinct is removing duplicates added by the other tables. Hence, the joins are unnecessary.
If last_activity is in another table, then you might need to join that information in.
Your joins are probably taking so much time because you are getting cross products of rows for each user from the various tables.
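A minimal sketch of that argument, using SQLite via Python's sqlite3 as a stand-in for MySQL. The users table is cut down to two columns and the data is made up; last_activity is assumed to live in users, as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, last_activity TEXT);
    CREATE TABLE user_stages (user_id INTEGER, stage_id INTEGER);
    CREATE TABLE user_tags (user_id INTEGER, tag_id INTEGER);
    CREATE TABLE log (user_id INTEGER, message TEXT);
    INSERT INTO users VALUES (1, '2020-01-03'), (2, '2020-01-02'), (3, '2020-01-01');
    INSERT INTO user_stages VALUES (1, 5), (1, 6);
    INSERT INTO user_tags VALUES (1, 10), (1, 11), (2, 10);
    INSERT INTO log VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

join_sql = """
    FROM users
    LEFT JOIN user_stages ON users.id = user_stages.user_id
    LEFT JOIN user_tags ON users.id = user_tags.user_id
    LEFT JOIN log ON log.user_id = users.id
"""

# Rows built before DISTINCT: user 1 alone contributes 2 x 2 x 3 = 12 rows.
intermediate_rows = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]

with_joins = conn.execute(
    "SELECT DISTINCT users.* " + join_sql + " ORDER BY last_activity DESC"
).fetchall()
without_joins = conn.execute(
    "SELECT users.* FROM users ORDER BY last_activity DESC"
).fetchall()

print(intermediate_rows, with_joins == without_joins)
```

With real table sizes the intermediate row count is what ends up in the temporary table, which would be consistent with "Copying to tmp table" dominating the profile.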
SELECT `users`.*
FROM `users`
LEFT JOIN `user_stages` ON `users`.`id` = `user_stages`.`user_id`
LEFT JOIN `user_tags` ON `users`.`id` = `user_tags`.`user_id`
LEFT JOIN `log` ON `log`.`user_id` = `users`.`id`
GROUP BY `users`.`id`
ORDER BY `last_activity` DESC;
The query is built on the fly based on user's input. Sometimes it looks like this:
SELECT DISTINCT (users.id), users . *
FROM users
LEFT JOIN user_stages ON users.id = user_stages.user_id
LEFT JOIN user_tags ON users.id = user_tags.user_id
LEFT JOIN log ON log.user_id = users.id
WHERE user_stages.stage_id = 5
AND user_tags.tag_id = 10
ORDER BY last_activity DESC
The query was initially written using GROUP BY, but it was slower (about 8 seconds). I replaced GROUP BY with DISTINCT and it got faster, but not fast enough. If you have any suggestions I would appreciate them.
In this sql:
SELECT s.*,
u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id = s.user_id
OR u.id = s.owner_user_id
WHERE s.status = 1
For some reason this query takes an amazingly long time, although id is the primary key. It became slow right after I added the OR u.id = s.owner_user_id part. owner_user_id is usually 0 and is set only a handful of times. But why would it take so long, apparently scanning the whole table? The users table is very long and wide. I didn't design it; it belongs to a client, and successive programmers added too many fields. The table has 22k rows and dozens of fields.
*The field names are for demonstration only; the actual names are different, so don't ask me why I'm looking for owner_user_id (; I did solve the slowness by removing the "OR ..." part and instead looking up the id in the loop when it is not 0, but I would like to know why this is happening and how to speed up the query as is.
You may be able to speed it up by using IN instead of the OR but that is minor.
SELECT u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id IN ( s.user_id, s.owner_user_id )
WHERE s.status = 1
Firstly, are there any indexes on these tables? Mainly one on the users.id field, or on s.user_id or s.owner_user_id?
However, I must ask why you need a LEFT JOIN instead of a regular join. A LEFT JOIN keeps every shops row, even when no users row matches. Since I'm assuming the id should be in either the user_id or the owner_user_id field, and that there will always be a match, a plain JOIN returns the same rows and may speed the query up a bit.
And as Mitch said, 22k rows is tiny.
How are you going to know which user record is which? Here's how I'd do it
SELECT s.*,
u.name AS user_name,
o.name AS owner_name
FROM shops s
LEFT JOIN users u ON s.user_id = u.id
LEFT JOIN users o ON s.owner_user_id = o.id
WHERE s.status = 1
I've omitted the IDs from the user table in the SELECT as these will be part of s.* anyway.
I'm curious about the left joins too. If shops.user_id and shops.owner_user_id are required foreign keys, use inner joins instead.
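To compare the shape of the two results, here is a small sketch using SQLite via Python's sqlite3 as a stand-in for MySQL. The columns follow the demonstration names in the question; the sample rows are made up, with owner_user_id = 0 meaning "no owner" as the question describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, user_id INTEGER,
                        owner_user_id INTEGER, status INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    -- shop 1 has no owner (0); shop 3 is filtered out by status
    INSERT INTO shops VALUES (1, 1, 0, 1), (2, 2, 1, 1), (3, 1, 2, 0);
""")

# OR join: a shop with both ids set comes back twice, with no hint of which is which.
or_join = conn.execute("""
    SELECT s.id, u.name
    FROM shops s
    LEFT JOIN users u ON u.id = s.user_id OR u.id = s.owner_user_id
    WHERE s.status = 1
    ORDER BY s.id, u.id
""").fetchall()

# Two joins: one row per shop, user and owner in separate, labelled columns.
two_joins = conn.execute("""
    SELECT s.id, u.name AS user_name, o.name AS owner_name
    FROM shops s
    LEFT JOIN users u ON s.user_id = u.id
    LEFT JOIN users o ON s.owner_user_id = o.id
    WHERE s.status = 1
    ORDER BY s.id
""").fetchall()

print(or_join)
print(two_joins)
```

Because owner_user_id is 0 for most shops in the question, the second join has to stay a LEFT JOIN there; switching it to an inner join would drop every shop without an owner.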