Nested query performance - mysql

I have two queries below. The first one has a nested select. The second one makes use of a group by clause.
select
posts.*,
(select count(*) from comments where comments.post_id = posts.id and comments.is_approved = 1) as comments_count
from
posts
select
posts.*,
count(comments.id) comments_count
from
posts
left join comments on
comments.post_id = posts.id
group by
posts.*
From my understanding the first query is worse because it has to do a select for each record in posts where as the second query does not.
Is this true or false?

As with all performance questions, you should test the performance on your system with your data.
However, I would expect the first to perform better, with the right indexes. The right index for:
select p.*,
(select count(*)
from comments c
where c.post_id = p.id and c.is_approved = 1
) as comments_count
from posts p
is comments(post_id, is_approved).
MySQL implements a group by by doing a file sort. This version saves a file sort on all the data. My guess is that will be faster than the second method.
As a note: group by posts.* is not valid syntax. I assume this was intended for illustration purposes only.

This is the standard way I would do it (the use of LEFT JOIN, and SUM lets you also know which posts have no comments.)
SELECT posts.*
, SUM(IF(comments.id IS NULL, 0, 1)) AS comments_count
FROM posts
LEFT JOIN comments USING (post_id)
GROUP BY posts.post_id
;
But if I were trying for faster, this might be better.
SELECT posts.*, IFNULL(subQ.comments_count, 0) AS comments_count
FROM posts
LEFT JOIN (
SELECT post_id, COUNT(1) AS comments_count
FROM comments
GROUP BY post_id
) As subQ
USING (post_id)
;

After a bit more research I found no time difference between the two queries
Benchmark.bm do |b|
b.report('joined') do
1000.times do
ActiveRecord::Base.connection.execute('
select
p.id,
(select count(c.id) from comments c where c.post_id = p.id) comment_count
from
posts l;')
end
end
b.report('nested') do
1000.times do
ActiveRecord::Base.connection.execute('
select
p.id,
count(c.id) comment_count
from
posts File.join(File.dirname(__FILE__), *%w[rel path here])
left join comments c on
c.post_id = p.id
group by
p.id;')
end
end
end
user system total real
nested 2.120000 0.900000 3.020000 ( 3.349015)
joined 2.110000 0.990000 3.100000 ( 3.402986)
However I did notice that when running an explain for both queries, more indexes are possible in the first query. Which makes me think it is a better option if the attributes needed in the select changed.

Related

SQL query optimization and sort by other row if first is empty

SQL Query:
SELECT
T.*,
U.nick AS author_nick,
P.id AS post_id,
P.name AS post_name,
P.author AS post_author_id,
U2.nick AS post_author
FROM
zero_topics T
LEFT JOIN
zero_posts P
ON
T.id = P.topic_id
LEFT JOIN
zero_players U
ON
T.author = U.uuid
LEFT JOIN
zero_players U2
ON
P.author = U2.uuid
ORDER BY
P.id DESC
Questions:
I need to double left join to get user nick from UUID for topic and post
Not all topics will have post, as you see i sort from post id(it will be date) but it shows on first place topics with last post, and on bottom topics without replies, how can i define order when posts doesn't exists?
1.You will need to double left join if you need to show the nicks in different columns
2.You could use a case in you order by
ORDER BY
CASE
WHEN P.id is null THEN T.ID
ELSE P.ID
END ASC
Final Query:-
SELECT
T.*,
U.nick AS author_nick,
P.id AS post_id,
P.name AS post_name,
P.author AS post_author_id,
U2.nick AS post_author
FROM
zero_topics T
LEFT JOIN
zero_posts P
ON
T.id = P.topic_id
LEFT JOIN
zero_players U
ON
T.author = U.uuid
LEFT JOIN
zero_players U2
ON
P.author = U2.uuid
ORDER BY
CASE
WHEN P.id is null THEN T.ID
ELSE P.ID
END ASC
You actually have two join chains from the topics table. One chain ties an author directly to the topic and one ties an author to each post about the topic, either one or both may be left joined. But once you start a left join in a chain, it must then be continued down the rest of the chain or you nullify the left join. Actually, the topic author is in a chain of length 1 so you don't have to worry about that one.
If every topic has an author, you don't need to left join the first players table (T.author = U.uuid) as that would always link. You would left join down the post chain to see topics even if they have no posts written on them.
Assuming that is what you want to see, then the order by clause could well stay just as you wrote it. What you would get is a list of posts, ordered by ID, with the topics scattered around however they ended up. Any topics with no posts would be clumped all either at the beginning or at the end of the result set, depending on your settings and the DBMS.
If, however, you wrote the order by like this:
order by t.Title, p.id;
Then you would get all the topic ordered by title, with the posts written about that topic ordered by ID within each topic. Any topic with no posts would have a single row (assuming only one topic author) in the proper title order but showing only topic data.
So it all depends on what you want to see.

Mysql forum - get number of replies

I'm making a very simple forum with one table called forum_posts. I store both the replies and the posts in that same table because I've found that works pretty well for comments system I made before.
If the post is a reply, it has a reply_id that is the post_id of the post it is replying to. If it is a 'root post' so to speak, it has a 0 for the reply_id.
I already have the number of views. But I'd like to get the number of replies for each record in the results.
How would I do that?
SELECT a.account_id, a.store_name, p.post_id, p.post_title, p.post_text, p.views, p.creation_timestamp, p.update_timestamp
FROM forum_posts AS p
INNER JOIN accounts AS a
ON p.account_id = a.account_id
WHERE p.reply_id > 0
As you can probably guess, I'm making the forum listings where people choose a forum post to go and view.
You will need to join posts against itself and count it to get the number of replies. This isn't particularly efficient over millions of forum posts - most forums denormalize this and maintain a post count attribute separately.
However, in keeping with what you have, something like (untested)...
SELECT a.account_id, a.store_name, cp.* FROM
(
SELECT p.post_id, p.post_title, p.post_text, p.views, p.creation_timestamp,
p.update_timestamp, p.account_id, (COUNT(*) - 1) as replies
FROM forum_posts AS p
LEFT JOIN forum_posts AS p1 ON p1.reply_id > 0 AND p1.reply_id = p.post_id
GROUP BY p.post_id
)
AS cp
INNER JOIN accounts AS a ON cp.account_id = a.account_id
WHERE cp.replies > 0
The end WHERE is optional. I just copied it from your first query. It's also worth noting that this is MySQL specific as it uses the GROUP BY without a full list of non-aggregated columns (but you tagged your question as MySQL so no problem).
SELECT account_id, store_name, post_id, post_title, post_text, views, creation_timestamp, update_timestamp, IF(reply_id IS NOT NULL, replies, 0)
FROM (
SELECT a.account_id, a.store_name, p.post_id, p.post_title, p.post_text, p.views, p.creation_timestamp, p.update_timestamp, r.reply_id, COUNT(*) as replies
FROM forum_posts AS p
INNER JOIN accounts AS a
ON p.account_id = a.account_id
LEFT JOIN forum_posts AS r
ON p.post_id = r.reply_id
WHERE p.reply_id = 0
GROUP BY p.post_id
) AS sq
It also seems that you need p.reply_id = 0 if you want to show only "root" posts (forum threads) rather than replies.
Outer query makes sure that the posts with no replies (which are still returned in the inner query) are printed with the number of replies 0 rather than 1 (which would be incorrect obviously).

MYSQL subquery SELECT in JOIN clause

Ok... well I have to put the subquery in a JOIN clause since it selects more than one column and putting it in the SELECT clause does not allow that as it gives me an error of an operand.
Anywho, this is my query:
SELECT
c.id,
c.title,
c.description,
c.icon,
p.id as topic_id,
p.title AS topic_title,
p.date,
p.username
FROM forum_cat c
LEFT JOIN (
SELECT
ft.id,
ft.cat_id,
ft.title,
fp.date,
u.username
FROM forum_topic ft
JOIN forum_post fp ON fp.topic_id = ft.id
JOIN user u ON u.user_id = fp.author_id
WHERE ft.cat_id = c.id
ORDER BY fp.date DESC
LIMIT 1
) p ON p.cat_id = c.id
WHERE c.main_cat = ?
ORDER BY c.list_no
Now the important thing I need here... FOR EACH category, I want to show the latest post and topic title in each category.
However, this select statement is going INSIDE a foreach loop looping around the general categories which is found my main_cat.
So there are 5 main categories with 3-8 subcategories.. this is the subcategory query. BUT FOR EACH subcategory, I need to grab the latest post.. However, it only runs this SELECT query for each main category so it's only select THE LATEST post between all subcategories combined... I want to get the latest post of EACH subcategory, but I rather not run this query for each subcategory... since I want the page load to be fast.
BUT REMEMBER, some subcategories WILL NOT have a latest post since some of them may not even contain a topic yet! So hence the left join.
Does anyone know how to go about doing this?
AND BTW, there is an error it gives me (WHERE ft.cat_id = c.id) in the subquery because c.id is an unknown column. But I'm trying to reference it from the outer query so can someone help me on that issue as well?
Thank you!
All tables:
forum_cat (Subcategories)
-----------------------------------------------
ID, Title, Description, Icon, Main_cat, List_no
forum_topic (Topics in each subcategory)
--------------------------------------------
ID, Author_id, Cat_id, Title, Sticky, Locked
forum_post (Posts in each topic)
--------------------------------------------
ID, Topic_id, Author_id, Body, Date, Hidden'
The main categories are listed in a function. I didn't store them in the database since it was a waste of space since they never change. There are 7 main categories though.
It's hard to tell without seeing DDL of your tables, relevant sample data and desired output.
I could've got your requirements wrong, but try this:
SELECT *
FROM forum_cat c LEFT JOIN
(SELECT t.cat_id,
p.topic_id,
t.title,
p.id,
p.body,
MAX(p.`date`) AS `date`,
p.author_id,
u.username
FROM forum_post p INNER JOIN
forum_topic t ON t.id = p.topic_id INNER JOIN
`user` u ON u.user_id = p.author_id
GROUP BY t.cat_id) d ON d.cat_id = c.id
WHERE c.main_cat = 1
ORDER BY c.list_no

How to left join multiple one to many tables in mysql?

i have a problem with joining three tables in mysql.
lets say we have a table named posts which I keep my entries in it, i have a table named likes which i store user_id's and post_id's in and a third table named comments which i store user_id's and post_id's and comment's text in it.
I need a query that fetches list of my entries, with number of likes and comments for each entry.
Im using this query:
SELECT posts.id, count(comments.id) as total_comments, count(likes.id) as total_likes
FROM `posts`
LEFT OUTER JOIN comments ON comments.post_id = posts.id
LEFT OUTER JOIN likes ON likes.post_id = posts.id
GROUP BY posts.id
but there is a problem with this query, if comments are empty for an item, likes count is just ok, but lets say if an entry has 2 comments and 4 likes, both total_comments and total_likes will be "8", meaning that mysql multiplies them.
I'm confused and I dont know what whould I do.
Thanks in advace.
Use count(distinct comments.id) and count(distinct likes.id), provided these ids are unique.
Well this is one way to approach it (assuming mysql allows derived tables):
SELECT posts.id, comments.total_comments, likes.total_likes
FROM `posts`
LEFT OUTER JOIN (select post_id, count(id) as total_comments from comments) comments
ON comments.post_id = posts.id
LEFT OUTER JOIN (select post_id, count(id) as total_likes from likes) likes
ON likes.post_id = posts.id
You could also use correlated subqueries. You may want a case statment inthere to account for putting in a 0 when there are no matched records.
Let's try a correlated subquery:
SELECT posts.id,
(select count(Id) from comments where post_id = posts.id) as total_comments,
(select count(Id) from likes where post_id = posts.id) as total_likes
FROM `posts`

Get the latest row from another table in MySQL

Let's say I have two tables, news and comments.
news (
id,
subject,
body,
posted
)
comments (
id,
parent, // points to news.id
message,
name,
posted
)
I would like to create one query that grabs the latest x # of news item along with the name and posted date for the latest comment for each news post.
Speed matters in terms of selecting ALL the comments in a subquery is not an option.
I just realized the query does not return results if there are no comments attached to the news table, here's the fix as well as an added column for the total # of posts:
SELECT news.*, comments.name, comments.posted, (SELECT count(id) FROM comments WHERE comments.parent = news.id) AS numComments
FROM news
LEFT JOIN comments
ON news.id = comments.parent
AND comments.id = (SELECT max(id) FROM comments WHERE parent = news.id)
If speed is that important, why not create a recent_comment table that contains the id and parent id of just the most recent comments? Every time a comment is posted on a news post, replace that news id's most recent comment id. Create an index on the news id column of the new table and your joins will be fast.You'd be trading write speed for read speed, but not by a whole lot.
Assuming posted is a unique timestamp, otherwise choose a unique autonumber
select c.id, c.parent, c.message, c.name, c.posted
c.message, c.name,
c.posted -- same as comment_latest.recent
from comments c
join
(
select parent, max(posted) as recent
from comments
group by parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent
Complete(displays news information):
select
n.id as news_id, n.subject, n.body, n.posted as news_posted_date
c.id as comment_id,
c.message, c.name as commenter_name, c.posted as comment_posted_date
from comments c
join
(
select r.parent, max(r.posted) as recent
from comments r
join
(
select id from news order by id desc limit $last_x_news
) news l
on r.parent = l.id
group by r.parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent
join news n on c.parent = n.id
NOTE:
The above code is not subquery, it is table-deriving query. It is faster than subquery. This is subquery(slow):
select
id,
subject,
body,
posted as news_posted_date,
(select id from comments where parent = news.id order by posted desc limit 1) as comment_id,
(select message from comments where parent = news.id order by posted desc limit 1) as message,
(select name from comments where parent = news.id order by posted desc limit 1) as name,
(select posted from comments where parent = news.id order by posted desc limit 1) as comment_posted_date,
from news
SELECT news.subject, news.body, comments.name, comments.posted
FROM news
INNER JOIN comments ON
(comments.parent = news.id)
WHERE comments.parent = news.id
AND comments.id = (SELECT MAX(id)
FROM comments
WHERE parent = news.id)
ORDER BY news.id
This gets all the news items, along with the related comment with the highest id value, which in theory should be the latest.
My solution is similar to J but I think he added one line that is unnecessary:
SELECT news.*, comments.name, comments.posted FROM news INNER JOIN comments ON news.id = comments.parent WHERE comments.id = (SELECT max(id) FROM comments WHERE parent = news.id )
Not sure of the speed on an extremely large table though.
Given the constraints brought to light in the comments of my other answer, I have a new idea that may or may not make any sense in practise.
Create a view (or function if it's more appropriate) with the following definition, called recent_comments:
SELECT MAX(id), parent
FROM comments
GROUP BY parent
If you have a clustered index on the parent column, this is probably a reasonably fast query, but even then it will still be a bottleneck.
Using this, the query you need to get your answer is something like,
SELECT news.*, comments.*
FROM news
INNER JOIN recent_comments
ON news.id = recent_comments.parent
INNER JOIN comments
ON comments.id = recent_comments.id
Plus considerations for news posts that don't have any comments yet.
I think the solution provided by #Jan is the best. i.e create the "View" and inner join it with the SQL statement.
It'll definitely reduce the time to pull the data. I tested it and it works 100%.