Let's say I have two tables, news and comments.
news (
id,
subject,
body,
posted
)
comments (
id,
parent, // points to news.id
message,
name,
posted
)
I would like to create one query that grabs the latest x news items, along with the name and posted date of the latest comment for each news post.
Speed matters, so selecting ALL the comments in a subquery is not an option.
I just realized the query does not return a news item if it has no comments attached. Here's the fix, along with an added column for the total number of comments:
SELECT news.*, comments.name, comments.posted, (SELECT count(id) FROM comments WHERE comments.parent = news.id) AS numComments
FROM news
LEFT JOIN comments
ON news.id = comments.parent
AND comments.id = (SELECT max(id) FROM comments WHERE parent = news.id)
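A minimal sketch of how the query above behaves, run against an in-memory SQLite database from Python. The sample rows, the index name and the `ORDER BY ... LIMIT` for "latest x" are assumptions added for illustration; the original post targets MySQL, where the same SQL should apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news (id INTEGER PRIMARY KEY, subject TEXT, body TEXT, posted TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, parent INTEGER,
                       message TEXT, name TEXT, posted TEXT);
-- Assumed index: lets the per-news max(id) and count(id) lookups avoid a full scan.
CREATE INDEX comments_parent ON comments (parent);
INSERT INTO news VALUES (1, 'first', 'body 1', '2009-01-01'),
                        (2, 'second', 'body 2', '2009-01-02');
INSERT INTO comments VALUES (1, 1, 'hi', 'alice', '2009-01-03'),
                            (2, 1, 'hello again', 'bob', '2009-01-04');
""")

# Latest 2 news items (hypothetical limit), each with its newest comment and comment count.
rows = conn.execute("""
SELECT news.id, comments.name, comments.posted,
       (SELECT count(id) FROM comments WHERE comments.parent = news.id) AS numComments
FROM news
LEFT JOIN comments
  ON news.id = comments.parent
 AND comments.id = (SELECT max(id) FROM comments WHERE parent = news.id)
ORDER BY news.id DESC
LIMIT 2
""").fetchall()

for row in rows:
    print(row)
```

News item 2 has no comments, so it comes back with NULLs for the comment columns and a count of 0 instead of being dropped.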
If speed is that important, why not create a recent_comment table that contains the id and parent id of just the most recent comments? Every time a comment is posted on a news post, replace that news id's most recent comment id. Create an index on the news id column of the new table and your joins will be fast. You'd be trading write speed for read speed, but not by a whole lot.
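The recent_comment idea could be sketched roughly as follows, using SQLite from Python. The column names `parent` and `comment_id`, the `add_comment` helper and the sample data are assumptions; in MySQL the upsert would more likely be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT, posted TEXT);
-- Hypothetical lookup table: one row per news item, keyed (and so indexed) by news id.
CREATE TABLE recent_comment (parent INTEGER PRIMARY KEY, comment_id INTEGER);
""")

def add_comment(parent, name, posted):
    """Insert a comment and repoint recent_comment at it in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO comments (parent, name, posted) VALUES (?, ?, ?)",
            (parent, name, posted))
        conn.execute(
            "INSERT OR REPLACE INTO recent_comment (parent, comment_id) VALUES (?, ?)",
            (parent, cur.lastrowid))

add_comment(1, 'alice', '2009-01-03')
add_comment(1, 'bob', '2009-01-04')
add_comment(2, 'carol', '2009-01-05')

# Reading the latest comment per news item is now a plain primary-key join.
latest = conn.execute("""
SELECT r.parent, c.name
FROM recent_comment r
JOIN comments c ON c.id = r.comment_id
ORDER BY r.parent
""").fetchall()
print(latest)
```

The extra write per comment is the cost mentioned above; the read side no longer needs a MAX() or GROUP BY over comments.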
Assuming posted is a unique timestamp (otherwise choose a unique autonumber column):
select c.id, c.parent, c.message, c.name,
c.posted -- same as comment_latest.recent
from comments c
join
(
select parent, max(posted) as recent
from comments
group by parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent
Complete(displays news information):
select
n.id as news_id, n.subject, n.body, n.posted as news_posted_date,
c.id as comment_id,
c.message, c.name as commenter_name, c.posted as comment_posted_date
from comments c
join
(
select r.parent, max(r.posted) as recent
from comments r
join
(
select id from news order by id desc limit $last_x_news
) as l
on r.parent = l.id
group by r.parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent
join news n on c.parent = n.id
NOTE:
The code above does not use a correlated subquery; it joins against a derived table, which is usually faster. This is the correlated-subquery version (slow):
select
id,
subject,
body,
posted as news_posted_date,
(select id from comments where parent = news.id order by posted desc limit 1) as comment_id,
(select message from comments where parent = news.id order by posted desc limit 1) as message,
(select name from comments where parent = news.id order by posted desc limit 1) as name,
(select posted from comments where parent = news.id order by posted desc limit 1) as comment_posted_date
from news
SELECT news.subject, news.body, comments.name, comments.posted
FROM news
INNER JOIN comments ON
(comments.parent = news.id)
WHERE comments.parent = news.id
AND comments.id = (SELECT MAX(id)
FROM comments
WHERE parent = news.id)
ORDER BY news.id
This gets all the news items, along with the related comment with the highest id value, which in theory should be the latest.
My solution is similar to J but I think he added one line that is unnecessary:
SELECT news.*, comments.name, comments.posted FROM news INNER JOIN comments ON news.id = comments.parent WHERE comments.id = (SELECT max(id) FROM comments WHERE parent = news.id )
Not sure of the speed on an extremely large table though.
Given the constraints brought to light in the comments of my other answer, I have a new idea that may or may not make any sense in practise.
Create a view (or function if it's more appropriate) with the following definition, called recent_comments:
SELECT MAX(id) AS id, parent
FROM comments
GROUP BY parent
If you have a clustered index on the parent column, this is probably a reasonably fast query, but even then it will still be a bottleneck.
Using this, the query you need to get your answer is something like,
SELECT news.*, comments.*
FROM news
INNER JOIN recent_comments
ON news.id = recent_comments.parent
INNER JOIN comments
ON comments.id = recent_comments.id
Plus considerations for news posts that don't have any comments yet.
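One possible way to handle news posts without comments is to switch both joins to LEFT JOIN. The sketch below tries that against SQLite from Python; the sample rows are made up, and the view aliases MAX(id) as id so that recent_comments.id resolves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE news (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
-- The view from the answer, with MAX(id) aliased so it can be joined on.
CREATE VIEW recent_comments AS
  SELECT MAX(id) AS id, parent FROM comments GROUP BY parent;
INSERT INTO news VALUES (1, 'first'), (2, 'second');
INSERT INTO comments VALUES (1, 1, 'alice'), (2, 1, 'bob');
""")

# LEFT JOINs keep news item 2 even though it has no comments.
rows = conn.execute("""
SELECT news.id, news.subject, comments.name
FROM news
LEFT JOIN recent_comments ON news.id = recent_comments.parent
LEFT JOIN comments ON comments.id = recent_comments.id
ORDER BY news.id
""").fetchall()
print(rows)
```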
I think the solution provided by @Jan is the best, i.e., create the "View" and inner join it with the SQL statement.
It'll definitely reduce the time to pull the data. I tested it and it works 100%.
Related
I have two queries below. The first one has a nested select. The second one makes use of a group by clause.
select
posts.*,
(select count(*) from comments where comments.post_id = posts.id and comments.is_approved = 1) as comments_count
from
posts
select
posts.*,
count(comments.id) comments_count
from
posts
left join comments on
comments.post_id = posts.id
group by
posts.*
From my understanding, the first query is worse because it has to do a select for each record in posts, whereas the second query does not.
Is this true or false?
As with all performance questions, you should test the performance on your system with your data.
However, I would expect the first to perform better, with the right indexes. The right index for:
select p.*,
(select count(*)
from comments c
where c.post_id = p.id and c.is_approved = 1
) as comments_count
from posts p
is comments(post_id, is_approved).
MySQL implements a group by by doing a file sort. This version saves a file sort on all the data. My guess is that will be faster than the second method.
As a note: group by posts.* is not valid syntax. I assume this was intended for illustration purposes only.
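Note also that the two queries in the question are not equivalent as written: only the first filters on is_approved. A small sketch (SQLite from Python, with made-up rows and an assumed index name) of a join version that groups by the primary key and carries the filter in the ON clause, so that posts with no approved comments still appear with a count of 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, is_approved INTEGER);
-- The index suggested above for the correlated count.
CREATE INDEX comments_post_approved ON comments (post_id, is_approved);
INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO comments VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 2, 1);
""")

nested = conn.execute("""
SELECT p.id,
       (SELECT count(*) FROM comments c
        WHERE c.post_id = p.id AND c.is_approved = 1) AS comments_count
FROM posts p
ORDER BY p.id
""").fetchall()

# The filter lives in ON, not WHERE, so unmatched posts are kept by the LEFT JOIN.
joined = conn.execute("""
SELECT p.id, count(c.id) AS comments_count
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id AND c.is_approved = 1
GROUP BY p.id
ORDER BY p.id
""").fetchall()

print(nested, joined)
```

Putting `c.is_approved = 1` in a WHERE clause instead would drop post 3, and any post whose comments are all unapproved, from the result.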
This is the standard way I would do it (the use of LEFT JOIN, and SUM lets you also know which posts have no comments.)
SELECT posts.*
, SUM(IF(comments.id IS NULL, 0, 1)) AS comments_count
FROM posts
LEFT JOIN comments USING (post_id)
GROUP BY posts.post_id
;
But if I were trying for faster, this might be better.
SELECT posts.*, IFNULL(subQ.comments_count, 0) AS comments_count
FROM posts
LEFT JOIN (
SELECT post_id, COUNT(1) AS comments_count
FROM comments
GROUP BY post_id
) As subQ
USING (post_id)
;
After a bit more research I found no time difference between the two queries
require 'benchmark'

Benchmark.bm do |b|
  # Correlated subquery in the select list
  b.report('nested') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          (select count(c.id) from comments c where c.post_id = p.id) comment_count
        from
          posts p;')
    end
  end
  # Left join plus group by
  b.report('joined') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          count(c.id) comment_count
        from
          posts p
        left join comments c on
          c.post_id = p.id
        group by
          p.id;')
    end
  end
end
user system total real
nested 2.120000 0.900000 3.020000 ( 3.349015)
joined 2.110000 0.990000 3.100000 ( 3.402986)
However I did notice that when running an explain for both queries, more indexes are possible in the first query. Which makes me think it is a better option if the attributes needed in the select changed.
SQL Query:
SELECT
T.*,
U.nick AS author_nick,
P.id AS post_id,
P.name AS post_name,
P.author AS post_author_id,
U2.nick AS post_author
FROM
zero_topics T
LEFT JOIN
zero_posts P
ON
T.id = P.topic_id
LEFT JOIN
zero_players U
ON
T.author = U.uuid
LEFT JOIN
zero_players U2
ON
P.author = U2.uuid
ORDER BY
P.id DESC
Questions:
1. Do I need to left join zero_players twice to get the user nick from the UUID for both the topic and the post?
2. Not all topics will have a post. As you can see, I sort by post id (later it will be a date), but that puts topics with the latest post first and topics without replies at the bottom. How can I define the order when no posts exist?
1. You will need to left join twice if you need to show the nicks in different columns.
2. You could use a CASE in your ORDER BY:
ORDER BY
CASE
WHEN P.id is null THEN T.ID
ELSE P.ID
END ASC
Final Query:-
SELECT
T.*,
U.nick AS author_nick,
P.id AS post_id,
P.name AS post_name,
P.author AS post_author_id,
U2.nick AS post_author
FROM
zero_topics T
LEFT JOIN
zero_posts P
ON
T.id = P.topic_id
LEFT JOIN
zero_players U
ON
T.author = U.uuid
LEFT JOIN
zero_players U2
ON
P.author = U2.uuid
ORDER BY
CASE
WHEN P.id is null THEN T.ID
ELSE P.ID
END ASC
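The CASE expression above is equivalent to COALESCE(P.id, T.id). If the goal is instead to keep topics without posts in a fixed position, one alternative (not what the answer above does) is to sort on the NULL test first. A minimal sketch with SQLite from Python; the `title` column and the rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zero_topics (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE zero_posts (id INTEGER PRIMARY KEY, topic_id INTEGER);
INSERT INTO zero_topics VALUES (1, 'no replies'), (2, 'older'), (3, 'newest');
INSERT INTO zero_posts VALUES (10, 2), (11, 3);
""")

# "P.id IS NULL" evaluates to 0 for topics with posts and 1 for topics without,
# so topics that have posts come first, newest post first; the rest go last.
order = [row[0] for row in conn.execute("""
SELECT T.title
FROM zero_topics T
LEFT JOIN zero_posts P ON T.id = P.topic_id
ORDER BY P.id IS NULL, P.id DESC
""")]
print(order)
```

Reversing the first sort key (`P.id IS NOT NULL`) would put the topics without replies at the top instead.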
You actually have two join chains from the topics table. One chain ties an author directly to the topic and one ties an author to each post about the topic, either one or both may be left joined. But once you start a left join in a chain, it must then be continued down the rest of the chain or you nullify the left join. Actually, the topic author is in a chain of length 1 so you don't have to worry about that one.
If every topic has an author, you don't need to left join the first players table (T.author = U.uuid) as that would always link. You would left join down the post chain to see topics even if they have no posts written on them.
Assuming that is what you want to see, then the order by clause could well stay just as you wrote it. What you would get is a list of posts, ordered by ID, with the topics scattered around however they ended up. Any topics with no posts would be clumped all either at the beginning or at the end of the result set, depending on your settings and the DBMS.
If, however, you wrote the order by like this:
order by t.Title, p.id;
Then you would get all the topics ordered by title, with the posts written about each topic ordered by ID within that topic. Any topic with no posts would have a single row (assuming only one topic author) in the proper title order but showing only topic data.
So it all depends on what you want to see.
I've been stuck on this problem for the last couple of days. I have the users' tags (user_tags) in one table and the articles in another table (crawler_results), and I'm trying to sort the articles so the ones which contain the most tags are placed first.
How do you do that?
This is my code:
SELECT content
FROM crawler_results articles
INNER JOIN user_newspapers un
ON un.newspaper_id = articles.newspaper_id
WHERE un.user_id = '$user_id'
I somehow have to join the user_tags table which consists of:
tag and user_id
EDIT:
Let's say the user wants to find articles with these tags: #stack and #overflow. The articles that have the most of these tags are displayed first. So an article that says: "Stack overflow is great" is ranked higher than just: "overflow is great". And tags can be reused, for instance: "overflow, overflow, overflow" counts as 3.
I want to sort the articles so the ones that contain the most tags are first.
How is this achieved? I have four tables: crawler_results (where I store the articles), newspapers (where I store the newspapers), user_newspapers (where I have newspaper_id and user_id) and user_tags (where I have the tag and user_id).
Do I need to make a separate table that stores all the words from the articles or can this be done with a sql query?
I really appreciate your help!
If you just have to sort your query using the number of tags related, try this query:
SELECT title
, c.name AS category
, img_src
, n.name AS newspaper_name
, crawler_results.id AS id
, crawler_results.content AS content
FROM crawler_results
INNER JOIN newspapers n ON n.id = crawler_results.newspaper_id
INNER JOIN categories c ON c.id = crawler_results.category
INNER JOIN user_newspapers un ON un.newspaper_id = n.id
INNER JOIN (SELECT ut.user_id
,COUNT(ut.tag) AS nb_tags
FROM user_tags ut
GROUP BY ut.user_id) t ON t.user_id = un.user_id
INNER JOIN user_categories uc ON uc.category_id = c.id
WHERE img_src != ''
AND char_length(content) > 750
AND active = '1'
AND un.user_id = '$user_id'
GROUP BY title
ORDER BY category ASC, t.nb_tags DESC
If the result is not what you expect, try to clarify your question and, if possible, provide some example of the data you're expecting.
Hope this will help.
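The query above orders by how many tags the user has, not by how often those tags occur in each article. A different technique, counting occurrences with LENGTH and REPLACE, might get closer to what the edit in the question describes. This is only a sketch with SQLite from Python; the sample rows and user id 7 are made up, tags are assumed to be stored in lower case, and the match is a plain substring match rather than a whole-word match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crawler_results (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE user_tags (user_id INTEGER, tag TEXT);
INSERT INTO crawler_results VALUES
  (1, 'overflow is great'),
  (2, 'Stack overflow is great'),
  (3, 'overflow, overflow, overflow');
INSERT INTO user_tags VALUES (7, 'stack'), (7, 'overflow');
""")

# Occurrences of a tag = characters removed by REPLACE divided by the tag length.
ranked = conn.execute("""
SELECT a.id,
       SUM((LENGTH(a.content) - LENGTH(REPLACE(LOWER(a.content), ut.tag, '')))
           / LENGTH(ut.tag)) AS hits
FROM crawler_results a
JOIN user_tags ut ON ut.user_id = ?
GROUP BY a.id
ORDER BY hits DESC, a.id
""", (7,)).fetchall()
print(ranked)
```

This scans every article for every tag, so on a large table a full-text index or a separate word table would probably scale better.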
I am currently trying to retrieve the latest posts along with their related posts (x number for each post). I have the following query in hand:
SELECT id, title, content,
(SELECT GROUP_CONCAT(title) FROM posts -- Select title of related posts
WHERE id <> p.id AND id IN (
SELECT p_id FROM tagsmap -- Select related post ids from tagsmap
WHERE t_id IN (
SELECT t_id FROM tagsmap -- Select the tags of the current post
WHERE p_id = p.id)
) ORDER BY id DESC LIMIT 0, 3) as related
FROM posts as p ORDER BY id DESC LIMIT 5
My database structure is simple: A posts table. A tags table. And a tagsmap table where I associate posts with tags.
This query works fine (though I don't know its performance since I don't have many rows in the tables -- Maybe an explain could help me but that's not the case right now).
What I really need is to retrieve the ids of the related posts along with their titles.
So I'd like to do SELECT GROUP_CONCAT(title), GROUP_CONCAT(id), but I know that will result in an error. So what is the best way to retrieve the id along with the title in this case? I do not want to rewrite the whole subquery to just retrieve the id. There should be another way.
EDIT
SELECT p1.id, p1.title, p1.content,
group_concat(DISTINCT p2.id) as 'P IDs',
group_concat(DISTINCT p2.title) as 'P titles'
FROM posts as p1
LEFT JOIN tagsmap as tm1 on tm1.p_id = p1.id
LEFT JOIN tagsmap as tm2 on tm2.t_id = tm1.t_id and tm1.p_id <> tm2.p_id
LEFT JOIN posts as p2 on p2.id = tm2.p_id
GROUP BY p1.id
ORDER BY p1.id desc limit 5;
In the end, this is the query that I used. I removed the WHERE clause because it is unnecessary and used LEFT JOIN rather than JOIN because otherwise it would ignore the posts without tags. Finally, I added DISTINCT to group_concat because it was concatenating duplicate rows (if, for example, a post had multiple tags in common with a related post, that post would be concatenated more than once).
The query above works perfectly. Thanks for all.
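The effect of LEFT JOIN and DISTINCT can be checked with a small sketch like the one below (SQLite from Python, whose group_concat also accepts DISTINCT). The three sample posts and two shared tags are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
CREATE TABLE tagsmap (p_id INTEGER, t_id INTEGER);
INSERT INTO posts VALUES (1, 'one', ''), (2, 'two', ''), (3, 'three', '');
-- Posts 1 and 2 share two tags (10 and 11); post 3 has no tags at all.
INSERT INTO tagsmap VALUES (1, 10), (1, 11), (2, 10), (2, 11);
""")

rows = conn.execute("""
SELECT p1.id,
       group_concat(DISTINCT p2.id)    AS related_ids,
       group_concat(DISTINCT p2.title) AS related_titles
FROM posts AS p1
LEFT JOIN tagsmap AS tm1 ON tm1.p_id = p1.id
LEFT JOIN tagsmap AS tm2 ON tm2.t_id = tm1.t_id AND tm1.p_id <> tm2.p_id
LEFT JOIN posts AS p2 ON p2.id = tm2.p_id
GROUP BY p1.id
ORDER BY p1.id DESC
LIMIT 5
""").fetchall()
print(rows)
```

Post 3 survives with NULLs because of the LEFT JOINs, and the two shared tags produce only one entry per related post because of DISTINCT. The two concatenated lists are built independently, so their element order is not guaranteed to line up.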
Okay - this will work, and it has the added advantage of eliminating the sub queries (which can slow you down when you get lots of records):
SELECT p1.id, p1.title, p1.content,
group_concat( p2.id) as 'P IDs',
group_concat( p2.title) as 'P titles'
FROM posts as p1
JOIN tagsmap as tm1 on tm1.p_id = p1.id
JOIN tagsmap as tm2 on tm2.t_id = tm1.t_id and tm1.p_id <> tm2.p_id
JOIN posts as p2 on p2.id = tm2.p_id
WHERE p2.id <> p1.id
GROUP BY p1.id
ORDER BY p1.id desc limit 5;
What we're doing here is selecting what you want from the first version of posts, joining them to the tagsmap by their post.id, doing a self join to tagsmap by tag id to get all the related tags, and then joining back to another posts (p2) to get the posts that are pointed to by those related tags.
Use GROUP BY to discard the dups from all that joining, and you're there.
like this?
SELECT id, title, content,
(SELECT GROUP_CONCAT(concat(cast(id as char), ':', title)) FROM posts -- Select id:title of related posts
WHERE id <> p.id AND id IN (
SELECT p_id FROM tagsmap -- Select related post ids from tagsmap
WHERE t_id IN (
SELECT t_id FROM tagsmap -- Select the tags of the current post
WHERE p_id = p.id)
) ORDER BY id DESC LIMIT 0, 3) as related
FROM posts as p ORDER BY id DESC LIMIT 5
Ok... well I have to put the subquery in a JOIN clause since it selects more than one column and putting it in the SELECT clause does not allow that as it gives me an error of an operand.
Anywho, this is my query:
SELECT
c.id,
c.title,
c.description,
c.icon,
p.id as topic_id,
p.title AS topic_title,
p.date,
p.username
FROM forum_cat c
LEFT JOIN (
SELECT
ft.id,
ft.cat_id,
ft.title,
fp.date,
u.username
FROM forum_topic ft
JOIN forum_post fp ON fp.topic_id = ft.id
JOIN user u ON u.user_id = fp.author_id
WHERE ft.cat_id = c.id
ORDER BY fp.date DESC
LIMIT 1
) p ON p.cat_id = c.id
WHERE c.main_cat = ?
ORDER BY c.list_no
Now the important thing I need here... FOR EACH category, I want to show the latest post and topic title in each category.
However, this select statement is going INSIDE a foreach loop over the general categories, which are identified by main_cat.
So there are 5 main categories with 3-8 subcategories each, and this is the subcategory query. FOR EACH subcategory, I need to grab the latest post. However, the SELECT runs only once per main category, so it only selects THE LATEST post across all of its subcategories combined. I want to get the latest post of EACH subcategory, but I'd rather not run this query once per subcategory, since I want the page load to be fast.
BUT REMEMBER, some subcategories WILL NOT have a latest post since some of them may not even contain a topic yet! So hence the left join.
Does anyone know how to go about doing this?
AND BTW, there is an error it gives me (WHERE ft.cat_id = c.id) in the subquery because c.id is an unknown column. But I'm trying to reference it from the outer query so can someone help me on that issue as well?
Thank you!
All tables:
forum_cat (Subcategories)
-----------------------------------------------
ID, Title, Description, Icon, Main_cat, List_no
forum_topic (Topics in each subcategory)
--------------------------------------------
ID, Author_id, Cat_id, Title, Sticky, Locked
forum_post (Posts in each topic)
--------------------------------------------
ID, Topic_id, Author_id, Body, Date, Hidden
The main categories are listed in a function. I didn't store them in the database since it was a waste of space since they never change. There are 7 main categories though.
It's hard to tell without seeing DDL of your tables, relevant sample data and desired output.
I could've got your requirements wrong, but try this:
SELECT *
FROM forum_cat c LEFT JOIN
(SELECT t.cat_id,
p.topic_id,
t.title,
p.id,
p.body,
MAX(p.`date`) AS `date`,
p.author_id,
u.username
FROM forum_post p INNER JOIN
forum_topic t ON t.id = p.topic_id INNER JOIN
`user` u ON u.user_id = p.author_id
GROUP BY t.cat_id) d ON d.cat_id = c.id
WHERE c.main_cat = 1
ORDER BY c.list_no
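With GROUP BY t.cat_id, the non-aggregated columns (title, username and so on) are not guaranteed to come from the same row as MAX(date). A possible alternative is to find the newest post id per subcategory in a derived table and then join back to the full rows. The sketch below uses SQLite from Python with made-up rows, leaves out the user join for brevity, and assumes forum_post ids increase over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_cat (id INTEGER PRIMARY KEY, title TEXT, main_cat INTEGER, list_no INTEGER);
CREATE TABLE forum_topic (id INTEGER PRIMARY KEY, cat_id INTEGER, title TEXT);
CREATE TABLE forum_post (id INTEGER PRIMARY KEY, topic_id INTEGER, date TEXT);
INSERT INTO forum_cat VALUES (1, 'General', 1, 1), (2, 'Empty', 1, 2);
INSERT INTO forum_topic VALUES (1, 1, 'old topic'), (2, 1, 'new topic');
INSERT INTO forum_post VALUES (1, 1, '2012-01-01'), (2, 2, '2012-02-01'),
                              (3, 1, '2012-03-01');
""")

# Derived table: newest post id per subcategory (assumes ids grow with time).
rows = conn.execute("""
SELECT c.title, t.title, p.date
FROM forum_cat c
LEFT JOIN (
    SELECT t.cat_id, MAX(p.id) AS post_id
    FROM forum_post p
    JOIN forum_topic t ON t.id = p.topic_id
    GROUP BY t.cat_id
) latest ON latest.cat_id = c.id
LEFT JOIN forum_post p ON p.id = latest.post_id
LEFT JOIN forum_topic t ON t.id = p.topic_id
WHERE c.main_cat = ?
ORDER BY c.list_no
""", (1,)).fetchall()
print(rows)
```

The derived table is not correlated with the outer query, which also sidesteps the "unknown column c.id" error from the question, and subcategories without any topic still appear thanks to the LEFT JOINs.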