SELECT p.*,
u.user_id,
u.user_name,
count(c.comment_post_id) AS comments
FROM posts AS p
LEFT JOIN comments AS c
ON (p.post_id = c.comment_post_id)
LEFT JOIN users AS u
ON (p.postedby_id = u.user_id)
WHERE c.comment_added > NOW() - INTERVAL 1 HOUR
GROUP BY p.post_id
ORDER BY count(c.comment_post_id) DESC
This makes a list of posts sorted by the number of comments in the last hour. The problem is that posts with 0 comments in the last hour do not make the list.
So is there a way to order by average comments per hour since the post was created? That way all posts make the list, and the most discussed posts are always on top.
http://sqlfiddle.com/#!9/820044/1/0
EDIT:
SELECT p.*,
u.user_id,
u.user_name,
COUNT(c.comment_post_id) /
(TIMESTAMPDIFF(HOUR, p.post_timestamp, SYSDATE()) + 1) AS rate
FROM posts AS p
LEFT JOIN comments AS c
ON (p.post_id = c.comment_post_id)
LEFT JOIN users AS u
ON (p.postedby_id = u.user_id)
GROUP BY p.post_id
ORDER BY
COUNT(c.comment_post_id) /
(TIMESTAMPDIFF(HOUR, p.post_timestamp, SYSDATE()) + 1) DESC
This is what solved the problem.
You select only posts that have comments within one hour, because of the WHERE clause. What you can try is:
SELECT p.*,
u.user_id,
u.user_name,
SUM(c.comment_added > NOW() - INTERVAL 1 HOUR) AS comments
FROM posts AS p
LEFT JOIN comments AS c
ON (p.post_id = c.comment_post_id)
LEFT JOIN users AS u
ON (p.postedby_id = u.user_id)
GROUP BY p.post_id
ORDER BY
COUNT(c.comment_post_id) /
(TIMESTAMPDIFF(HOUR, p.post_timestamp, SYSDATE()) + 1) DESC
For each comment that is within the hour, the boolean expression c.comment_added > NOW() - INTERVAL 1 HOUR evaluates to 1 and contributes one to the SUM(...).
For the second part, TIMESTAMPDIFF(HOUR, p.post_timestamp, SYSDATE()) gives the number of hours since the post was published. Dividing the number of comments by the number of hours gives the sorting criterion; we add one to the hours to avoid division by zero.
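The two ideas above (summing a boolean to count recent comments, and dividing the total by hours-plus-one) can be tried out in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, so timestamps are plain epoch seconds and the hypothetical NOW constant replaces NOW()/SYSDATE(); the COALESCE is an addition so that posts without comments report 0 instead of NULL.

```python
import sqlite3

NOW = 36000  # hypothetical "current time" in epoch seconds (10 hours)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_timestamp INTEGER);
CREATE TABLE comments (comment_post_id INTEGER, comment_added INTEGER);
-- post 1: 9 hours old, 5 comments, none in the last hour
-- post 2: 1 hour old, 2 comments, both in the last hour
-- post 3: 4 hours old, no comments at all
INSERT INTO posts VALUES (1, 3600), (2, 32400), (3, 21600);
INSERT INTO comments VALUES
  (1, 4000), (1, 5000), (1, 6000), (1, 7000), (1, 8000),
  (2, 33000), (2, 35000);
""")

rows = conn.execute("""
SELECT p.post_id,
       -- each comparison yields 1 or 0, so SUM counts the recent comments
       COALESCE(SUM(c.comment_added > :now - 3600), 0) AS recent,
       -- total comments divided by (whole hours since posting + 1)
       COUNT(c.comment_post_id) * 1.0
         / ((:now - p.post_timestamp) / 3600 + 1) AS rate
FROM posts AS p
LEFT JOIN comments AS c ON p.post_id = c.comment_post_id
GROUP BY p.post_id
ORDER BY rate DESC
""", {"now": NOW}).fetchall()

for post_id, recent, rate in rows:
    print(post_id, recent, rate)
```

Post 3 stays in the list with a rate of 0, and the older post 1 ranks below post 2 even though it has more comments in total.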
Bit hard to see without the data, but to solve your immediate problem (i.e. get the zero-recent-comments posts showing), don't you just want to move your constraint into the LEFT JOIN?
LEFT JOIN comments AS c
ON (p.post_id = c.comment_post_id AND c.comment_added > NOW() - INTERVAL 1 HOUR)
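To see why the placement of the time constraint matters, here is a rough sketch that runs the same LEFT JOIN twice, once with the filter in WHERE and once in ON. It assumes Python's sqlite3 as a stand-in for MySQL, epoch-second timestamps, and a hypothetical NOW constant in place of NOW().

```python
import sqlite3

NOW = 36000  # hypothetical "current time" in epoch seconds

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY);
CREATE TABLE comments (comment_post_id INTEGER, comment_added INTEGER);
INSERT INTO posts VALUES (1), (2);
-- post 1 has one recent comment; post 2 only has an old one
INSERT INTO comments VALUES (1, 35000), (2, 1000);
""")

# Filter in WHERE: rows whose comment fails the test are dropped after the
# join, so post 2 disappears entirely.
in_where = conn.execute("""
SELECT p.post_id, COUNT(c.comment_post_id)
FROM posts AS p
LEFT JOIN comments AS c ON p.post_id = c.comment_post_id
WHERE c.comment_added > ? - 3600
GROUP BY p.post_id
ORDER BY p.post_id
""", (NOW,)).fetchall()

# Filter in ON: only the matching is restricted, so post 2 survives with
# NULLs on the comment side and a count of 0.
in_on = conn.execute("""
SELECT p.post_id, COUNT(c.comment_post_id)
FROM posts AS p
LEFT JOIN comments AS c
  ON p.post_id = c.comment_post_id AND c.comment_added > ? - 3600
GROUP BY p.post_id
ORDER BY p.post_id
""", (NOW,)).fetchall()

print(in_where)
print(in_on)
```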
Related
I'm currently creating a small application where users can post a text which can be commented and the post can also be voted (+1 or -1).
This is my database:
Now I want to select all information of all posts with status = 1 plus two extra columns: One column containing the count of comments and one column containing the sum (I call it score) of all votes.
I currently use the following query, which correctly adds the count of the comments:
SELECT *, COUNT(comments.fk_commented_post) as comments
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
WHERE posts.status = 1
GROUP BY posts.id_post
Then I tried to additionally add the sum of the votes, using the following query:
SELECT *, COUNT(comments.fk_commented_post) as comments, SUM(votes_posts.type) as score
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
LEFT JOIN votes_posts
ON posts.id_post = votes_posts.fk_voted_post
WHERE posts.status = 1
GROUP BY posts.id_post
The result is no longer correct for either the votes or the comments. Somehow some of the values seem to be getting multiplied...
This is probably simpler using correlated subqueries:
select p.*,
(select count(*)
from comments c
where c.fk_commented_post = p.id_post and c.status = 1
) as num_comments,
(select sum(vp.type)
from votes_posts vp
where vp.fk_voted_post = p.id_post
) as num_score
from posts p
where p.status = 1;
The problem with the joins is that the counts get messed up because the two other tables are not related to each other, so for each post you get a Cartesian product of its comments and its votes.
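The multiplication can be reproduced with a tiny data set. The sketch below assumes Python's sqlite3 as a stand-in for MySQL and a cut-down version of the schema (the status columns are left out): one post with 2 comments and 3 votes produces 2 x 3 = 6 joined rows, which inflates both aggregates, while the correlated subqueries stay correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id_post INTEGER PRIMARY KEY);
CREATE TABLE comments (fk_commented_post INTEGER);
CREATE TABLE votes_posts (fk_voted_post INTEGER, type INTEGER);
INSERT INTO posts VALUES (1);
INSERT INTO comments VALUES (1), (1);                    -- 2 comments
INSERT INTO votes_posts VALUES (1, 1), (1, 1), (1, -1);  -- score should be 1
""")

# Two independent LEFT JOINs: every comment row pairs with every vote row.
joined = conn.execute("""
SELECT p.id_post, COUNT(c.fk_commented_post), SUM(v.type)
FROM posts p
LEFT JOIN comments c ON c.fk_commented_post = p.id_post
LEFT JOIN votes_posts v ON v.fk_voted_post = p.id_post
GROUP BY p.id_post
""").fetchall()

# Correlated subqueries: each aggregate only ever sees its own table.
subqueries = conn.execute("""
SELECT p.id_post,
       (SELECT COUNT(*) FROM comments c
         WHERE c.fk_commented_post = p.id_post),
       (SELECT SUM(vp.type) FROM votes_posts vp
         WHERE vp.fk_voted_post = p.id_post)
FROM posts p
""").fetchall()

print(joined)      # inflated comment count and score
print(subqueries)  # the real values
```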
You want to join comment counts and vote scores to the posts. So, aggregate to get the totals, then join.
select
p.*,
coalesce(c.cnt, 0) as comments,
coalesce(v.score, 0) as score
from posts p
left join
(
select fk_commented_post as id_post, count(*) as cnt
from comments
where status = 1
group by fk_commented_post
) c on c.id_post = p.id_post
left join
(
select fk_voted_post as id_post, sum(type) as score
from votes_posts
group by fk_voted_post
) v on v.id_post = p.id_post
where p.status = 1
order by p.id_post;
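A minimal sketch of the aggregate-then-join approach, again assuming Python's sqlite3 as a stand-in for MySQL and a reduced schema. Here the votes are summed into a score, as the question asked, and a second post without comments or votes shows what COALESCE is for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id_post INTEGER PRIMARY KEY);
CREATE TABLE comments (fk_commented_post INTEGER, status INTEGER);
CREATE TABLE votes_posts (fk_voted_post INTEGER, type INTEGER);
INSERT INTO posts VALUES (1), (2);
INSERT INTO comments VALUES (1, 1), (1, 1), (1, 0);  -- one comment is hidden
INSERT INTO votes_posts VALUES (1, 1), (1, 1), (1, -1);
""")

rows = conn.execute("""
SELECT p.id_post,
       COALESCE(c.cnt, 0)   AS comments,
       COALESCE(v.score, 0) AS score
FROM posts p
LEFT JOIN (
    -- at most one row per post, so the outer join cannot multiply rows
    SELECT fk_commented_post AS id_post, COUNT(*) AS cnt
    FROM comments
    WHERE status = 1
    GROUP BY fk_commented_post
) c ON c.id_post = p.id_post
LEFT JOIN (
    SELECT fk_voted_post AS id_post, SUM(type) AS score
    FROM votes_posts
    GROUP BY fk_voted_post
) v ON v.id_post = p.id_post
ORDER BY p.id_post
""").fetchall()

print(rows)
```

Because each derived table is already grouped by post, the outer query needs no GROUP BY of its own.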
I am implementing a voting system with MySQL and Node.js. It's working well so far, but I have one question. I have a table articles with two relations, upvotes and downvotes.
If I fetch all my articles, I'd like to have the counts of upvotes and downvotes. The downvote count is working with
SELECT articles.*, count(downvotes.articles_id) AS downvotes
FROM articles
LEFT JOIN downvotes
ON (articles.id = downvotes.articles_id)
WHERE articles.communities_id = '52'
GROUP BY articles.id
ORDER BY created_at DESC
How can I add the upvotes in the query too?
Thanks!
matz
Same way you added the downvotes. Also, get into the habit of formatting your SQL; it makes it far easier to read and debug.
SELECT
articles.*,
COUNT(downvotes.articles_id) AS downvotes,
COUNT(upvotes.articles_id) AS upvotes
FROM
articles
LEFT JOIN
downvotes ON (articles.id = downvotes.articles_id)
LEFT JOIN
upvotes ON (articles.id = upvotes.articles_id)
WHERE
articles.communities_id = '52'
GROUP BY articles.id
ORDER BY created_at DESC
Add another left join for upvotes
SELECT articles.*, count(downvotes.articles_id) as downvotes
from articles
left join downvotes on (articles.id = downvotes.articles_id)
left join upvotes on (articles.id = upvotes.articles_id)
where articles.communities_id = '52'
group by articles.id
ORDER BY created_at DESC
Okay, a friend posted the correct answer:
SELECT a.id,
(SELECT COUNT(*) FROM downvotes d WHERE a.id=d.articles_id) AS `downs`,
(SELECT COUNT(*) FROM upvotes u WHERE a.id=u.articles_id) AS `ups`
FROM articles a
ORDER BY a.id ASC
He told me the problem is the GROUP BY at the end combined with the two left joins: the rows from the first left join get multiplied by the rows from the second, so both counts come out wrong.
SELECT p.*, u.user_id, u.user_name,
COUNT(c.comment_id) AS count,
COUNT(v.vote_post_id) /
(TIMESTAMPDIFF(MINUTE, v.vote_timestamp, SYSDATE()) + 1) AS rate
FROM posts AS p
LEFT JOIN comments AS c ON (p.post_id = c.comment_post_id)
LEFT JOIN post_votes AS v ON (p.post_id = v.vote_post_id)
LEFT JOIN users AS u ON (p.postedby_id = u.user_id)
GROUP BY p.post_id
ORDER BY COUNT(v.vote_post_id) /
(TIMESTAMPDIFF(MINUTE, v.vote_timestamp, SYSDATE()) + 1) DESC
This is the script I'm working on. My database doesn't have much test data yet, but the comment counts for the first two results get doubled. Can you see any obvious mistakes here? I have another version of the script that works fine:
SELECT p.*, u.user_id, u.user_name,
COUNT(c.comment_id) AS count
FROM posts AS p
LEFT JOIN comments AS c ON (p.post_id = c.comment_post_id)
LEFT JOIN users AS u ON (p.postedby_id = u.user_id)
GROUP BY p.post_id
ORDER BY COUNT(c.comment_post_id) /
(TIMESTAMPDIFF(MINUTE, p.post_timestamp, SYSDATE()) + 1) DESC
Multiple votes will cause your comments to duplicate. You want a sub-select on the post_votes table that groups by vote_post_id, so the total votes per post come back as a single value.
Since COUNT is the name of an aggregate function, I don't recommend using it as a column alias in your result set.
If you're just getting the comment count and not the comments themselves, then you'll want that in a sub-select too, or you'll be doubling up on posts.
SELECT p.*, u.user_id, u.user_name, c.comment_count,
v.vote_count AS total_votes,
v.vote_count / (TIMESTAMPDIFF(MINUTE, p.post_timestamp, SYSDATE()) + 1) as votes_per_minute
FROM posts AS p
LEFT JOIN (
SELECT comment_post_id, COUNT(comment_post_id) AS comment_count
FROM comments
GROUP BY comment_post_id
) AS c ON (p.post_id = c.comment_post_id)
LEFT JOIN (
SELECT vote_post_id, COUNT(vote_post_id) AS vote_count
FROM post_votes
GROUP BY vote_post_id
) AS v ON (p.post_id = v.vote_post_id)
LEFT JOIN users AS u ON (p.postedby_id = u.user_id)
GROUP BY p.post_id
ORDER BY v.vote_count / (TIMESTAMPDIFF(MINUTE, p.post_timestamp, SYSDATE()) + 1) DESC
Clearly you are using MySQL, because it is one of the few databases that allow this type of GROUP BY. It is a bad choice to rely on it, however. You should group by all the fields in the SELECT that are not part of the aggregate functions, the way most other databases require. This may clear up your problem, or it may not, depending on the data. If you have multiple records in some of the joined tables and they have different data in some of the fields, you still may not get one record per post when you group properly. In that case, you need to write a derived table for the join, or use an aggregate on the field that has more than one value to tell the database which value to use.
I have many tables that log users' actions on a forum; each log event has its date.
I need a query that gives me all the users who were not active during the last year.
I have the following (working) query:
SELECT *
FROM (questions AS q
INNER JOIN Answers AS a
INNER JOIN bestAnswerByPoll AS p
INNER JOIN answerThumbRank AS t
INNER JOIN notes AS n
INNER JOIN interestingQuestion AS i ON q.user_id = a.user_id
AND a.user_id = p.user_id
AND p.user_id = t.user_id
AND t.user_id = n.user_id
AND n.user_id = i.user_id)
WHERE DATEDIFF(CURDATE(),q.date)>365
AND DATEDIFF(CURDATE(),a.date)>365
AND DATEDIFF(CURDATE(),p.date)>365
AND DATEDIFF(CURDATE(),t.date)>365
AND DATEDIFF(CURDATE(),n.date)>365
AND DATEDIFF(CURDATE(),i.date)>365
What I'm doing in that query is joining all the tables on the user ID and then checking each date column individually to see if it's been more than a year.
I was wondering if there is a way to make it simpler, something like finding the max of all the dates (the latest date) and comparing just that one to the current date.
If you want the best performance, you should not use greatest(). Instead, do something like this:
SELECT *
FROM questions q
JOIN Answers a ON q.user_id = a.user_id
JOIN bestAnswerByPoll p ON a.user_id = p.user_id
JOIN answerThumbRank t ON p.user_id = t.user_id
JOIN notes n ON t.user_id = n.user_id
JOIN interestingQuestion i ON n.user_id = i.user_id
WHERE q.date < curdate() - interval 1 year
AND a.date < curdate() - interval 1 year
AND p.date < curdate() - interval 1 year
AND t.date < curdate() - interval 1 year
AND n.date < curdate() - interval 1 year
AND i.date < curdate() - interval 1 year
You want to avoid datediff() so that MySQL can do index lookups on the date column comparisons. To make sure the index lookup works, you should create a compound (multi-column) index on (user_id, date) for each of your tables.
In this compound index, the first part (user_id) will be used for faster joins, and the second part (date) will be used for faster date comparisons. If you replace * in your SELECT * with only the columns mentioned above (like user_id only), you might be able to get index-only scans, which will be very fast.
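The effect of wrapping the column in a function can be observed in a query plan. This is only a rough illustration under several assumptions: it uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, a simplified single-column index on date instead of the compound (user_id, date) index, a hypothetical body column, and a literal cutoff date. MySQL's optimizer differs in detail, but the same principle applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (user_id INTEGER, date TEXT, body TEXT);
CREATE INDEX idx_notes_date ON notes (date);
""")

def plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# The column is buried inside a function call, so the index on date
# cannot be used to narrow down the rows.
wrapped = plan(
    "SELECT body FROM notes "
    "WHERE julianday('2024-06-01') - julianday(date) > 365"
)

# The bare column is compared with a value, so a range search on the
# index is possible.
bare = plan("SELECT body FROM notes WHERE date < '2023-06-01'")

print(wrapped)
print(bare)
```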
UPDATE: MySQL before version 8.0 does not support the WITH clause for common table expressions, unlike PostgreSQL and some other databases. But you can still factor out the common expression as follows:
SELECT *
FROM questions q
JOIN Answers a ON q.user_id = a.user_id
JOIN bestAnswerByPoll p ON a.user_id = p.user_id
JOIN answerThumbRank t ON p.user_id = t.user_id
JOIN notes n ON t.user_id = n.user_id
JOIN interestingQuestion i ON n.user_id = i.user_id,
(SELECT curdate() - interval 1 year AS year_ago) x
WHERE q.date < x.year_ago
AND a.date < x.year_ago
AND p.date < x.year_ago
AND t.date < x.year_ago
AND n.date < x.year_ago
AND i.date < x.year_ago
In MySQL, you can use the greatest() function:
WHERE DATEDIFF(CURDATE(), greatest(q.date, a.date, p.date, t.date, n.date, i.date)) > 365
This will help with readability. It would not affect performance.
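A quick way to convince yourself that checking only the latest date is equivalent to checking every date is sketched below. It assumes Python's sqlite3, where the multi-argument max() plays the role of MySQL's greatest(), and a hypothetical flattened last_seen table with one date column per activity type instead of the six joined tables.

```python
import sqlite3

CUTOFF = "2023-06-01"  # hypothetical "one year ago" date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE last_seen (user_id INTEGER, q_date TEXT, a_date TEXT, n_date TEXT);
INSERT INTO last_seen VALUES
  (1, '2022-01-10', '2022-03-05', '2021-12-01'),  -- everything is old
  (2, '2022-01-10', '2023-09-15', '2021-12-01'),  -- one recent action
  (3, '2023-07-01', '2023-08-01', '2023-09-01');  -- everything is recent
""")

# One comparison per date column.
each = conn.execute("""
SELECT user_id FROM last_seen
WHERE q_date < ? AND a_date < ? AND n_date < ?
ORDER BY user_id
""", (CUTOFF, CUTOFF, CUTOFF)).fetchall()

# A single comparison against the latest of the dates.
latest = conn.execute("""
SELECT user_id FROM last_seen
WHERE max(q_date, a_date, n_date) < ?
ORDER BY user_id
""", (CUTOFF,)).fetchall()

print(each, latest)
```

Both queries return only user 1, the one user with no activity after the cutoff.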
I have this
SELECT COUNT(1) cnt, a.auther_id
FROM `posts` a
LEFT JOIN users u ON a.auther_id = u.id
GROUP BY a.auther_id
ORDER BY cnt DESC
LIMIT 20
It works fine, but now I want to select only posts from within the last day. I tried to use
WHERE from_unixtime(post_time) >= SUBDATE(NOW(),1)
but it didn't work. Does anyone have an idea why?
This may work:
WHERE FROM_UNIXTIME(post_time) >= SUBDATE(NOW(), INTERVAL 1 DAY)