Sorry for the poor question title, I'm not sure of the correct terminology to describe what I'm asking.
I have two tables:
posts:
p_id | p_author | p_text
post_comments:
pc_id | pc_p_id | pc_author | pc_text
The pc_p_id for each comment corresponds to the p_id of the post.
I want to select:
The p_id and p_text for all of the posts from a specific author
The number of comments for the corresponding post
I can do the first part with a query like this (supposing "1" is an author):
SELECT p_id, p_text FROM posts WHERE p_author = 1
And I can get the number of comments for a specific post like this (supposing "12" is a post id):
SELECT COUNT(*) FROM post_comments WHERE pc_p_id = 12
My question is, how can I combine the two queries so that I get the p_id and p_text for all of the posts from a specific author, along with the number of comments for the corresponding posts?
I tried using a LEFT JOIN like the following, but it gives me a syntax error:
SELECT t1.p_id, t1.p_text, t2.COUNT(*)
FROM posts t1 LEFT JOIN post_comments t2
WHERE t1.p_author = 1 AND t2.pc_p_id = t1.p_id
ORDER BY t1.p_id DESC
Here is the SQL Fiddle:
http://sqlfiddle.com/#!2/e9b975
SELECT p_id, p_text, count(1)
FROM posts p
JOIN post_comments pc
ON p.p_id = pc.pc_p_id
WHERE p_author = 1
GROUP BY p_id, p_text
Here is the edited version that does an embedded select:
SELECT p_id, p_text,
(SELECT COUNT(1) FROM post_comments WHERE pc_p_id = p.p_id) AS count
FROM posts p
WHERE p_author = 1
I verified this in your fiddle
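One difference between the two queries above is worth checking: the plain JOIN drops posts that have no comments, while the embedded select returns them with a count of 0. Here is a minimal sketch of that, assuming an in-memory SQLite database as a stand-in for MySQL and a few made-up rows (the alias comment_count is mine, not from the answer):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (p_id INTEGER PRIMARY KEY, p_author INTEGER, p_text TEXT);
    CREATE TABLE post_comments (pc_id INTEGER PRIMARY KEY, pc_p_id INTEGER,
                                pc_author INTEGER, pc_text TEXT);
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'other author');
    INSERT INTO post_comments VALUES (1, 1, 2, 'nice'), (2, 1, 3, 'agreed'), (3, 3, 1, 'hm');
""")

# Inner join: only posts with at least one comment survive.
join_rows = conn.execute("""
    SELECT p_id, p_text, COUNT(1)
    FROM posts p
    JOIN post_comments pc ON p.p_id = pc.pc_p_id
    WHERE p_author = 1
    GROUP BY p_id, p_text
    ORDER BY p_id
""").fetchall()

# Correlated subquery: every post of the author is returned.
subquery_rows = conn.execute("""
    SELECT p_id, p_text,
           (SELECT COUNT(1) FROM post_comments WHERE pc_p_id = p.p_id) AS comment_count
    FROM posts p
    WHERE p_author = 1
    ORDER BY p_id
""").fetchall()

print(join_rows)      # post 2 is missing: it has no comments to join to
print(subquery_rows)  # post 2 is present with a count of 0
```

Which behaviour you want depends on whether posts without comments should show up in the list.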
LEFT JOIN is used when you want all rows from the left-hand table, even though there may not be corresponding rows in the right-hand table. Otherwise you want a straight (inner) join. Also, when using an aggregate function like COUNT() you have to use a GROUP BY clause if you have any non-aggregate columns selected.
SELECT p.p_id, p.p_text, count( pc.pc_id ) AS Count
FROM posts p, post_comments pc
WHERE pc.pc_p_id = p.p_id
AND p.p_author = 1
AND pc.pc_p_id = 12
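To make the LEFT JOIN plus GROUP BY advice concrete, here is a rough sketch for all posts of one author. It assumes SQLite in place of MySQL and invented rows; the aliases by_column and by_star are only there to show why counting pc.pc_id matters: COUNT(*) counts the NULL-extended row of a post without comments as 1.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (p_id INTEGER PRIMARY KEY, p_author INTEGER, p_text TEXT);
    CREATE TABLE post_comments (pc_id INTEGER PRIMARY KEY, pc_p_id INTEGER,
                                pc_author INTEGER, pc_text TEXT);
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'other author');
    INSERT INTO post_comments VALUES (1, 1, 2, 'nice'), (2, 1, 3, 'agreed'), (3, 3, 1, 'hm');
""")

rows = conn.execute("""
    SELECT p.p_id, p.p_text,
           COUNT(pc.pc_id) AS by_column,  -- ignores NULLs from unmatched posts
           COUNT(*)        AS by_star     -- counts the NULL-extended row too
    FROM posts p
    LEFT JOIN post_comments pc ON pc.pc_p_id = p.p_id
    WHERE p.p_author = 1
    GROUP BY p.p_id, p.p_text
    ORDER BY p.p_id DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that the author filter stays in WHERE because it is on the left-hand table; the join condition goes in ON.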
Related
Currently I have the following database:
Table 1: Customer_Stores
unique_id
page_address
date_added
guide_summary
user_name
cover_photo
guide_title
Table 2: Customer_Stories_Likes
story_id
likex
The 'like' column in the second table contains a 1 or a 0 to indicate whether or not a user has liked a post.
What I'd like to do is join these two tables together with 'post_id' and count all of the 'likes' for all the posts based on post_id and order these by how many likes each post got. Is this possible with a single statement, or is it better to use a Count(*) to first determine how many likes each post has?
Yes, it's possible, but you don't need an inner join, because you don't actually need the posts table to do it.
SELECT post_id, count(`like`) AS post_likes
FROM likes
WHERE `like` = 1
GROUP BY post_id
ORDER BY post_likes DESC
If you need other information from the posts table as well, you could join it to a subquery that gets the like counts.
SELECT posts.*, like_count
FROM
posts LEFT JOIN
(SELECT post_id, count(`like`) AS like_count
FROM likes
WHERE `like` = 1
GROUP BY post_id) AS post_likes
ON posts.post_id = post_likes.post_id
ORDER BY like_count DESC
I used LEFT JOIN rather than INNER JOIN; you can use INNER JOIN if you don't want to include posts with no likes.
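With the LEFT JOIN, posts without likes come back with NULL as their count. A small sketch of turning that into 0 with COALESCE, assuming SQLite as a stand-in for MySQL, the simplified table names posts and likes from this answer, and made-up rows (the alias total_likes is my own):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; tables and rows are simplified guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (post_id INTEGER, `like` INTEGER);  -- `like` must be quoted
    INSERT INTO posts VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO likes VALUES (1, 1), (1, 1), (1, 0), (2, 1);
""")

rows = conn.execute("""
    SELECT posts.post_id, COALESCE(post_likes.like_count, 0) AS total_likes
    FROM posts
    LEFT JOIN (
        SELECT post_id, COUNT(`like`) AS like_count
        FROM likes
        WHERE `like` = 1
        GROUP BY post_id
    ) AS post_likes ON posts.post_id = post_likes.post_id
    ORDER BY total_likes DESC, posts.post_id
""").fetchall()

print(rows)  # post 3 has no rows in likes and is reported with 0
```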
There are two tables: posts (~5,000,000 rows) and relations (~8,000 rows).
posts columns:
-------------------------------------------------
| id | source_id | content | date (int) |
-------------------------------------------------
relations columns:
---------------------------
| source_id | user_id |
---------------------------
I wrote a MySQL query for getting 10 most recent rows from posts which are related to a specific user:
SELECT p.id, p.content
FROM posts AS p
LEFT JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
However, it takes ~30 seconds to execute it.
I already have indexes on relations for (source_id, user_id) and (user_id), and on posts for (source_id), (date) and (date, source_id).
EXPLAIN results:
How can I optimize the query?
Your WHERE clause renders your outer join a mere inner join (because in an outer-joined pseudo record user_id will always be null, never 1).
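You can see this effect on a toy data set. The sketch below assumes SQLite instead of MySQL and three invented posts, one of which has no row in relations. Filtering on r.user_id in WHERE gives the same rows as an inner join, while the same condition in ON keeps every post:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER,
                        content TEXT, `date` INTEGER);
    CREATE TABLE relations (source_id INTEGER, user_id INTEGER);
    INSERT INTO posts VALUES (1, 10, 'a', 100), (2, 20, 'b', 200), (3, 30, 'c', 300);
    INSERT INTO relations VALUES (10, 1), (20, 2);
""")

# Outer join, but the WHERE discards every NULL-extended row again.
left_where = conn.execute("""
    SELECT p.id, p.content
    FROM posts AS p
    LEFT JOIN relations AS r ON r.source_id = p.source_id
    WHERE r.user_id = 1
    ORDER BY p.id
""").fetchall()

inner = conn.execute("""
    SELECT p.id, p.content
    FROM posts AS p
    JOIN relations AS r ON r.source_id = p.source_id
    WHERE r.user_id = 1
    ORDER BY p.id
""").fetchall()

# Condition in ON: a real outer join, unmatched posts get user_id NULL.
left_on = conn.execute("""
    SELECT p.id, r.user_id
    FROM posts AS p
    LEFT JOIN relations AS r ON r.source_id = p.source_id AND r.user_id = 1
    ORDER BY p.id
""").fetchall()

print(left_where, inner, left_on)
```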
If you really want this to be an outer join, then the join is completely superfluous, because every record in posts either has or does not have a match in relations. Your query would then be
select id, content
from posts
order by `date` desc limit 10;
If you don't really want an outer join, but want only posts that have a match in relations, then we are talking about existence in a table, hence an EXISTS or IN clause:
select id, content
from posts
where source_id in
(
select source_id
from relations
where user_id = 1
)
order by `date` desc
limit 10;
There should be an index on relations(user_id, source_id) - in this order, so we can select user_id 1 first and get an array of all desired source_id which we then look up.
Of course you also need an index on posts(source_id) which you probably have already, as source_id is an ID. You can even speed things up with a composite index posts(source_id, date, id, content), so the table itself doesn't have to be read anymore - all the information needed is in the index already.
UPDATE: Here is the related EXISTS query:
select id, content
from posts p
where exists
(
select *
from relations r
where r.user_id = 1
and r.source_id = p.source_id
)
order by `date` desc
limit 10;
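As a quick sanity check that the IN and EXISTS forms ask for the same rows, here is a small sketch. It assumes SQLite rather than MySQL, invented rows, a hypothetical index name relations_user_source, and LIMIT 2 instead of 10 so the cut-off is visible on five rows. It says nothing about timing on 5,000,000 rows; for that you would compare the EXPLAIN output on the real tables.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows and index name are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER,
                        content TEXT, `date` INTEGER);
    CREATE TABLE relations (source_id INTEGER, user_id INTEGER);
    -- user_id first, so all source_ids of one user sit next to each other
    CREATE INDEX relations_user_source ON relations (user_id, source_id);
    INSERT INTO posts VALUES (1, 10, 'a', 100), (2, 20, 'b', 200), (3, 10, 'c', 300),
                             (4, 30, 'd', 400), (5, 10, 'e', 500);
    INSERT INTO relations VALUES (10, 1), (30, 1), (20, 2);
""")

in_rows = conn.execute("""
    SELECT id, content
    FROM posts
    WHERE source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
    ORDER BY `date` DESC
    LIMIT 2
""").fetchall()

exists_rows = conn.execute("""
    SELECT id, content
    FROM posts p
    WHERE EXISTS (SELECT * FROM relations r
                  WHERE r.user_id = 1 AND r.source_id = p.source_id)
    ORDER BY `date` DESC
    LIMIT 2
""").fetchall()

print(in_rows)
print(exists_rows)
```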
You could put an index on the date column of the posts table; I believe that will help the ORDER BY speed.
You could also try reducing the number of results before ordering with some additional WHERE conditions. For example, if you know that there will likely be ten records with the correct user_id today, you could limit the date to just today (or N days back, depending on your actual data).
Try This
SELECT p.id, p.content FROM posts AS p
WHERE p.source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
ORDER BY p.date DESC
LIMIT 10
I'd consider the following :-
Firstly, you only want the 10 most recent rows from posts which are related to a user. So, an INNER JOIN should do just fine.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
The LEFT JOIN is only needed if you want to fetch the posts which do not have a relations mapping. Doing the LEFT JOIN can result in a full table scan of the left table which, as per your info, contains ~5,000,000 rows. This could be the root cause of your slow query.
For further optimisation, consider moving the WHERE clause into the ON clause.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON (r.source_id = p.source_id AND r.user_id = 1)
ORDER BY p.date DESC
LIMIT 10
I would try with a composite index on relations :
INDEX source_user (user_id,source_id)
and change the query to this :
SELECT p.id, p.content
FROM posts AS p
INNER JOIN relations AS r
ON ( r.user_id = 1 AND r.source_id = p.source_id )
ORDER BY p.date DESC
LIMIT 10
I have two tables:
items:
| item_id | title |
comments:
| comment_id | item_id | posted_at | author_id | text |
Where posted_at is the time a comment was posted.
How can I get a list of all items, with the time each of them was last commented on and the author_id of that last comment?
For this, you don't necessarily need the 'items' table if all you want are 'item_id'.
Start by writing a query that gets the latest comment time for each item_id like this:
SELECT item_id, MAX(posted_at) AS latestComment
FROM comments
GROUP BY item_id;
Now, you can join that with your comments table on the condition that the item_id and latestComment columns match to get the latest comment author for each item:
SELECT c.item_id, c.author_id, c.posted_at
FROM comments c
JOIN(
SELECT item_id, MAX(posted_at) AS latestComment
FROM comments
GROUP BY item_id) temp ON temp.item_id = c.item_id AND temp.latestComment = c.posted_at;
If you do need any information from the items table, you can just join the above query to the items table using the item_id column to get what you need.
EDIT
If you want to add requirements for items, you can join the items table to the above query and put the requirements in either the WHERE clause or the ON clause of your join, like this:
SELECT c.item_id, c.author_id, c.posted_at
FROM comments c
JOIN items i ON i.item_id = c.item_id AND i.title LIKE '%Apple%'
JOIN(
SELECT item_id, MAX(posted_at) AS latestComment
FROM comments
GROUP BY item_id) temp ON temp.item_id = c.item_id AND temp.latestComment = c.posted_at;
I just made up an example requirement. This query should pull the latest comment for all items that have a title containing the word 'Apple'. Note that this is an inner join, so you will only see items that do have comments. If you want to see all items, I recommend an outer join.
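For the "all items" case, one possible shape of the outer join version is sketched below. It assumes SQLite in place of MySQL, ISO date strings for posted_at, and made-up rows. The query starts from items and LEFT JOINs both the per-item maximum and the matching comment, so items without comments come back with NULLs:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, item_id INTEGER,
                           posted_at TEXT, author_id INTEGER, text TEXT);
    INSERT INTO items VALUES (1, 'Apple pie'), (2, 'Banana bread'), (3, 'Apple tart');
    INSERT INTO comments VALUES (1, 1, '2024-01-01', 7, 'x'),
                                (2, 1, '2024-01-05', 8, 'y'),
                                (3, 2, '2024-01-03', 9, 'z');
""")

rows = conn.execute("""
    SELECT i.item_id, i.title, c.author_id, c.posted_at
    FROM items i
    LEFT JOIN (
        SELECT item_id, MAX(posted_at) AS latestComment
        FROM comments
        GROUP BY item_id
    ) temp ON temp.item_id = i.item_id
    LEFT JOIN comments c
           ON c.item_id = temp.item_id AND c.posted_at = temp.latestComment
    ORDER BY i.item_id
""").fetchall()

for row in rows:
    print(row)  # item 3 has no comments, so author_id and posted_at are None
```

One caveat: if two comments on the same item share the exact same posted_at, that item appears twice.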
You need the most recent comment for each item. There are three parts to that.
First: most recent
SELECT MAX(comment_id) FROM comments GROUP BY item_id
Second: most recent comment
SELECT comments.author_id, comments.posted_at
FROM comments
WHERE comments.comment_id IN
(SELECT MAX(comment_id) FROM comments GROUP BY item_id)
Third: most recent comment for each item
SELECT items.item_id, items.title, comments.author_id, comments.posted_at
FROM items
LEFT JOIN comments
ON items.item_id = comments.item_id
AND comments.comment_id IN
(SELECT MAX(comment_id) FROM comments GROUP BY item_id)
The trick here is to find the single most recent comment for each item, and then use it. The left join operation preserves those items that have no comments. This query uses comment_id as a proxy to search for the latest posted_at. It assumes comment_id is an autoincrement column, and that later comments have higher comment_id values than earlier comments.
A compound index on the comments table on (item_id, comment_id) will help performance here, by accelerating the GROUP BY subquery.
You can try using max(posted_at) with group by item_id and join it with your comments table on these 2 columns.
What about:
Select title,author_id,MAX(posted_at) as LastTime From items i
join comments c
on c.item_id = i.item_id
group by title,author_id
This should give you what you are looking for if the same post exists multiple times. If not, you could even remove the MAX and add other columns.
Ok... well I have to put the subquery in a JOIN clause since it selects more than one column and putting it in the SELECT clause does not allow that as it gives me an error of an operand.
Anywho, this is my query:
SELECT
c.id,
c.title,
c.description,
c.icon,
p.id as topic_id,
p.title AS topic_title,
p.date,
p.username
FROM forum_cat c
LEFT JOIN (
SELECT
ft.id,
ft.cat_id,
ft.title,
fp.date,
u.username
FROM forum_topic ft
JOIN forum_post fp ON fp.topic_id = ft.id
JOIN user u ON u.user_id = fp.author_id
WHERE ft.cat_id = c.id
ORDER BY fp.date DESC
LIMIT 1
) p ON p.cat_id = c.id
WHERE c.main_cat = ?
ORDER BY c.list_no
Now the important thing I need here... FOR EACH category, I want to show the latest post and topic title in each category.
However, this SELECT statement runs INSIDE a foreach loop over the general categories, which are identified by main_cat.
So there are 5 main categories with 3-8 subcategories each; this is the subcategory query. FOR EACH subcategory, I need to grab the latest post. However, the SELECT only runs once per main category, so it only selects THE LATEST post across all subcategories combined. I want to get the latest post of EACH subcategory, but I'd rather not run this query once per subcategory, since I want the page load to be fast.
BUT REMEMBER, some subcategories WILL NOT have a latest post since some of them may not even contain a topic yet! So hence the left join.
Does anyone know how to go about doing this?
AND BTW, there is an error it gives me (WHERE ft.cat_id = c.id) in the subquery because c.id is an unknown column. But I'm trying to reference it from the outer query so can someone help me on that issue as well?
Thank you!
All tables:
forum_cat (Subcategories)
-----------------------------------------------
ID, Title, Description, Icon, Main_cat, List_no
forum_topic (Topics in each subcategory)
--------------------------------------------
ID, Author_id, Cat_id, Title, Sticky, Locked
forum_post (Posts in each topic)
--------------------------------------------
ID, Topic_id, Author_id, Body, Date, Hidden
The main categories are listed in a function. I didn't store them in the database since it was a waste of space since they never change. There are 7 main categories though.
It's hard to tell without seeing DDL of your tables, relevant sample data and desired output.
I could've got your requirements wrong, but try this:
SELECT *
FROM forum_cat c LEFT JOIN
(SELECT t.cat_id,
p.topic_id,
t.title,
p.id,
p.body,
MAX(p.`date`) AS `date`,
p.author_id,
u.username
FROM forum_post p INNER JOIN
forum_topic t ON t.id = p.topic_id INNER JOIN
`user` u ON u.user_id = p.author_id
GROUP BY t.cat_id) d ON d.cat_id = c.id
WHERE c.main_cat = 1
ORDER BY c.list_no
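Two things to keep in mind here. A derived table in the FROM clause generally cannot see columns of the outer query, which would explain the "unknown column c.id" error. And with GROUP BY t.cat_id, the non-aggregated columns (title, username, ...) are not guaranteed to come from the row that holds MAX(date). Below is a rough sketch of one alternative: compute the latest date per subcategory first, then join back to the post with that date. It assumes SQLite in place of MySQL, integer timestamps in date, a cut-down version of the schema and invented rows; the names latest, lp and last_date are mine.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema is trimmed and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_cat (id INTEGER PRIMARY KEY, title TEXT,
                            main_cat INTEGER, list_no INTEGER);
    CREATE TABLE forum_topic (id INTEGER PRIMARY KEY, cat_id INTEGER, title TEXT);
    CREATE TABLE forum_post (id INTEGER PRIMARY KEY, topic_id INTEGER,
                             author_id INTEGER, `date` INTEGER);
    CREATE TABLE `user` (user_id INTEGER PRIMARY KEY, username TEXT);
    INSERT INTO forum_cat VALUES (1, 'General', 1, 1), (2, 'Help', 1, 2),
                                 (3, 'Empty', 1, 3), (4, 'Other main', 2, 1);
    INSERT INTO forum_topic VALUES (1, 1, 'Welcome'), (2, 1, 'Rules'), (3, 2, 'Install');
    INSERT INTO forum_post VALUES (1, 1, 1, 100), (2, 2, 2, 300),
                                  (3, 1, 2, 200), (4, 3, 1, 150);
    INSERT INTO `user` VALUES (1, 'ann'), (2, 'bob');
""")

rows = conn.execute("""
    SELECT c.id, c.title, lp.topic_title, lp.`date`, lp.username
    FROM forum_cat c
    LEFT JOIN (
        SELECT ft.cat_id, ft.title AS topic_title, fp.`date`, u.username
        FROM forum_topic ft
        JOIN forum_post fp ON fp.topic_id = ft.id
        JOIN `user` u ON u.user_id = fp.author_id
        JOIN (
            -- latest post date per subcategory
            SELECT ft2.cat_id, MAX(fp2.`date`) AS last_date
            FROM forum_topic ft2
            JOIN forum_post fp2 ON fp2.topic_id = ft2.id
            GROUP BY ft2.cat_id
        ) latest ON latest.cat_id = ft.cat_id AND latest.last_date = fp.`date`
    ) lp ON lp.cat_id = c.id
    WHERE c.main_cat = ?
    ORDER BY c.list_no
""", (1,)).fetchall()

for row in rows:
    print(row)  # 'Empty' has no topics, so its post columns are None
```

If two posts in one subcategory share the same latest date, that subcategory shows up twice, so a tie-breaker (for example the highest post id) may be needed on real data.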
I have an article table which holds the number of article views for each day. A new record is created to hold the count for each separate day for each article.
The query below gets the article id and total views for the top 5 viewed article id for all time :
SELECT article_id,
SUM(article_count) as cnt
FROM article_views
GROUP BY article_id
ORDER BY cnt DESC
LIMIT 5
I also have a separate article table which holds all the article fields. I want to amend the query above to join to the article table and get two fields for each article id. I have tried to do this below but the count is coming back incorrectly:
SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5
I'm not sure exactly what I'm doing wrong. Do I need to do a subquery?
Add articles.article_title, articles.artcile_url to the GROUP BY clause:
SELECT
article_views.article_id,
articles.article_title,
articles.artcile_url,
SUM( article_views.article_count ) AS cnt
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id,
articles.article_title,
articles.artcile_url
ORDER BY cnt DESC
LIMIT 5;
The reason you were not getting the correct result set is that when you select columns that are neither included in the GROUP BY nor wrapped in an aggregate function in the SELECT clause, MySQL picks an arbitrary value for them.
You are using a MySQL (mis) feature called Hidden Columns, because article title is not in the group by. However, this may or may not be causing your problem.
If the counts are wrong, then I think you have duplicate article_id in the article table. You can check this by doing:
select article_id, count(*) as cnt
from articles
group by article_id
having cnt > 1
If any appear, then that is your problem. If they all have different titles, then grouping by the title (as suggested by Mahmoud) would fix the problem.
If not, one way to fix it is the following:
SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views INNER JOIN
(select a.* from articles a group by article_id) articles
ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5
This chooses an arbitrary title for the article.
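To illustrate how a duplicate article_id inflates the sum, here is a minimal sketch. It assumes SQLite in place of MySQL and invented rows in which article 1 is stored twice. The de-duplicating subquery uses MIN() to pick one title per id, which is my own variation that also runs when non-aggregated columns are not allowed:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; article 1 is deliberately duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER, article_title TEXT, artcile_url TEXT);
    CREATE TABLE article_views (article_id INTEGER, article_count INTEGER);
    INSERT INTO articles VALUES (1, 'A', '/a'), (1, 'A copy', '/a'), (2, 'B', '/b');
    INSERT INTO article_views VALUES (1, 10), (1, 5), (2, 12);
""")

# Each view row matches both copies of article 1, so its counts are doubled.
naive = conn.execute("""
    SELECT v.article_id, SUM(v.article_count) AS cnt
    FROM article_views v
    INNER JOIN articles a ON a.article_id = v.article_id
    GROUP BY v.article_id
    ORDER BY cnt DESC
""").fetchall()

# Collapse articles to one row per id before joining.
deduped = conn.execute("""
    SELECT v.article_id, SUM(v.article_count) AS cnt, a.article_title
    FROM article_views v
    INNER JOIN (
        SELECT article_id, MIN(article_title) AS article_title
        FROM articles
        GROUP BY article_id
    ) a ON a.article_id = v.article_id
    GROUP BY v.article_id, a.article_title
    ORDER BY cnt DESC
""").fetchall()

print(naive)    # article 1 reports 30 instead of 15
print(deduped)  # article 1 reports 15
```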
Your query looks basically right to me...
But the value returned for cnt is going to be dependent upon the article_id column being UNIQUE in the articles table. (We'd assume that it's the primary key, but absent a schema definition, that's only an assumption.)
Also, we're likely to assume there's a foreign key between the tables, that is, there are no values of article_id in the article_views table which don't match a value of article_id on a row from the articles table.
To check for "orphan" article_id values, run a query like:
SELECT v.article_id
FROM article_views v
LEFT
JOIN articles a
ON a.article_id = v.article_id
WHERE a.article_id IS NULL
To check for "duplicate" article_id values in articles, run a query like:
SELECT a.article_id
FROM articles a
GROUP BY a.article_id
HAVING COUNT(1) > 1
If either of those queries returns rows, that could be an explanation for the behavior you observe.
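The orphan case works in the opposite direction from duplicates: views are silently dropped by the inner join instead of being counted twice. A small sketch, assuming SQLite in place of MySQL, the table name article_views from the question, and invented rows in which article 3 has views but no row in articles (I added DISTINCT so each orphan id is listed once):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; article 3 is deliberately orphaned.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER, article_title TEXT);
    CREATE TABLE article_views (article_id INTEGER, article_count INTEGER);
    INSERT INTO articles VALUES (1, 'A'), (2, 'B');
    INSERT INTO article_views VALUES (1, 10), (3, 7), (3, 2);
""")

orphans = conn.execute("""
    SELECT DISTINCT v.article_id
    FROM article_views v
    LEFT JOIN articles a ON a.article_id = v.article_id
    WHERE a.article_id IS NULL
""").fetchall()

# Total views in the table versus views that survive the inner join.
all_views = conn.execute(
    "SELECT SUM(article_count) FROM article_views").fetchone()[0]
joined_views = conn.execute("""
    SELECT SUM(v.article_count)
    FROM article_views v
    INNER JOIN articles a ON a.article_id = v.article_id
""").fetchone()[0]

print(orphans, all_views, joined_views)
```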