MySQL Nested Select with Multiple Columns

I am currently trying to retrieve the latest posts along with their related posts (x number for each post). I have the following query in hand:
SELECT id, title, content,
(SELECT GROUP_CONCAT(title) FROM posts -- Select title of related posts
WHERE id <> p.id AND id IN (
SELECT p_id FROM tagsmap -- Select related post ids from tagsmap
WHERE t_id IN (
SELECT t_id FROM tagsmap -- Select the tags of the current post
WHERE p_id = p.id)
) ORDER BY id DESC LIMIT 0, 3) as related
FROM posts as p ORDER BY id DESC LIMIT 5
My database structure is simple: A posts table. A tags table. And a tagsmap table where I associate posts with tags.
This query works fine (though I don't know its performance, since I don't have many rows in the tables yet -- maybe an EXPLAIN could help me, but that's not my concern right now).
What I really need is to retrieve the ids of the related posts along with their titles.
So I'd like to do SELECT GROUP_CONCAT(title), GROUP_CONCAT(id), but I know that will result in an error (a subquery in the SELECT list may only return a single column). So what is the best way to retrieve the id along with the title in this case? I do not want to rewrite the whole subquery just to retrieve the id. There should be another way.
EDIT
SELECT p1.id, p1.title, p1.content,
group_concat(DISTINCT p2.id) as 'P IDs',
group_concat(DISTINCT p2.title) as 'P titles'
FROM posts as p1
LEFT JOIN tagsmap as tm1 on tm1.p_id = p1.id
LEFT JOIN tagsmap as tm2 on tm2.t_id = tm1.t_id and tm1.p_id <> tm2.p_id
LEFT JOIN posts as p2 on p2.id = tm2.p_id
GROUP BY p1.id
ORDER BY p1.id desc limit 5;
In the end, this is the query that I used. I removed the WHERE clause because it is unnecessary, and used LEFT JOIN rather than JOIN because otherwise it would ignore posts without tags. Finally, I added DISTINCT to group_concat because it was concatenating duplicate rows (if, for example, a post had multiple tags in common with a related post, that related post would be concatenated more than once).
The query above works perfectly. Thanks for all.

Okay - this will work, and it has the added advantage of eliminating the subqueries (which can slow you down when you get lots of records):
SELECT p1.id, p1.title, p1.content,
group_concat( p2.id) as 'P IDs',
group_concat( p2.title) as 'P titles'
FROM posts as p1
JOIN tagsmap as tm1 on tm1.p_id = p1.id
JOIN tagsmap as tm2 on tm2.t_id = tm1.t_id and tm1.p_id <> tm2.p_id
JOIN posts as p2 on p2.id = tm2.p_id
WHERE p2.id <> p1.id
GROUP BY p1.id
ORDER BY p1.id desc limit 5;
What we're doing here is selecting what you want from the first copy of posts (p1), joining it to tagsmap by post id, doing a self-join on tagsmap by tag id to get all the related tags, and then joining back to another copy of posts (p2) to get the posts that those related tags point to.
Use GROUP BY to discard the dups from all that joining, and you're there.
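To make the join mechanics concrete, here is a minimal, self-contained sketch with made-up sample data (the column types and sample rows are assumptions; table and column names follow the question's schema). For post 1 the self-join matches post 2 via both shared tags and post 3 via one tag, so post 2 shows up twice before grouping; GROUP_CONCAT(DISTINCT ...), as in the asker's edit, collapses those duplicates.
CREATE TABLE posts (id INT PRIMARY KEY, title VARCHAR(50), content TEXT); -- types are assumptions
CREATE TABLE tagsmap (p_id INT, t_id INT);
INSERT INTO posts VALUES (1, 'First', ''), (2, 'Second', ''), (3, 'Third', '');
INSERT INTO tagsmap VALUES (1, 10), (1, 11), (2, 10), (2, 11), (3, 11);
SELECT p1.id,
GROUP_CONCAT(DISTINCT p2.id) AS related_ids,       -- e.g. '2,3' for post 1
GROUP_CONCAT(DISTINCT p2.title) AS related_titles
FROM posts AS p1
JOIN tagsmap AS tm1 ON tm1.p_id = p1.id
JOIN tagsmap AS tm2 ON tm2.t_id = tm1.t_id AND tm2.p_id <> tm1.p_id
JOIN posts AS p2 ON p2.id = tm2.p_id
GROUP BY p1.id;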

like this?
SELECT id, title, content,
(SELECT GROUP_CONCAT(CONCAT(CAST(id AS CHAR(10)), ':', title)) FROM posts -- Select id:title of related posts
WHERE id <> p.id AND id IN (
SELECT p_id FROM tagsmap -- Select related post ids from tagsmap
WHERE t_id IN (
SELECT t_id FROM tagsmap -- Select the tags of the current post
WHERE p_id = p.id)
) ORDER BY id DESC LIMIT 0, 3) as related
FROM posts as p ORDER BY id DESC LIMIT 5
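If the packed id:title pairs will be split in application code, it can help to pick the pair separator explicitly so titles containing commas don't break the parsing. A small hedged variant of the GROUP_CONCAT expression (the '|' separator is an arbitrary choice, and the literal id list only stands in for the related-id subquery above):
SELECT GROUP_CONCAT(CONCAT(id, ':', title) SEPARATOR '|') AS related
FROM posts
WHERE id IN (2, 3); -- placeholder for the related-id subquery shown above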

Related

MySQL - Merging two queries that have different conditions and limits

I'm implementing a Tag System for my website, using PHP + MySQL.
In my database, I have three tables:
Posts: Id, Title, DateTime
Primary Key: Id
Tags: Id, Tag, Slug (example row: 1 | First Tag | first-tag)
Primary Key: Id | Key: Slug
TagsMap: Id, Tag
Primary Key: both
(Id = post's Id in Posts; Tag = Tag's Id in Tags)
Given, for instance, the url www. ... .net/tag/first-tag, I need to show:
the tag's name (in this case: "First Tag");
the last 30 published posts having that tag.
In order to achieve this, I'm using two different queries:
firstly
SELECT Tag FROM Tags WHERE Slug = ? LIMIT 1
then
SELECT p.Title FROM Posts p, Tags t, TagsMap tm
WHERE p.Id = tm.Id
AND p.DateTime <= NOW()
AND t.Id = tm.Tag
AND t.Slug = ?
ORDER BY p.Id DESC
LIMIT 30
But I don't think it's a good solution in terms of performance (please, correct me if I'm wrong).
So, my question is: how (if possible) to merge those two queries into just one?
Thanks in advance for Your suggestions.
The query that you have shown above is not an optimal solution, as it first creates a cartesian product of all the tables and then filters the data based on the conditions. If these tables grow larger in the future, your query will start slowing down (slow queries).
Please use explicit joins instead of this approach, e.g. INNER JOIN, LEFT JOIN, RIGHT JOIN, etc.
Try this SQL:
SELECT t.*, p.* FROM Tags t
INNER JOIN TagsMap tm ON (tm.Tag = t.Id )
INNER JOIN Posts p ON (p.Id = tm.Id AND p.DateTime <= NOW())
WHERE t.Slug = 'first-tag'
ORDER BY p.Id DESC
LIMIT 30
Given that you have structured your tables in a manner where you can utilize foreign keys and match them with their counterparts, you can make use of JOINs in your query.
SELECT
Tags.Tag,
Posts.title
FROM
Tags
LEFT JOIN
TagsMap ON Tags.id = TagsMap.tag
LEFT JOIN
Posts ON TagsMap.id = Posts.id AND
Posts.DateTime <= NOW()
WHERE
Posts.id = TagsMap.id AND
Tags.Slug = ?
ORDER BY
Posts.id DESC
LIMIT 30
The idea is that the query is optimized, but you will need to filter your result set programmatically in the view, in order to display the Tag only once.
If there is at most one "slug" per "post", include slug as a column in Posts.
If there can be any number of "tags" per "post", then have a table
CREATE TABLE Tags (
post_id ... NOT NULL,
tag VARCHAR(..)... NOT NULL,
post_dt DATETIME NOT NULL,
PRIMARY KEY(post_id, tag),
INDEX(tag, post_dt)
) ENGINE=InnoDB
And you may want to use LEFT JOIN Tags and GROUP_CONCAT(tag).
I don't know what you mean by "first" in "first_tag". Maybe you should get rid of "first"?
The last 30 posts for a given tag:
SELECT p.*,
( SELECT GROUP_CONCAT(tag) FROM Tags
WHERE post_id = x.post_id ) AS tags
FROM ( SELECT post_id FROM Tags WHERE tag = ?
ORDER BY post_dt DESC LIMIT 30 ) AS x
JOIN posts AS p ON p.id = x.post_id

Can you use Count(*) in an inner join?

Currently I have the following database:
Table 1: Customer_Stores
unique_id
page_address
date_added
guide_summary
user_name
cover_photo
guide_title
Table 2: Customer_Stories_Likes
story_id
like
The 'like' column in the second table contains a 1 or a 0 to indicate whether or not a user has liked a post.
What I'd like to do is join these two tables together on 'post_id', count all of the 'likes' for each post based on post_id, and order the posts by how many likes each one got. Is this possible with a single statement, or is it better to use a Count(*) first to determine how many likes each post has?
Yes, it's possible, but you don't need an inner join, because you don't actually need the posts table to do it.
SELECT post_id, count(`like`) AS post_likes
FROM likes
WHERE `like` = 1
GROUP BY post_id
ORDER BY post_likes DESC
If you need other information from the posts table as well, you could join it to a subquery that gets the like counts.
SELECT posts.*, like_count
FROM
posts LEFT JOIN
(SELECT post_id, count(`like`) AS like_count
FROM likes
WHERE `like` = 1
GROUP BY post_id) AS post_likes
ON posts.post_id = post_likes.post_id
ORDER BY like_count DESC
I used LEFT JOIN rather than INNER JOIN; you can use INNER JOIN if you don't want to include posts with no likes.
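One caveat with the LEFT JOIN version: posts that have no likes come back with a NULL like_count. If a zero is preferable for sorting or display, COALESCE can supply it; a small variant under the same hypothetical table and column names:
SELECT posts.*, COALESCE(like_count, 0) AS like_count -- NULL becomes 0 for posts with no likes
FROM
posts LEFT JOIN
(SELECT post_id, count(`like`) AS like_count
FROM likes
WHERE `like` = 1
GROUP BY post_id) AS post_likes
ON posts.post_id = post_likes.post_id
ORDER BY like_count DESC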

Nested query performance

I have two queries below. The first one has a nested select. The second one makes use of a group by clause.
select
posts.*,
(select count(*) from comments where comments.post_id = posts.id and comments.is_approved = 1) as comments_count
from
posts
select
posts.*,
count(comments.id) comments_count
from
posts
left join comments on
comments.post_id = posts.id
group by
posts.*
From my understanding, the first query is worse because it has to do a select for each record in posts, whereas the second query does not.
Is this true or false?
As with all performance questions, you should test the performance on your system with your data.
However, I would expect the first to perform better, with the right indexes. The right index for:
select p.*,
(select count(*)
from comments c
where c.post_id = p.id and c.is_approved = 1
) as comments_count
from posts p
is comments(post_id, is_approved).
MySQL implements a group by by doing a file sort. This version saves a file sort on all the data. My guess is that will be faster than the second method.
As a note: group by posts.* is not valid syntax. I assume this was intended for illustration purposes only.
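For reference, that index could be created with something along these lines (the index name is an arbitrary choice):
CREATE INDEX idx_comments_post_approved ON comments (post_id, is_approved);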
This is the standard way I would do it (the use of LEFT JOIN and SUM also lets you know which posts have no comments):
SELECT posts.*
, SUM(IF(comments.id IS NULL, 0, 1)) AS comments_count
FROM posts
LEFT JOIN comments USING (post_id)
GROUP BY posts.post_id
;
But if I were trying for faster, this might be better.
SELECT posts.*, IFNULL(subQ.comments_count, 0) AS comments_count
FROM posts
LEFT JOIN (
SELECT post_id, COUNT(1) AS comments_count
FROM comments
GROUP BY post_id
) As subQ
USING (post_id)
;
After a bit more research I found no time difference between the two queries
Benchmark.bm do |b|
  b.report('nested') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          (select count(c.id) from comments c where c.post_id = p.id) comment_count
        from
          posts p;')
    end
  end
  b.report('joined') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          count(c.id) comment_count
        from
          posts p
        left join comments c on
          c.post_id = p.id
        group by
          p.id;')
    end
  end
end
user system total real
nested 2.120000 0.900000 3.020000 ( 3.349015)
joined 2.110000 0.990000 3.100000 ( 3.402986)
However, I did notice that when running an EXPLAIN for both queries, more indexes can be used in the first query, which makes me think it is the better option if the attributes needed in the select were to change.
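For anyone who wants to reproduce that comparison, prefixing either statement with EXPLAIN shows which indexes MySQL considers; a minimal sketch against the same tables:
EXPLAIN
SELECT p.id,
(SELECT COUNT(c.id) FROM comments c WHERE c.post_id = p.id) AS comment_count
FROM posts p;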

MYSQL subquery SELECT in JOIN clause

Ok... well, I have to put the subquery in a JOIN clause, since it selects more than one column, and putting it in the SELECT clause does not allow that (it gives me an operand error).
Anywho, this is my query:
SELECT
c.id,
c.title,
c.description,
c.icon,
p.id as topic_id,
p.title AS topic_title,
p.date,
p.username
FROM forum_cat c
LEFT JOIN (
SELECT
ft.id,
ft.cat_id,
ft.title,
fp.date,
u.username
FROM forum_topic ft
JOIN forum_post fp ON fp.topic_id = ft.id
JOIN user u ON u.user_id = fp.author_id
WHERE ft.cat_id = c.id
ORDER BY fp.date DESC
LIMIT 1
) p ON p.cat_id = c.id
WHERE c.main_cat = ?
ORDER BY c.list_no
Now the important thing I need here... FOR EACH category, I want to show the latest post and topic title in each category.
However, this select statement is going INSIDE a foreach loop over the general categories, which are found by main_cat.
So there are 5 main categories with 3-8 subcategories each; this is the subcategory query. BUT FOR EACH subcategory, I need to grab the latest post. However, it only runs this SELECT query once per main category, so it only selects THE LATEST post across all the subcategories combined... I want to get the latest post of EACH subcategory, but I'd rather not run this query for each subcategory, since I want the page load to be fast.
BUT REMEMBER, some subcategories WILL NOT have a latest post, since some of them may not even contain a topic yet! Hence the LEFT JOIN.
Does anyone know how to go about doing this?
AND BTW, it gives me an error on (WHERE ft.cat_id = c.id) in the subquery, because c.id is an unknown column. I'm trying to reference it from the outer query, so can someone help me with that issue as well?
Thank you!
All tables:
forum_cat (Subcategories)
-----------------------------------------------
ID, Title, Description, Icon, Main_cat, List_no
forum_topic (Topics in each subcategory)
--------------------------------------------
ID, Author_id, Cat_id, Title, Sticky, Locked
forum_post (Posts in each topic)
--------------------------------------------
ID, Topic_id, Author_id, Body, Date, Hidden
The main categories are listed in a function. I didn't store them in the database since it was a waste of space since they never change. There are 7 main categories though.
It's hard to tell without seeing DDL of your tables, relevant sample data and desired output.
I could've got your requirements wrong, but try this:
SELECT *
FROM forum_cat c LEFT JOIN
(SELECT t.cat_id,
p.topic_id,
t.title,
p.id,
p.body,
MAX(p.`date`) AS `date`,
p.author_id,
u.username
FROM forum_post p INNER JOIN
forum_topic t ON t.id = p.topic_id INNER JOIN
`user` u ON u.user_id = p.author_id
GROUP BY t.cat_id) d ON d.cat_id = c.id
WHERE c.main_cat = 1
ORDER BY c.list_no

Get the latest row from another table in MySQL

Let's say I have two tables, news and comments.
news (
id,
subject,
body,
posted
)
comments (
id,
parent, // points to news.id
message,
name,
posted
)
I would like to create one query that grabs the latest x # of news item along with the name and posted date for the latest comment for each news post.
Speed matters, so selecting ALL the comments in a subquery is not an option.
I just realized the query does not return results if there are no comments attached to a news item; here's the fix, as well as an added column for the total # of comments:
SELECT news.*, comments.name, comments.posted, (SELECT count(id) FROM comments WHERE comments.parent = news.id) AS numComments
FROM news
LEFT JOIN comments
ON news.id = comments.parent
AND comments.id = (SELECT max(id) FROM comments WHERE parent = news.id)
If speed is that important, why not create a recent_comment table that contains the id and parent id of just the most recent comments? Every time a comment is posted on a news post, replace that news id's most recent comment id. Create an index on the news id column of the new table and your joins will be fast. You'd be trading write speed for read speed, but not by a whole lot.
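A minimal sketch of that idea, with hypothetical table and column names (recent_comment, news_id, comment_id), since the answer doesn't spell out the schema:
-- One row per news item, always pointing at its newest comment
CREATE TABLE recent_comment (
news_id INT NOT NULL PRIMARY KEY, -- doubles as the join index
comment_id INT NOT NULL
);
-- Run whenever a comment is posted, to keep only the latest comment id per news item
REPLACE INTO recent_comment (news_id, comment_id) VALUES (?, ?);
-- Reads then become two cheap primary-key joins
SELECT news.*, c.name, c.posted
FROM news
JOIN recent_comment rc ON rc.news_id = news.id
JOIN comments c ON c.id = rc.comment_id;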
Assuming posted is a unique timestamp (otherwise choose a unique autonumber):
select c.id, c.parent,
c.message, c.name,
c.posted -- same as comment_latest.recent
from comments c
join
(
select parent, max(posted) as recent
from comments
group by parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent
Complete (displays news information):
select
n.id as news_id, n.subject, n.body, n.posted as news_posted_date,
c.id as comment_id,
c.message, c.name as commenter_name, c.posted as comment_posted_date
from comments c
join
(
select r.parent, max(r.posted) as recent
from comments r
join
(
select id from news order by id desc limit $last_x_news
) as l
on r.parent = l.id
group by r.parent
) as comment_latest
on c.parent = comment_latest.parent
and c.posted = comment_latest.recent
join news n on c.parent = n.id
NOTE:
The query above does not use a subquery; it is a table-deriving query, which is faster than a subquery. This is the subquery version (slow):
select
id,
subject,
body,
posted as news_posted_date,
(select id from comments where parent = news.id order by posted desc limit 1) as comment_id,
(select message from comments where parent = news.id order by posted desc limit 1) as message,
(select name from comments where parent = news.id order by posted desc limit 1) as name,
(select posted from comments where parent = news.id order by posted desc limit 1) as comment_posted_date
from news
SELECT news.subject, news.body, comments.name, comments.posted
FROM news
INNER JOIN comments ON
(comments.parent = news.id)
WHERE comments.parent = news.id
AND comments.id = (SELECT MAX(id)
FROM comments
WHERE parent = news.id)
ORDER BY news.id
This gets all the news items, along with the related comment with the highest id value, which in theory should be the latest.
My solution is similar to J but I think he added one line that is unnecessary:
SELECT news.*, comments.name, comments.posted
FROM news
INNER JOIN comments ON news.id = comments.parent
WHERE comments.id = (SELECT max(id) FROM comments WHERE parent = news.id)
Not sure of the speed on an extremely large table though.
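One hedged note on speed: that correlated max(id) lookup runs once per news row, so an index on the parent column should help (the index name is arbitrary; with InnoDB the secondary index also carries the primary-key id, so the max can be resolved from the index alone):
-- Helps the correlated "SELECT max(id) FROM comments WHERE parent = news.id"
CREATE INDEX idx_comments_parent ON comments (parent);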
Given the constraints brought to light in the comments of my other answer, I have a new idea that may or may not make sense in practice.
Create a view (or function if it's more appropriate) with the following definition, called recent_comments:
SELECT MAX(id) AS id, parent
FROM comments
GROUP BY parent
If you have a clustered index on the parent column, this is probably a reasonably fast query, but even then it will still be a bottleneck.
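Spelled out as an actual view, that definition would look roughly like this (note the alias on MAX(id), which the join below relies on when it references recent_comments.id):
CREATE VIEW recent_comments AS
SELECT MAX(id) AS id, parent
FROM comments
GROUP BY parent;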
Using this, the query you need to get your answer is something like,
SELECT news.*, comments.*
FROM news
INNER JOIN recent_comments
ON news.id = recent_comments.parent
INNER JOIN comments
ON comments.id = recent_comments.id
Plus considerations for news posts that don't have any comments yet.
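For those comment-less news posts, a hedged variant of the same query with LEFT JOINs keeps them in the result, just with NULL comment columns (the LIMIT is a placeholder for the "latest x" items from the question):
SELECT news.*, comments.name, comments.posted
FROM news
LEFT JOIN recent_comments ON news.id = recent_comments.parent
LEFT JOIN comments ON comments.id = recent_comments.id
ORDER BY news.id DESC
LIMIT 30; -- placeholder for the latest x news items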
I think the solution provided by #Jan is the best, i.e. create the view and inner join it in the SQL statement.
It'll definitely reduce the time to pull the data. I tested it and it works 100%.