Advice for best query performance (MySQL) - mysql

I'm looking for the best MySQL query for this situation:
I'm listing the last 10 posts of a member.
table for posts:
post_id | uid | title | content | date
A member can subscribe to other members' posts, so those posts are shown in the same list (sorted by date, same table).
So it's OK to select the last posts of userid X and userid Y.
But I'd like to allow members to disable the display of some posts (the ones they don't want to see).
My problem is: how can I make that as simple as possible for MySQL? I thought about a second table where I put the IDs of the posts the user doesn't want:
table postdenied
uid | post_id
Then make a select like:
select * from posts as p where not exists (select 1 from postdenied as d where d.post_id = p.post_id and d.uid = p.uid) order by date DESC limit 10
Am I right?
Or is there something better?
Thanks

If I understand correctly, the posts.uid column stores the ID of the poster. And the postdenied.uid stores the ID of the user that doesn't want to see a certain post.
If the above assumptions are correct, then your query is fine, except that you should not join on the uid columns, only on the post_id ones. You also need a parameter or constant for the ID of the user whose list you are building (written as #X in the code below), so that he sees all posts except those he has "denied":
select p.*
from posts as p
where not exists
(select 1
from postdenied as d
where d.post_id = p.post_id
and d.uid = #X -- #X is the userID of the specific user
)
order by date DESC
limit 10 ;
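To check the behaviour of this anti-join on a small data set, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows and the visible_posts helper are made up for illustration. The viewer ID is passed as a bound parameter in place of #X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, uid INTEGER, title TEXT, date TEXT);
CREATE TABLE postdenied (uid INTEGER, post_id INTEGER, PRIMARY KEY (uid, post_id));
INSERT INTO posts VALUES
  (1, 10, 'a', '2024-01-01'),
  (2, 20, 'b', '2024-01-02'),
  (3, 10, 'c', '2024-01-03');
-- viewer 10 hides post 2; viewer 20 hides post 3
INSERT INTO postdenied VALUES (10, 2), (20, 3);
""")

def visible_posts(viewer_id, limit=10):
    """Latest post IDs that viewer_id has not hidden (anti-join via NOT EXISTS)."""
    rows = conn.execute(
        """
        SELECT p.post_id
        FROM posts AS p
        WHERE NOT EXISTS (
            SELECT 1 FROM postdenied AS d
            WHERE d.post_id = p.post_id
              AND d.uid = ?   -- the viewer, not the poster
        )
        ORDER BY p.date DESC
        LIMIT ?
        """,
        (viewer_id, limit),
    ).fetchall()
    return [r[0] for r in rows]

print(visible_posts(10))  # [3, 1]
print(visible_posts(20))  # [2, 1]
```

Each viewer gets a different list from the same posts table, which is the point of filtering on d.uid rather than joining it to p.uid.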

Another approach to implementing this would be with a LEFT JOIN clause.
SELECT * FROM posts AS p
LEFT JOIN postdenied as d ON d.post_id = p.post_id and d.uid = p.uid
WHERE d.uid IS NULL
ORDER BY date DESC
LIMIT 10
It's unclear to me whether this would be more amenable to the query optimizer. If you have a large amount of data, it may be worth testing both queries and seeing if one is more performant than the other.
See http://sqlfiddle.com/#!2/be7e3/1
Appreciation to ypercube and Lamak for their feedback on my original answer

Related

Helpful answer as the second post - sort order asc

A forum has topics, and each topic contains posts. The sort order is from old to new. Each post can be rated as "helpful".
A default SQL selection looks like this:
SELECT * FROM `posts` WHERE `topic_id` = 5033 ORDER BY `post_id` ASC
The "helpful" field in the posts table has the name "post_helpful".
Is it possible to order the posts in this way:
First post - Contains the question
If a post with more than 3 "post_helpful" exists, display this post as the second post. But only the post with the highest score.
The remaining posts in the normal order, without the post shown in second position
I only want the post with the highest score in the second position, and only if it has more than 3 ratings. If there is no post with more than 3 ratings, keep the default order.
Thank you
Yes. The first part is a little tricky. You can use multiple expressions in the ORDER BY:
SELECT p.*
FROM posts p CROSS JOIN
(SELECT MIN(p.post_id) as min_post_id
FROM posts p
WHERE p.topic_id = 5033
) pp
WHERE p.topic_id = 5033
ORDER BY (p.post_id = pp.min_post_id) DESC, -- lowest id first
(case when p.post_helpful > 3 then p.post_helpful else 0 end) DESC, -- helpful next
p.post_id ASC;
EDIT:
To get the posts with the maximum helpful:
SELECT p.*
FROM posts p CROSS JOIN
(SELECT MIN(p.post_id) as min_post_id,
MAX(p.post_helpful) as max_post_helpful
FROM posts p
WHERE p.topic_id = 5033
) pp
WHERE p.topic_id = 5033
ORDER BY (p.post_id = pp.min_post_id) DESC, -- lowest id first
(pp.max_post_helpful > 3 AND p.post_helpful = pp.max_post_helpful) DESC, -- helpful next
p.post_id ASC;
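A minimal sketch to try this ordering on sample data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (both evaluate a comparison in ORDER BY to 1 or 0). The rows and the ordered_ids helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, post_helpful INTEGER);
INSERT INTO posts VALUES
  (1, 5033, 0), (2, 5033, 4), (3, 5033, 7), (4, 5033, 1),
  (5, 9999, 50),
  (6, 7, 2), (7, 7, 3);
""")

QUERY = """
SELECT p.post_id
FROM posts p CROSS JOIN
     (SELECT MIN(post_id) AS min_post_id,
             MAX(post_helpful) AS max_post_helpful
      FROM posts
      WHERE topic_id = :topic) pp
WHERE p.topic_id = :topic
ORDER BY (p.post_id = pp.min_post_id) DESC,      -- question first
         (pp.max_post_helpful > 3
          AND p.post_helpful = pp.max_post_helpful) DESC,  -- best answer second
         p.post_id ASC                           -- everything else in default order
"""

def ordered_ids(topic):
    """Post IDs of one topic in display order."""
    return [r[0] for r in conn.execute(QUERY, {"topic": topic})]

print(ordered_ids(5033))  # [1, 3, 2, 4]
print(ordered_ids(7))     # [6, 7]
```

Topic 7 has no post with more than 3 ratings, so it keeps the default order. Note that if several posts tie for the maximum, all of them move up.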
I believe you want more complex sorting for this query.
Can you assign each post a value or weight based on the count of rows that flagged it as helpful?
That gives you a number to sort by.
I.e. Query 2 counts the number of rows showing that the post was helpful.
Query 1 is the main query, ordered by the value/count from Query 2.

Avoiding the “n+1 selects” problem when I want a subset of related resources

Imagine I'm designing a multi-user blog and I have user, post, and comment tables with the obvious meanings. On the main page, I want to show the ten most recent posts along with all their related comments.
The naive approach would be to SELECT the ten most recent posts, probably JOINed with the users that wrote them. And then I can loop through them to SELECT the comments, again, probably JOINed with the users that wrote them. This would require 11 selects: 1 for the posts and 10 for their comments, hence the name of the famous anti-pattern: n+1 selects.
The usual advice for avoiding this anti-pattern is to use the IDs from the first query to fetch all related comments in a second query which may look something like this:
SELECT
*
FROM
comments
WHERE
post_id IN (/* A comma separated list of post IDs returned from the first query */)
As long as that comma separated list is reasonably short, we manage to fetch all the data we need by issuing only two SELECT queries instead of eleven. Great.
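For reference, the two-query approach could look roughly like this in application code. This is only a sketch: it uses Python's sqlite3 module with an in-memory database in place of MySQL, and the column names (post_id, post_date, body) and sample rows are assumptions, not part of the original schema.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_date TEXT);
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
INSERT INTO posts VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
INSERT INTO comments VALUES (1, 1, 'x'), (2, 3, 'y'), (3, 3, 'z');
""")

# Query 1: the most recent posts (2 here instead of 10 to keep the data small).
post_ids = [r[0] for r in conn.execute(
    "SELECT post_id FROM posts ORDER BY post_date DESC LIMIT 2")]

# Query 2: all comments for those posts, one bound placeholder per ID.
comments_by_post = defaultdict(list)
if post_ids:  # MySQL rejects an empty IN () list, so skip the query entirely
    placeholders = ", ".join("?" * len(post_ids))
    for post_id, body in conn.execute(
        f"SELECT post_id, body FROM comments "
        f"WHERE post_id IN ({placeholders}) ORDER BY comment_id",
        post_ids,
    ):
        comments_by_post[post_id].append(body)

print(post_ids)                # [3, 2]
print(dict(comments_by_post))  # {3: ['y', 'z']}
```

Building only the placeholder string dynamically, and binding the IDs as parameters, keeps the IN list safe from injection.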
But what if I only want the top three comments for each post? I didn't try but I can probably come up with some LEFT JOIN trickery to fetch the most recent posts along with their top three comments in a single query but I'm not sure it would be scalable. What if I want the top hundred comments which would exceed the join limit of 61 tables of a typical MySQL installation for instance?
What is the usual solution for this other than reverting to n+1 selects anti-pattern? What is the most efficient way to fetch items with a subset of items related to each one in this fairly typical scenario?
It is usually a better option to run as few queries as possible, and then implement some application logic on top of it if needed. In your use case, I would build a query that returns both the most recent posts and the most recent associated comments, with proper ordering to make the application processing easier. Then your application can take care of displaying them.
Assuming that you use MySQL (since you mentioned it in your question), let's start with a query that gives you the 10 most recent posts:
SELECT * FROM posts ORDER BY post_date DESC LIMIT 10
Then you can join this with the corresponding comments:
SELECT
p.*,
c.*
FROM
(SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
INNER JOIN comments c ON c.post_id = p.id
Finally, let's set a limit on the number of comments per post. For this, you can use ROW_NUMBER() (available in MySQL 8.0) to rank the comments of each post, and then keep only a given number of comments. This gives you the 10 most recent posts along with each of their 3 most recent comments:
SELECT *
FROM (
SELECT
p.id AS post_id, -- list columns explicitly: p.*, c.* can clash on duplicate names in a derived table
p.post_date,
c.id AS comment_id, -- assumes comments has an id column
c.comment_date,
ROW_NUMBER() OVER(PARTITION BY p.id ORDER BY c.comment_date DESC) rn
FROM
(SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
INNER JOIN comments c ON c.post_id = p.id
) x
WHERE rn <= 3
ORDER BY x.post_date DESC, x.comment_date DESC
Query results are ordered by post, then by comment date. So when your application fetches the results, you get 1 to 3 records per post, in sequence.
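Here is a small runnable sketch of this top-N-per-post pattern. It runs against SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than MySQL 8.0, and the id columns and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, post_date TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, comment_date TEXT);
INSERT INTO posts VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO comments VALUES
  (1, 1, '2024-02-01'), (2, 1, '2024-02-02'), (3, 1, '2024-02-03'),
  (4, 1, '2024-02-04'), (5, 2, '2024-02-05');
""")

rows = conn.execute("""
SELECT post_id, comment_id
FROM (
    SELECT p.id AS post_id, p.post_date,
           c.id AS comment_id, c.comment_date,
           -- rank each post's comments, newest first
           ROW_NUMBER() OVER (PARTITION BY p.id
                              ORDER BY c.comment_date DESC) AS rn
    FROM (SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
    INNER JOIN comments c ON c.post_id = p.id
) x
WHERE rn <= 3
ORDER BY post_date DESC, comment_date DESC
""").fetchall()

print(rows)  # [(2, 5), (1, 4), (1, 3), (1, 2)]
```

Post 1 has four comments but only the three newest come back. Because of the INNER JOIN, a post without any comments would not appear at all; a LEFT JOIN would keep it.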
If you want the last 10 posts
SELECT p.post_id
FROM post p
ORDER BY p.publish_date DESC
LIMIT 10
Now if you want the comment of those posts:
SELECT c.comment_id, u.name
FROM comments c
JOIN users u
on c.user_id = u.user_id
WHERE c.post_id IN ( SELECT t.post_id
FROM ( SELECT p.post_id -- extra derived table: MySQL rejects LIMIT directly inside IN (...)
FROM post p
ORDER BY p.publish_date DESC
LIMIT 10 ) t )
For the last 3 comments of each post, the RDBMS version matters, because it determines whether you can use row_number():
SELECT *
FROM (
SELECT c.comment_id, u.name,
row_number() over (partition by c.post_id order by c.comment_date DESC) as rn
FROM comments c
JOIN users u
on c.user_id = u.user_id
WHERE c.post_id IN ( SELECT t.post_id
FROM ( SELECT p.post_id -- extra derived table: MySQL rejects LIMIT directly inside IN (...)
FROM post p
ORDER BY p.publish_date DESC
LIMIT 10 ) t )
) x
WHERE x.rn <= 3
You can do this in one query:
select . . . -- whatever columns you want here
from (select p.*
from posts p
order by <datecol> desc
fetch first 10 rows only
) p join
users u
on p.user_id = u.user_id join
comments c
on c.post_id = p.post_id;
This returns the posts/users/comments in one table, mixing the columns. But it only requires one query.

MySQL: combination of LEFT JOIN and ORDER BY is slow

There are two tables: posts (~5,000,000 rows) and relations (~8,000 rows).
posts columns:
-------------------------------------------------
| id | source_id | content | date (int) |
-------------------------------------------------
relations columns:
---------------------------
| source_id | user_id |
---------------------------
I wrote a MySQL query for getting 10 most recent rows from posts which are related to a specific user:
SELECT p.id, p.content
FROM posts AS p
LEFT JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
However, it takes ~30 seconds to execute it.
I already have indexes at relations for (source_id, user_id), (user_id) and for (source_id), (date), (date, source_id) at posts.
EXPLAIN results: (output not included)
How can I optimize the query?
Your WHERE clause renders your outer join a mere inner join (because in an outer-joined pseudo record user_id will always be null, never 1).
If you really want this to be an outer join then it is completely superfluous, because every record in posts either has or has not a match in relations of course. Your query would then be
select id, content
from posts
order by `date` desc limit 10; -- backticks: in MySQL's default mode "date" in double quotes is a string literal
If you don't want this to be an outer join really, but want a match in relations, then we are talking about existence in a table, an EXISTS or IN clause hence:
select id, content
from posts
where source_id in
(
select source_id
from relations
where user_id = 1
)
order by `date` desc -- backticks: in MySQL's default mode "date" in double quotes is a string literal
limit 10;
There should be an index on relations(user_id, source_id) - in this order, so we can select user_id 1 first and get an array of all desired source_id which we then look up.
Of course you also need an index on posts(source_id) which you probably have already, as source_id is an ID. You can even speed things up with a composite index posts(source_id, date, id, content), so the table itself doesn't have to be read anymore - all the information needed is in the index already.
UPDATE: Here is the related EXISTS query:
select id, content
from posts p
where exists
(
select *
from relations r
where r.user_id = 1
and r.source_id = p.source_id
)
order by `date` desc -- backticks: in MySQL's default mode "date" in double quotes is a string literal
limit 10;
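A minimal sketch of the EXISTS variant together with the two suggested indexes, runnable against SQLite through Python's sqlite3 module. It only demonstrates that the query returns the right rows; it says nothing about how MySQL's optimizer will behave on 5,000,000 rows, so check EXPLAIN on the real data. Index names, sample rows and the recent_posts helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, source_id INTEGER, content TEXT, date INTEGER);
CREATE TABLE relations (source_id INTEGER, user_id INTEGER);
-- user_id first, so all sources of one user are found with a single index range
CREATE INDEX relations_user_source ON relations (user_id, source_id);
CREATE INDEX posts_source_date ON posts (source_id, date);
INSERT INTO relations VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO posts VALUES
  (1, 1, 'a', 100), (2, 3, 'b', 200), (3, 2, 'c', 300), (4, 1, 'd', 400);
""")

def recent_posts(user_id, limit=10):
    """Most recent posts from sources related to user_id (semi-join via EXISTS)."""
    return conn.execute(
        """
        SELECT p.id, p.content
        FROM posts AS p
        WHERE EXISTS (SELECT 1 FROM relations AS r
                      WHERE r.user_id = ? AND r.source_id = p.source_id)
        ORDER BY p.date DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()

print(recent_posts(1))  # [(4, 'd'), (3, 'c'), (1, 'a')]
print(recent_posts(2))  # [(2, 'b')]
```

Unlike a join, EXISTS cannot duplicate a post if relations happens to contain the same (source_id, user_id) pair twice.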
You could put an index on the date column of the posts table, I believe that will help the order-by speed.
You could also try reducing the number of rows before ordering with some additional WHERE conditions. For example, if you know that there will likely be ten records with the correct user_id today, you could limit the date to just today (or N days back, depending on your actual data).
Try This
SELECT p.id, p.content FROM posts AS p
WHERE p.source_id IN (SELECT source_id FROM relations WHERE user_id = 1)
ORDER BY p.date DESC
LIMIT 10
I'd consider the following :-
Firstly, you only want the 10 most recent rows from posts which are related to a user. So, an INNER JOIN should do just fine.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON r.source_id = p.source_id
WHERE r.user_id = 1
ORDER BY p.date DESC
LIMIT 10
The LEFT JOIN is needed only if you also want to fetch the records which do not have a relations mapping. Hence, doing the LEFT JOIN results in a full table scan of the left table, which as per your info contains ~5,000,000 rows. This could be the root cause of your slow query.
For further optimisation, consider moving the WHERE clause into the ON clause.
SELECT p.id, p.content
FROM posts AS p
JOIN relations AS r
ON (r.source_id = p.source_id AND r.user_id = 1)
ORDER BY p.date DESC
LIMIT 10
I would try with a composite index on relations :
INDEX source_user (user_id,source_id)
and change the query to this :
SELECT p.id, p.content
FROM posts AS p
INNER JOIN relations AS r
ON ( r.user_id = 1 AND r.source_id = p.source_id )
ORDER BY p.date DESC
LIMIT 10

MySQL - COUNT and retrieve n rows from a subquery

Context:
I have an app that shows posts and comments on the home page.
My intention is to limit the number of posts shown (ie, 10 posts) and...
Limit the number of comments shown per post (ie, 2 comments).
Show the total number of comments in the front end (ie, "read all 10 comments")
MySQL:
(SELECT *
FROM (SELECT *
FROM post
ORDER BY post_timestamp DESC
LIMIT 0, 10) AS p
JOIN user_profiles
ON user_id = p.post_author_id
LEFT JOIN (SELECT *
FROM data
JOIN pts
ON pts_id = pts_id_fk) AS d
ON d.data_id = p.data_id_fk
LEFT JOIN (SELECT *
FROM comment
JOIN user_profiles
ON user_id = comment_author_id
ORDER BY comment_id ASC) AS c
ON p.post_id = c.post_id_fk))
I've failed to insert LIMIT and COUNT in this code to get what I want - any suggestions? - will be glad to post more info if needed.
If I'm understanding you correctly you want no more than 10 posts (and 2 comments) to come back for each unique user in the returned result set.
This is very easy in SQL Server / Oracle / PostgreSQL using "row_number() PARTITION BY".
Unfortunately there is no such function in MySQL before version 8.0. A similar question has been asked here:
ROW_NUMBER() in MySQL
I'm sorry I can't offer a more specific solution for MySQL. It is definitely worth researching "row number partition by" equivalents for MySQL.
The essence of what this does:
You choose a set of columns that defines a group, say the user ID for example's sake (this is the "partition"). A "row number" column is then added to each row; it counts up within a partition and starts over when the partition changes.
This should illustrate:
user_id row_number
1 1
1 2
1 3
2 1
2 2
You can then add an outer query that says: select where row_number <= 10, which can be used in your case to limit to no more than 10 posts. The maximum row_number for that user can then be used for the "read all 10 comments" part.
Good luck!
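Where window functions are available (MySQL 8.0 or later, or SQLite 3.25 or later as used here), the "2 comments per post plus a total count" part could be sketched as below. The comment table is reduced to two columns taken from the question; the sample rows are invented, and the code runs against SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, post_id_fk INTEGER);
INSERT INTO comment VALUES (1, 7), (2, 7), (3, 7), (4, 8);
""")

rows = conn.execute("""
SELECT post_id_fk, comment_id, total_comments
FROM (
    SELECT post_id_fk, comment_id,
           -- position of the comment within its post
           ROW_NUMBER() OVER (PARTITION BY post_id_fk ORDER BY comment_id) AS rn,
           -- total number of comments on the post, repeated on every row
           COUNT(*) OVER (PARTITION BY post_id_fk) AS total_comments
    FROM comment
) ranked
WHERE rn <= 2
ORDER BY post_id_fk, comment_id
""").fetchall()

print(rows)  # [(7, 1, 3), (7, 2, 3), (8, 4, 1)]
```

Post 7 shows two of its comments while total_comments still reports 3, which is the number needed for a "read all 3 comments" link.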
This is the skeleton of the query you're looking for:
select * from (
select p1.id from posts p1
join posts p2 on p1.id <= p2.id
group by p1.id
having count(*) <= 3
order by p1.post_timestamp desc
) p left join (
select c1.id, c2.post_id from comments c1
join comments c2 on c1.id <= c2.id and c1.post_id = c2.post_id
group by c1.id
having count(*) <= 2
order by c1.comment_timestamp desc
) c
on p.id = c.post_id
It will get posts ordered by their descending timestamp but only the top 3 of them. That result will be joined with the top 2 comments of each post order by their descending timestamp. Just change the column names and it will work :)

MySQL Query -- based on user and business ID's

I am working on a project right now and am rather stumped with a specific sql query I (need) to execute. Let me start off by showing the DB structure I need to pull from.
--posts_table--
ID
post_title
post_text
bus_id
This next table is what is screwing with me. The only way data related to the logged in user is in here is if they have "liked" a specific post -- otherwise there is no data related to that user in this table. Now there could be plenty of data related to a particular post, just generated from other users.
--likes_table--
ID
user_id
post_id
like
What I need this to do is grab all the posts from the post_table above where a specific business id is specified. From there, I need it to grab the "like" column in the likes_table if there is data in there related to the logged in user. If there is no data there, just leave that field null in the query. Below is a query I wrote that works until there is other "like" data in the like_table from other users.
SELECT posts.id, posts.post_text, posts.post_title, likes.post_id, likes.like
FROM posts LEFT JOIN likes ON posts.id = likes.post_id WHERE
posts.bus_id = 1 AND likes.user_id IS NULL OR likes.user_id = 1;
This works up until data has been entered in the table about a specific post being liked by a different user before that user has done anything with that post, whether they like or dislike it. I am not sure if this specific type of query is even possible, any help would be much appreciated.
Edit:
After looking at it again -- I got it, finally. I just needed to add one more AND. Below is the proper query I was looking for.
SELECT posts.id, posts.post_text, posts.post_title, likes.post_id, likes.like
FROM posts LEFT JOIN likes ON posts.id = likes.post_id AND likes.user_id = 1 WHERE
posts.bus_id = 1 AND likes.user_id IS NULL OR likes.user_id = 1;
Ahh, I think I get you -- is it that if a particular post hasn't been liked or disliked by user_id number 1 at all, the row for that post doesn't show up at all?
In that case, put your l.user_id=1 into the JOIN condition instead of the WHERE condition --- this will put a NULL in if user_id 1 hasn't liked or disliked a particular post.
SELECT p.id, p.post_text, p.post_title, l.post_id, l.like
FROM posts p
LEFT JOIN likes l ON p.id = l.post_id AND l.user_id=1
WHERE p.bus_id = 1
The l.user_id IS NULL OR l.user_id=1 has been incorporated into the LEFT JOIN -- it doesn't make rows for the other user_ids.
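To see the effect of moving the user filter into the ON clause, here is a minimal sketch against SQLite through Python's sqlite3 module (not MySQL). The tables are cut down to the columns that matter, the rows are invented, and posts_with_like is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, post_title TEXT, bus_id INTEGER);
CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER, "like" INTEGER);
INSERT INTO posts VALUES (1, 'first', 1), (2, 'second', 1), (3, 'other business', 2);
-- user 2 liked post 1; user 1 liked post 2
INSERT INTO likes (user_id, post_id, "like") VALUES (2, 1, 1), (1, 2, 1);
""")

def posts_with_like(bus_id, user_id):
    """Every post of the business, with this user's like value or None."""
    return conn.execute(
        """
        SELECT p.id, l."like"
        FROM posts p
        LEFT JOIN likes l ON l.post_id = p.id AND l.user_id = ?
        WHERE p.bus_id = ?
        ORDER BY p.id
        """,
        (user_id, bus_id),
    ).fetchall()

print(posts_with_like(1, 1))  # [(1, None), (2, 1)]
print(posts_with_like(1, 3))  # [(1, None), (2, None)]
```

Post 1 still appears for user 1 even though another user has liked it, which is the case the WHERE-based version lost.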