MySQL JOIN + Subquery Query Optimization - mysql

I'm trying to fetch 100 posts and order them by the number of times they've been "remixed" in the last week. Here is my query thus far:
SELECT COUNT(remixes.post_id) AS count, posts.title
FROM posts
LEFT JOIN (
SELECT * FROM remixes WHERE created_at >= 1343053513
) AS remixes ON posts.id = remixes.post_id
GROUP BY posts.id
ORDER BY count DESC, posts.created_at DESC
LIMIT 100
This produces the correct result; however, after running DESCRIBE I get this:
And here are my indexes on posts:
And my indexes on remixes:
And here are my questions:
Can you explain what the terms used in the extra column are really trying to tell me?
Could you provide tips on how I can optimize this query so that it'll scale better?
Thanks in advance!
Update
Per Zane's solution, I've updated my query to:
SELECT COUNT(remixes.post_id) AS count, posts.title
FROM posts
LEFT JOIN remixes ON posts.id = remixes.post_id AND remixes.created_at >= 1343053513
GROUP BY posts.id
ORDER BY count DESC, posts.created_at DESC
LIMIT 100
And here's the latest DESCRIBE
I'm still worried about the filesort part. Any ideas?

Try not to wrap your JOIN in a sub-select: this creates an unindexed temporary table to hold the result of the sub-select, and the join then runs against that unindexed table.
Instead, put created_at as an additional join condition when joining the remixes table:
SELECT
a.title, COUNT(b.post_id) AS remixcnt
FROM
posts a
LEFT JOIN
remixes b ON a.id = b.post_id AND b.created_at >= 1343053513
GROUP BY
a.id, a.title
ORDER BY
remixcnt DESC, a.created_at DESC
LIMIT 100

It seems to me that
SELECT COUNT(remixes.post_id) AS count, posts.title
FROM posts
LEFT JOIN (
SELECT * FROM remixes WHERE created_at >= 1343053513
) AS remixes ON posts.id = remixes.post_id
GROUP BY posts.id
ORDER BY count DESC, posts.created_at DESC
LIMIT 100
could be rewritten as
SELECT COUNT(r.post_id) AS count, posts.title
FROM posts
LEFT JOIN remixes r ON posts.id = r.post_id
WHERE r.created_at >= 1343053513
GROUP BY posts.id
ORDER BY count DESC, posts.created_at DESC
LIMIT 100
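which should give you a better EXPLAIN plan and run faster. Note, however, that the WHERE condition on r.created_at discards posts that have no remixes in the period, so unlike the original LEFT JOIN it will not return posts with a count of 0.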
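To see how a filter in the ON clause differs from the same filter in WHERE, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable), with two made-up posts and three made-up remixes; the LEFT JOIN semantics shown are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, created_at INTEGER);
    CREATE TABLE remixes (id INTEGER PRIMARY KEY, post_id INTEGER, created_at INTEGER);
    -- hypothetical sample data
    INSERT INTO posts VALUES (1, 'popular', 100), (2, 'quiet', 200);
    -- post 1 has two recent remixes; post 2 only has an old one
    INSERT INTO remixes (post_id, created_at) VALUES
        (1, 1343053600), (1, 1343053700), (2, 1000);
""")

CUTOFF = 1343053513  # same literal as in the question

# Filter inside the ON clause: unmatched posts are kept with a count of 0.
on_filter = conn.execute("""
    SELECT posts.title, COUNT(r.post_id) AS cnt
    FROM posts
    LEFT JOIN remixes r ON posts.id = r.post_id AND r.created_at >= ?
    GROUP BY posts.id
    ORDER BY cnt DESC, posts.created_at DESC
""", (CUTOFF,)).fetchall()

# Filter in WHERE: rows where r.created_at is NULL or too old are removed,
# so posts without recent remixes disappear from the result.
where_filter = conn.execute("""
    SELECT posts.title, COUNT(r.post_id) AS cnt
    FROM posts
    LEFT JOIN remixes r ON posts.id = r.post_id
    WHERE r.created_at >= ?
    GROUP BY posts.id
    ORDER BY cnt DESC, posts.created_at DESC
""", (CUTOFF,)).fetchall()

print(on_filter)
print(where_filter)
```

If posts with zero recent remixes should still be listed, the ON-clause form is the one to keep.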

Related

JOIN two tables, sort by field on second table, no duplicates

My apologies but I cannot get my head around this one (even not after searching and trying out a few things). All I want to do is join two tables and then sort the join descending on the created_at in the article_translations table. However, I need unique entries.
I have two tables:
articles
--------
id
user_id
article_translations
--------
id
article_id (brings this table together with the other one)
locale
title
...
created_at
updated_at
Performing mysql query:
SELECT * from articles
JOIN article_translations as t on t.article_id = articles.id
ORDER BY t.created_at desc
I get the joined tables with the corresponding related entries.
articles.id t.article_id created_at
1 1 ''
1 1 ''
2 2 ''
When I now try to get rid of the duplicates, in this case of the article with id = 1, I get a nasty error:
Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'blog.t.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
The Desired result would be:
articles.id t.article_id created_at
1 1 ''
2 2 ''
Any help please... Thank You!
The only way to get unique rows is if you want the latest (or the earliest?) date for each id, which you can do if you group by a.id, t.article_id and aggregate:
SELECT a.id, t.article_id, MAX(t.created_at) AS created_at
FROM articles AS a INNER JOIN article_translations AS t
ON t.article_id = a.id
GROUP BY a.id, t.article_id
ORDER BY MAX(t.created_at) DESC
If you want all the columns of the 2 tables, first get unique rows from article_translations with NOT EXISTS and then join to articles:
SELECT *
FROM articles AS a INNER JOIN (
SELECT t.*
FROM article_translations t
WHERE NOT EXISTS (
SELECT 1 FROM article_translations
WHERE article_id = t.article_id AND created_at > t.created_at
)
) AS t
ON t.article_id = a.id
ORDER BY t.created_at DESC
This will work as long as there is no more than one row in article_translations with the same maximum created_at for an article_id.
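The caveat about ties can be checked with a small sketch. It runs the NOT EXISTS filter through Python's sqlite3 module (a stand-in for MySQL, chosen only so the example is runnable) on made-up rows where article 2 has two translations with the same created_at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_translations (
        id INTEGER PRIMARY KEY, article_id INTEGER, locale TEXT, created_at TEXT);
    -- hypothetical sample data
    INSERT INTO article_translations (article_id, locale, created_at) VALUES
        (1, 'en', '2020-01-01'),
        (1, 'de', '2020-01-02'),
        (2, 'en', '2020-01-03'),
        (2, 'fr', '2020-01-03');  -- same timestamp as the row above: a tie
""")

# Keep a translation only if no newer one exists for the same article.
rows = conn.execute("""
    SELECT t.article_id, t.locale
    FROM article_translations t
    WHERE NOT EXISTS (
        SELECT 1 FROM article_translations
        WHERE article_id = t.article_id AND created_at > t.created_at
    )
    ORDER BY t.article_id, t.locale
""").fetchall()

print(rows)  # article 2 appears twice because of the tie
```

Article 1 comes back once, but article 2 comes back for both tied translations, which is why a tie-breaker is needed when created_at is not unique.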
For MySql 8.0+ you could use ROW_NUMBER():
SELECT t.* FROM (
SELECT t.*, a.user_id, ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY t.created_at DESC) rn
FROM articles AS a INNER JOIN article_translations AS t
ON t.article_id = a.id
) AS t
WHERE t.rn = 1
You are almost there with the query in the question. You can add the DISTINCT keyword, but note that it only removes rows that are identical in every selected column:
SELECT distinct * from articles
JOIN article_translations as t on t.article_id = articles.id
ORDER BY t.created_at desc
Thanks, forpas, for providing the correct answer. I needed this query for a Laravel Eloquent model in combination with eager loading. In case anybody cares, this is what the final solution looks like:
$articles = Article::join('article_translations as at', function ($join) {
$join->on('articles.id', '=', 'at.article_id');
})
->select('articles.id',
'articles.user_id',DB::raw('MAX(at.created_at) as created_at'))
->groupBy('at.article_id')
->orderBy('created_at', 'desc')
->with('translations')
->get();
Pure SQL
SELECT at.article_id, MAX(at.created_at) as created_at FROM articles as a
INNER JOIN article_translations as at
ON a.id = at.article_id
GROUP BY at.article_id
ORDER BY MAX(created_at) desc
Since you do want all the columns:
If you are trying to keep just the article translation with the latest creation date, then, assuming the creation dates are unique for a given article's translations, one way is to use a subquery that computes the maximum article_translations.created_at value for each article_translations.article_id:
SELECT articles.*, t.* from articles
JOIN article_translations as t on t.article_id = articles.id
JOIN (
SELECT article_id, max(created_at) as created_at from article_translations
GROUP BY article_id
) a_t on t.article_id = a_t.article_id and t.created_at = a_t.created_at
ORDER BY t.created_at desc
If the creation dates are not unique, or even if they are, then this should also work:
SELECT articles.*, t.* from articles
JOIN article_translations as t on t.article_id = articles.id
AND t.id = (
SELECT t2.id from article_translations t2
WHERE t2.article_id = t.article_id
ORDER BY t2.created_at DESC, t2.id DESC
LIMIT 1
)
ORDER BY t.created_at DESC
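For the "creation dates are not unique" case, a correlated subquery only narrows things down to one row if it compares a column that is unique per translation. The sketch below assumes article_translations.id is the primary key and uses it both for the comparison and as a tie-breaker; it runs on Python's sqlite3 module as a stand-in for MySQL, with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE article_translations (
        id INTEGER PRIMARY KEY, article_id INTEGER, locale TEXT, created_at TEXT);
    -- hypothetical sample data
    INSERT INTO articles VALUES (1, 10), (2, 20);
    INSERT INTO article_translations (article_id, locale, created_at) VALUES
        (1, 'en', '2020-01-01'),
        (1, 'de', '2020-01-02'),
        (2, 'en', '2020-01-03'),
        (2, 'fr', '2020-01-03');  -- tie on created_at for article 2
""")

# Pick exactly one translation per article: newest created_at first,
# highest id as the tie-breaker (assumes id is unique).
rows = conn.execute("""
    SELECT a.id, t.locale
    FROM articles a
    JOIN article_translations t ON t.article_id = a.id
     AND t.id = (
        SELECT t2.id FROM article_translations t2
        WHERE t2.article_id = t.article_id
        ORDER BY t2.created_at DESC, t2.id DESC
        LIMIT 1
     )
    ORDER BY t.created_at DESC
""").fetchall()

print(rows)
```

Each article comes back exactly once, even though article 2 has two translations with the same created_at.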

Select distinct to remove duplicate rows?

I have a forum with a Posts and Comments table. I'd like to sort by recent comments:
select distinct(p.id)
,p.title
,c.id
from Posts as p
,Comments as c
where c.post_id = p.id
order by c.id DESC
LIMIT 50;
However I get a row for every comment. I know I want to loop through the most recent comments and grab the first 50 unique posts. I just can't translate that to SQL.
Here's a solution without subqueries:
SELECT p.id, p.title, MAX(c.id) AS comment_id
FROM Posts AS p
JOIN Comments AS c
ON c.post_id = p.id
GROUP BY p.id
ORDER by comment_id DESC
LIMIT 50
This way may be a bit faster and more scalable despite the subquery because it can optimize on the limit clause:
SELECT p.id, p.title, MAX(c.id) AS comment_id
FROM Posts p
JOIN (SELECT c.post_id FROM Comments c GROUP BY c.post_id ORDER BY MAX(c.id) DESC LIMIT 50) t
ON t.post_id = p.id
JOIN Comments c
ON c.post_id = p.id
GROUP BY p.id
ORDER BY comment_id DESC
Make sure there is an index on Comments(post_id).
select p.id
,p.title
,c.id
from Posts as p
,Comments as c
where c.post_id in (
select distinct (id)
from posts
)
order by c.id desc
limit 50;
Thanks, Gaurav
You can do so by getting the max comment id for each post group, joining that with your Posts table, and then ordering by comment id:
select p.id, p.title, c.id
from
Posts as p
JOIN
(select max(id) id, post_id
from Comments group by
post_id order by max(id) desc LIMIT 50) c
ON(c.post_id = p.id)
order by c.id DESC;
Note that the query above returns only the most recent comment id for each post. You can't use * in the subquery to fetch the entire comment row: with GROUP BY, the other columns are not guaranteed to come from the most recent comment.
Edit: this query uses only one join, with the limit in the inner query, so only 50 posts' most recent comments are joined. Because it is an inner join, only posts that have comments are returned. You can check the performance by looking at the EXPLAIN plan for your queries.

Avoid filesort in query

I have two tables, posts and posts_relationship.
posts:id, title, text, lang, timestamp
posts_relationship: id, post_id, subcategory_id
This is my query:
SELECT posts.title,
posts.timestamp
FROM posts_relationship
INNER JOIN posts ON posts.id = posts_relationship.post_id
WHERE posts.lang = 'it' AND
posts.timestamp <= NOW() AND
posts_relationship.subcategory_id = 21
ORDER BY posts.timestamp DESC
I have added indexes on posts.lang, posts.timestamp, posts_relationship.post_id, and posts_relationship.subcategory_id.
But EXPLAIN always shows "Using temporary" or "Using filesort".
How can I get only "Using where"?
It is unclear to me what this line is doing:
posts.timestamp <= NOW() AND
How do you get a timestamp that is in the future?
Without this line, then the query would look like:
SELECT p.title, p.timestamp
FROM posts_relationship pr INNER JOIN
posts p
ON p.id = pr.post_id
WHERE p.lang = 'it' AND
pr.subcategory_id = 21
ORDER BY p.timestamp DESC;
With this query, an index on posts(lang, timestamp, id) might prevent the filesort. The following version of the query would probably ensure this even more:
SELECT p.title, p.timestamp
FROM posts p
WHERE p.lang = 'it' AND
exists (select 1
from posts_relationship pr
where p.id = pr.post_id and pr.subcategory_id = 21
)
ORDER BY p.timestamp DESC;
I don't think this query will use a file sort with the following two indexes:
posts(lang, timestamp, id)
posts_relationship(post_id, subcategory_id)
(By the way, you can keep the condition on timestamp; I just don't understand why it is necessary.)
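The column order in the posts index matters for the sort: the equality column (lang) has to come first and the ORDER BY column (timestamp) directly after it. As a rough illustration only, the sketch below checks this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is a stand-in here; its "USE TEMP B-TREE FOR ORDER BY" step plays the role of MySQL's "Using filesort", and MySQL's optimizer may decide differently on real data. The helper name needs_sort and the index name idx_demo are made up for the example.

```python
import sqlite3

# Simplified single-table version of the query (no join), to isolate the sort.
QUERY = """
    EXPLAIN QUERY PLAN
    SELECT title, timestamp FROM posts
    WHERE lang = 'it'
    ORDER BY timestamp DESC
"""

def needs_sort(index_columns):
    """Return True if SQLite plans a separate sort step for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, "
        "text TEXT, lang TEXT, timestamp INTEGER)"
    )
    if index_columns:
        conn.execute(f"CREATE INDEX idx_demo ON posts ({index_columns})")
    # The human-readable plan step is the last column of each row.
    plan = [row[-1] for row in conn.execute(QUERY)]
    conn.close()
    return any("TEMP B-TREE" in step for step in plan)

no_index = needs_sort(None)
sort_col_last = needs_sort("lang, id, timestamp")
sort_col_second = needs_sort("lang, timestamp, id")

print(no_index, sort_col_last, sort_col_second)
```

In this sketch only the (lang, timestamp, id) ordering lets the engine read rows already sorted by timestamp; in MySQL, confirm with EXPLAIN on your own tables.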

How to order by one-to-many association date with a decent performance?

I have the following models in my MySQL database:
Blogs (id: integer, name: varchar)
Posts (id: integer, name: varchar, blog_id: integer, created_at: date)
I want to retrieve a list of all the blogs, ordered so that the ones with the newest posts come first.
I've reached that with the following query:
SELECT b.*, (SELECT p.created_at FROM posts p WHERE p.blog_id = b.id ORDER BY p.created_at DESC LIMIT 1) AS last_post_created_at FROM blogs b ORDER BY last_post_created_at DESC;
But this query is too slow and I'm unable to use it on my application.
Do you guys have a good solution for that?
Thank you.
A rewriting of the query:
SELECT b.*,
p.last_post_created_at
FROM blogs b
LEFT JOIN
( SELECT blog_id,
MAX(created_at) AS last_post_created_at
FROM posts
GROUP BY blog_id
) AS p
ON p.blog_id = b.id
ORDER BY last_post_created_at DESC;
An index on (blog_id, created_at) will help both this and your version.
If you want to limit the number of blogs returned, add the ORDER BY to the subquery, put the LIMIT there, and use an inner JOIN so that blogs outside the limit are dropped:
SELECT b.*,
p.last_post_created_at
FROM blogs b
JOIN
( SELECT blog_id,
MAX(created_at) AS last_post_created_at
FROM posts
GROUP BY blog_id
ORDER BY last_post_created_at DESC
LIMIT 100
) AS p
ON p.blog_id = b.id
ORDER BY last_post_created_at DESC;
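When the LIMIT lives inside the derived table, the join type decides whether the outer query is actually limited. The sketch below compares a LEFT JOIN with an inner JOIN on made-up data, using Python's sqlite3 module as a stand-in for MySQL; the helper name newest_blogs is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blogs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY, name TEXT, blog_id INTEGER, created_at TEXT);
    -- hypothetical sample data
    INSERT INTO blogs VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO posts (name, blog_id, created_at) VALUES
        ('p1', 1, '2020-01-01'), ('p2', 2, '2020-03-01'),
        ('p3', 3, '2020-02-01'), ('p4', 1, '2020-01-15');
""")

def newest_blogs(join_kind, limit):
    """Run the blogs query with the given join keyword and inner LIMIT."""
    sql = f"""
        SELECT b.name, p.last_post_created_at
        FROM blogs b
        {join_kind} (
            SELECT blog_id, MAX(created_at) AS last_post_created_at
            FROM posts
            GROUP BY blog_id
            ORDER BY last_post_created_at DESC
            LIMIT ?
        ) AS p ON p.blog_id = b.id
        ORDER BY p.last_post_created_at DESC
    """
    return conn.execute(sql, (limit,)).fetchall()

left_rows = newest_blogs("LEFT JOIN", 2)   # every blog, NULL date for the rest
inner_rows = newest_blogs("JOIN", 2)       # only the 2 most recently active

print(left_rows)
print(inner_rows)
```

With LEFT JOIN all three blogs are still returned (the third with a NULL date), while the inner JOIN returns only the two blogs picked by the LIMIT.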
Use a view. Great thing to know and use!

problem in mysql query with join and limit the time

I have this query:
SELECT COUNT(1) cnt, a.auther_id
FROM `posts` a
LEFT JOIN users u ON a.auther_id = u.id
GROUP BY a.auther_id
ORDER BY cnt DESC
LIMIT 20
It works fine.
But now I want to select only the posts added in the last day. I tried to use
WHERE from_unixtime(post_time) >= SUBDATE(NOW(),1)
but it did not work.
Does anyone have an idea?
My guess is that you added the WHERE clause in the wrong place. It should come after the JOIN but before the GROUP BY, like this:
SELECT COUNT(1) cnt, a.auther_id
FROM `posts` a
LEFT JOIN users u ON a.auther_id = u.id
WHERE from_unixtime(post_time) >= SUBDATE(NOW(),1)
GROUP BY a.auther_id
ORDER BY cnt DESC
LIMIT 20
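A related option, since post_time appears to hold a Unix timestamp, is to compute the cutoff once and compare the raw column against it instead of wrapping the column in a function on every row. The sketch below does this from application code with a bound parameter, using Python's sqlite3 module as a stand-in for MySQL. The helper name top_authors, the fixed "now" value and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
NOW = 1_700_000_000  # fixed "current time" so the example is repeatable

conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY, auther_id INTEGER, post_time INTEGER);
    INSERT INTO users VALUES (1), (2);
""")
# hypothetical sample data: three posts from the last day, one much older
conn.executemany(
    "INSERT INTO posts (auther_id, post_time) VALUES (?, ?)",
    [(1, NOW - 100), (1, NOW - 200), (2, NOW - 50), (2, NOW - 200_000)],
)

def top_authors(conn, now, limit=20):
    """Count posts per author over the last 24 hours, most active first."""
    cutoff = now - 86_400  # one day in seconds, computed once
    return conn.execute("""
        SELECT COUNT(1) AS cnt, a.auther_id
        FROM posts a
        LEFT JOIN users u ON a.auther_id = u.id
        WHERE a.post_time >= ?
        GROUP BY a.auther_id
        ORDER BY cnt DESC
        LIMIT ?
    """, (cutoff, limit)).fetchall()

result = top_authors(conn, NOW)
print(result)
```

Comparing the bare column against a constant also leaves room for an index on post_time to be used, which a function call around the column generally prevents.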