How to apply ORDER to GROUP BY [duplicate] - mysql

This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 2 years ago.
I have an article table; each article has a related category (or NULL if it is uncategorized, but those articles don't interest me here).
I want to get the 8 newest articles, each from a different category, so that there is never more than one article per category.
This query almost works:
SELECT article.id, article.title, category_container.title FROM `article`
JOIN `category_container` ON category_container.id = article.category_id
WHERE `category_id` IS NOT NULL GROUP BY `category_id` ORDER BY article.created_at DESC LIMIT 8
The problem is that the ORDER BY doesn't work. I want the newest articles, but it returns the earliest ones (with the smallest id, when it should be the opposite).
So, how to apply ORDER to GROUP BY?

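You don't. Your query is broken: it selects columns that are neither in the GROUP BY nor aggregated, so MySQL takes their values from an arbitrary row of each group, and ORDER BY cannot influence that choice. In recent versions of MySQL (5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default) you would rightly get an error.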
Instead use a filtering method to get the latest rows. Here is a method using a correlated subquery:
SELECT a.id, a.title, cc.title
FROM article a JOIN
category_container cc
ON cc.id = a.category_id
WHERE a.created_at = (SELECT MAX(a2.created_at)
FROM article a2
WHERE a2.category_id = a.category_id
)
ORDER BY a.created_at DESC
LIMIT 8;
For performance, you want an index on article(category_id, created_at).
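A minimal sketch of the correlated-subquery approach, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the rows are made-up sample data, and the index mirrors the one suggested above.

```python
import sqlite3

# Hypothetical schema mirroring the question's tables (SQLite stands in for MySQL).
SCHEMA = """
CREATE TABLE category_container (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    title TEXT,
    category_id INTEGER,   -- NULL means uncategorized
    created_at TEXT
);
CREATE INDEX idx_article_cat_created ON article (category_id, created_at);
"""

# Keep an article only if it has the newest created_at within its category.
LATEST_PER_CATEGORY = """
SELECT a.id, a.title, cc.title
FROM article a
JOIN category_container cc ON cc.id = a.category_id
WHERE a.created_at = (SELECT MAX(a2.created_at)
                      FROM article a2
                      WHERE a2.category_id = a.category_id)
ORDER BY a.created_at DESC
LIMIT 8
"""


def latest_per_category(conn):
    return conn.execute(LATEST_PER_CATEGORY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO category_container VALUES (?, ?)",
                 [(1, "News"), (2, "Sport")])
conn.executemany("INSERT INTO article VALUES (?, ?, ?, ?)", [
    (1, "old news", 1, "2020-01-01"),
    (2, "new news", 1, "2020-03-01"),
    (3, "old sport", 2, "2020-01-15"),
    (4, "new sport", 2, "2020-02-01"),
    (5, "uncategorized", None, "2020-04-01"),
])
print(latest_per_category(conn))
# → [(2, 'new news', 'News'), (4, 'new sport', 'Sport')]
```

The uncategorized article is dropped by the inner join, and if two articles in one category share the same newest created_at, both are returned.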
In MySQL 8.0 and later, this is simpler with ROW_NUMBER():
SELECT a.id, a.title, cc.title
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY a.category_id ORDER BY a.created_at DESC) as seqnum
FROM article a
) a JOIN
category_container cc
ON cc.id = a.category_id
WHERE seqnum = 1
ORDER BY a.created_at DESC
LIMIT 8;
Given that your query runs without errors, you are probably using an older version of MySQL, where window functions such as ROW_NUMBER() are not available.
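For readers on MySQL 8.0+ (or any engine with window functions), here is a small sketch of the ROW_NUMBER() variant, again using SQLite via Python's sqlite3 as a stand-in (window functions need SQLite 3.25 or newer). Table names follow the question; the data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_container (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT,
                      category_id INTEGER, created_at TEXT);
INSERT INTO category_container VALUES (1, 'News'), (2, 'Sport');
INSERT INTO article VALUES
    (1, 'old news', 1, '2020-01-01'),
    (2, 'new news', 1, '2020-03-01'),
    (3, 'old sport', 2, '2020-01-15'),
    (4, 'new sport', 2, '2020-02-01');
""")

NEWEST_PER_CATEGORY = """
SELECT a.id, a.title, cc.title
FROM (SELECT a.*,
             -- seqnum 1 = newest article within each category
             ROW_NUMBER() OVER (PARTITION BY a.category_id
                                ORDER BY a.created_at DESC) AS seqnum
      FROM article a
      WHERE a.category_id IS NOT NULL) a
JOIN category_container cc ON cc.id = a.category_id
WHERE seqnum = 1
ORDER BY a.created_at DESC
LIMIT 8
"""

rows = conn.execute(NEWEST_PER_CATEGORY).fetchall()
print(rows)  # → [(2, 'new news', 'News'), (4, 'new sport', 'Sport')]
```

Unlike the MAX() subquery, ROW_NUMBER() always yields exactly one row per category, even when two articles tie on created_at.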

Related

Avoiding the “n+1 selects” problem when I want a subset of related resources

Imagine I'm designing a multi-user blog and I have user, post, and comment tables with the obvious meanings. On the main page, I want to show the ten most recent posts along with all their related comments.
The naive approach would be to SELECT the ten most recent posts, probably JOINed with the users that wrote them. And then I can loop through them to SELECT the comments, again, probably JOINed with the users that wrote them. This would require 11 selects: 1 for the posts and 10 for their comments, hence the name of the famous anti-pattern: n+1 selects.
The usual advice for avoiding this anti-pattern is to use the IDs from the first query to fetch all related comments in a second query which may look something like this:
SELECT
*
FROM
comments
WHERE
post_id IN (/* A comma separated list of post IDs returned from the first query */)
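A minimal sketch of the two-query pattern, assuming hypothetical posts(id, title, post_date) and comments(id, post_id, body, comment_date) tables in SQLite (via Python's sqlite3). It builds one `?` placeholder per post ID so the IDs are bound as parameters rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, post_date TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER,
                       body TEXT, comment_date TEXT);
INSERT INTO posts VALUES (1, 'first', '2020-01-01'), (2, 'second', '2020-02-01'),
                         (3, 'third', '2020-03-01');
INSERT INTO comments VALUES (1, 1, 'a', '2020-01-02'), (2, 3, 'b', '2020-03-02'),
                            (3, 3, 'c', '2020-03-03'), (4, 2, 'd', '2020-02-05');
""")


def fetch_posts_with_comments(conn, limit=10):
    """Two queries instead of n+1: the newest posts, then all their comments at once."""
    posts = conn.execute(
        "SELECT id, title FROM posts ORDER BY post_date DESC LIMIT ?", (limit,)
    ).fetchall()
    ids = [post_id for post_id, _ in posts]
    if not ids:
        return []
    # One "?" per id, so the values are bound as parameters.
    placeholders = ", ".join("?" for _ in ids)
    comments = conn.execute(
        f"SELECT post_id, body FROM comments WHERE post_id IN ({placeholders}) "
        "ORDER BY comment_date",
        ids,
    ).fetchall()
    # Attach each comment to its post in application code.
    by_post = {post_id: [] for post_id in ids}
    for post_id, body in comments:
        by_post[post_id].append(body)
    return [(post_id, title, by_post[post_id]) for post_id, title in posts]


print(fetch_posts_with_comments(conn, limit=2))
# → [(3, 'third', ['b', 'c']), (2, 'second', ['d'])]
```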
As long as that comma-separated list is reasonably short, we have fetched all the data we need with only two SELECT queries instead of eleven. Great.
But what if I only want the top three comments for each post? I haven't tried, but I could probably come up with some LEFT JOIN trickery to fetch the most recent posts along with their top three comments in a single query; I'm just not sure it would scale. What if I want the top hundred comments, which would exceed the 61-table join limit of a typical MySQL installation, for instance?
What is the usual solution for this other than reverting to n+1 selects anti-pattern? What is the most efficient way to fetch items with a subset of items related to each one in this fairly typical scenario?
It is usually a better option to run as few queries as possible, and then implement some application logic on top of it if needed. In your use case, I would build a query that returns both the most recent posts and the most recent associated comments, with proper ordering to make the application processing easier. Then your application can take care of displaying them.
Assuming that you use MySQL (since you mentioned it in your question), let's start with a query that gives you the 10 most recent posts:
SELECT * FROM posts ORDER BY post_date DESC LIMIT 10
Then you can join this with the corresponding comments:
SELECT
p.*,
c.*
FROM
(SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
INNER JOIN comments c ON c.post_id = p.id
Finally, let's limit the number of comments per post. For this, you can use ROW_NUMBER() (available in MySQL 8.0) to rank the comments of each post, and then keep only a given number of them. This gives you the 10 most recent posts, each with its 3 most recent comments:
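SELECT *
FROM (
SELECT
p.*,
c.*,
-- if posts and comments share column names (e.g. both have an id), list the columns
-- explicitly with aliases instead: a derived table cannot contain duplicate column names
ROW_NUMBER() OVER(PARTITION BY p.id ORDER BY c.comment_date DESC) rn
FROM
(SELECT * FROM posts ORDER BY post_date DESC LIMIT 10) p
INNER JOIN comments c ON c.post_id = p.id
) x
WHERE rn <= 3
ORDER BY x.post_date DESC, x.comment_date DESC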
Query results are ordered by post, then by comment date. So when your application fetches the results, you get 1 to 3 records per post, in sequence.
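The application side can then regroup the flat, sorted rows. Below is a sketch in Python with sqlite3 (SQLite 3.25+ for ROW_NUMBER()), assuming hypothetical posts(id, post_date) and comments(id, post_id, comment_date) tables. It selects aliased columns explicitly to avoid duplicate names, and uses `itertools.groupby`, which works here because rows for the same post arrive consecutively.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, post_date TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, comment_date TEXT);
INSERT INTO posts VALUES (1, '2020-01-01'), (2, '2020-02-01');
INSERT INTO comments VALUES
    (10, 2, '2020-02-02'), (11, 2, '2020-02-03'), (12, 2, '2020-02-04'),
    (13, 2, '2020-02-05'), (14, 2, '2020-02-06'), (20, 1, '2020-01-02');
""")

TOP_COMMENTS = """
SELECT post_id, post_date, comment_id, comment_date
FROM (
    SELECT p.id AS post_id, p.post_date,
           c.id AS comment_id, c.comment_date,
           ROW_NUMBER() OVER (PARTITION BY p.id
                              ORDER BY c.comment_date DESC) AS rn
    FROM (SELECT * FROM posts ORDER BY post_date DESC LIMIT ?) p
    JOIN comments c ON c.post_id = p.id
) x
WHERE rn <= ?
ORDER BY post_date DESC, comment_date DESC
"""


def top_comments_per_post(conn, posts=10, per_post=3):
    rows = conn.execute(TOP_COMMENTS, (posts, per_post)).fetchall()
    # Rows are sorted by post, so consecutive rows with the same post_id form one group.
    return {post_id: [row[2] for row in group]
            for post_id, group in groupby(rows, key=lambda row: row[0])}


print(top_comments_per_post(conn))  # → {2: [14, 13, 12], 1: [20]}
```

Because of the inner join, a post without any comments does not appear in the result; a LEFT JOIN would keep it with NULL comment columns.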
If you want the last 10 posts:
SELECT p.post_id
FROM post p
ORDER BY p.publish_date DESC
LIMIT 10
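Now, if you want the comments on those posts:
SELECT c.comment_id, u.name
FROM comments c
JOIN users u
on c.user_id = u.user_id
-- MySQL does not support LIMIT directly inside an IN (...) subquery,
-- so the LIMITed query is wrapped in a derived table
WHERE c.post_id IN ( SELECT recent.post_id
FROM ( SELECT p.post_id
FROM post p
ORDER BY p.publish_date DESC
LIMIT 10 ) recent )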
Getting only the last 3 comments is where the RDBMS version matters, because it decides whether you can use ROW_NUMBER() (available in MySQL 8.0 and later):
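SELECT *
FROM (
SELECT c.comment_id, u.name,
row_number() over (partition by c.post_id order by c.comment_date DESC) as rn
FROM comments c
JOIN users u
on c.user_id = u.user_id
-- the LIMITed subquery is wrapped in a derived table because MySQL
-- does not support LIMIT directly inside IN (...)
WHERE c.post_id IN ( SELECT recent.post_id
FROM ( SELECT p.post_id
FROM post p
ORDER BY p.publish_date DESC
LIMIT 10 ) recent )
) x
WHERE x.rn <= 3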
You can do this in one query:
select . . . -- whatever columns you want here
from (select p.*
from posts p
order by <datecol> desc
fetch first 10 rows only -- standard SQL; in MySQL write LIMIT 10 instead
) p join
users u
on p.user_id = u.user_id join
comments c
on c.post_id = p.post_id;
This returns the posts/users/comments in one table, mixing the columns. But it only requires one query.

MySql sort on join? [duplicate]

This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 6 years ago.
I have two tables: jobs and notes.
I want to output a list of jobs that are of status='lost' along with the most recent note for that job (based on the date the note was created).
Here's my query:
select jobs.id, jobs.name, jobs.status
from jobs
inner join notes on jobs.id=notes.jobId
where jobs.status='lost'
group by jobs.id
order by notes.createDate DESC
I would have thought that the output would show the most recent note for each job, but it shows the first note for that job instead. I changed the sort from DESC to ASC just to see what happens, and the output is the same.
Then I tried nesting a select from notes inside the main select, and it hung.
This should be easy, and I'm sure it is. What am I missing?
There are many options to solve this, but you can use a subquery:
select jobs.id, jobs.name, jobs.status,
(select notes.noteField from notes where notes.jobId = jobs.id
order by notes.createDate desc limit 1) as note
from jobs
where jobs.status='lost'
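To check the scalar-subquery approach end to end, here is a small sketch using SQLite through Python's sqlite3 as a stand-in for MySQL. The jobs/notes tables and the noteField column follow the thread; the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE notes (id INTEGER PRIMARY KEY, jobId INTEGER,
                    noteField TEXT, createDate TEXT);
INSERT INTO jobs VALUES (1, 'roof', 'lost'), (2, 'deck', 'won'), (3, 'fence', 'lost');
INSERT INTO notes VALUES
    (1, 1, 'first call', '2020-01-01'),
    (2, 1, 'went with competitor', '2020-01-10'),
    (3, 2, 'signed', '2020-02-01');
""")

LOST_JOBS_WITH_LATEST_NOTE = """
SELECT jobs.id, jobs.name, jobs.status,
       -- scalar subquery: newest note for this job, or NULL if it has none
       (SELECT notes.noteField
        FROM notes
        WHERE notes.jobId = jobs.id
        ORDER BY notes.createDate DESC
        LIMIT 1) AS note
FROM jobs
WHERE jobs.status = 'lost'
ORDER BY jobs.id
"""

rows = conn.execute(LOST_JOBS_WITH_LATEST_NOTE).fetchall()
print(rows)
# → [(1, 'roof', 'lost', 'went with competitor'), (3, 'fence', 'lost', None)]
```

A lost job with no notes still appears, with NULL (None in Python) as its note.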
When I'm in a similar boat, I've resorted to using a subquery on the join:
select jobs.id, jobs.name, jobs.status
from jobs
inner join notes on jobs.id = notes.jobId
and notes.createDate = (select max(n2.createDate)
from notes n2
where n2.jobId = jobs.id)
-- a job with two notes sharing the same latest createDate appears twice
where jobs.status='lost'
order by notes.createDate DESC
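The join-on-MAX() version can be exercised the same way. This sketch again uses SQLite via Python's sqlite3 with invented rows; note that the inner join drops lost jobs that have no notes at all, which differs from the scalar-subquery answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE notes (id INTEGER PRIMARY KEY, jobId INTEGER,
                    noteField TEXT, createDate TEXT);
INSERT INTO jobs VALUES (1, 'roof', 'lost'), (2, 'deck', 'won'),
                        (3, 'fence', 'lost'), (4, 'gate', 'lost');
INSERT INTO notes VALUES
    (1, 1, 'first call', '2020-01-01'),
    (2, 1, 'went with competitor', '2020-01-10'),
    (3, 2, 'signed', '2020-02-01'),
    (4, 3, 'quote sent', '2020-03-01');
""")

LATEST_NOTE_JOIN = """
SELECT jobs.id, jobs.name, notes.noteField, notes.createDate
FROM jobs
INNER JOIN notes
        ON notes.jobId = jobs.id
       -- correlated subquery: the newest createDate among this job's notes
       AND notes.createDate = (SELECT MAX(n2.createDate)
                               FROM notes n2
                               WHERE n2.jobId = jobs.id)
WHERE jobs.status = 'lost'
ORDER BY notes.createDate DESC
"""

rows = conn.execute(LATEST_NOTE_JOIN).fetchall()
print(rows)
# → [(3, 'fence', 'quote sent', '2020-03-01'), (1, 'roof', 'went with competitor', '2020-01-10')]
```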

Different result query when use mysql and mariadb

Here is my problem:
My database has two tables, Book and Post. Each book has many posts.
The posts table has a book_id field, which is a foreign key referencing the primary key (id) of the Book table.
This is for my index page. The idea is to get the latest post from each book, ordered by published date.
When I run it on localhost, everything is OK: I get the latest post from each book, ordered by publish date. But when I deploy it to the VPS, it doesn't get the latest post; it gets the first post from each book. I don't have any experience with this. Please help, thanks.
On localhost, I use: Apache 2.2, PHP 5.3, MySQL 5.5, with InnoDB tables.
On the VPS, I use: Nginx 1.7.6, PHP-FPM 5.5.18, MariaDB, with MyISAM tables.
I guess the problem is InnoDB vs. MyISAM, and I'm trying to fix that. But if you have time, please give me some good advice. Thanks a lot!
P.S. Sorry about my poor English.
SELECT * FROM `my_book_store`.`books`
AS `Book`
INNER JOIN
(
SELECT *
FROM posts
WHERE posts.published = 1 AND posts.published_date <= NOW()
ORDER BY posts.published_date DESC
) AS `Post`
ON (`Post`.`book_id` = `Book`.`id`)
WHERE 1 = 1
GROUP BY `Book`.`id`
ORDER BY `Post`.`published_date` desc
LIMIT 100
You can try either of the queries below, which get the last post from each book:
select
b.id,
b.name,
p.content,
p.published_date
from book b
join post p on p.book_id = b.id
left join post p1 on p1.book_id = p.book_id and p1.published_date > p.published_date
where p1.id is null;
OR
select
b.id,
b.name,
p.content,
p.published_date
from book b
join post p on p.book_id = b.id
where not exists(
select 1 from post p1
where p.book_id = p1.book_id
and p1.published_date > p.published_date
)
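A quick way to convince yourself that the two formulations (LEFT JOIN anti-join and NOT EXISTS) agree is to run both on the same data. This sketch uses SQLite via Python's sqlite3 as a stand-in for MySQL, with the answer's book/post table names and invented rows; the ORDER BY is an addition to match the asker's "order by published date".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, book_id INTEGER,
                   content TEXT, published_date TEXT);
INSERT INTO book VALUES (1, 'A'), (2, 'B');
INSERT INTO post VALUES
    (1, 1, 'a1', '2020-01-01'), (2, 1, 'a2', '2020-02-01'),
    (3, 2, 'b1', '2020-01-15'), (4, 2, 'b2', '2020-03-01');
""")

LEFT_JOIN_VERSION = """
SELECT b.id, b.name, p.content, p.published_date
FROM book b
JOIN post p ON p.book_id = b.id
LEFT JOIN post p1 ON p1.book_id = p.book_id
                 AND p1.published_date > p.published_date
WHERE p1.id IS NULL              -- no newer post exists for this book
ORDER BY p.published_date DESC
"""

NOT_EXISTS_VERSION = """
SELECT b.id, b.name, p.content, p.published_date
FROM book b
JOIN post p ON p.book_id = b.id
WHERE NOT EXISTS (SELECT 1 FROM post p1
                  WHERE p1.book_id = p.book_id
                    AND p1.published_date > p.published_date)
ORDER BY p.published_date DESC
"""

left = conn.execute(LEFT_JOIN_VERSION).fetchall()
not_exists = conn.execute(NOT_EXISTS_VERSION).fetchall()
print(left)  # → [(2, 'B', 'b2', '2020-03-01'), (1, 'A', 'a2', '2020-02-01')]
```

Neither query relies on GROUP BY picking a row, so the result does not depend on the storage engine or the execution plan.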
Try this:
SELECT b.*, p.*
FROM my_book_store.books AS b
INNER JOIN posts p ON b.id = p.book_id
INNER JOIN (SELECT p.book_id, MAX(p.published_date) published_date
FROM posts p
WHERE p.published = 1 AND p.published_date <= NOW()
GROUP BY p.book_id
) AS p1 ON p.book_id = p1.book_id AND p.published_date = p1.published_date
GROUP BY b.id
ORDER BY p.published_date DESC
LIMIT 100
The problem seems to be that you're only grouping by Book.id but selecting many other non-aggregated values, so the actual query results depend on the execution plan the optimizer comes up with. See also the MySQL manual:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
[...]
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. THE SERVER IS FREE TO CHOOSE ANY VALUE from each group, so unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

MySQL - COUNT and retrieve n rows from a subquery

Context:
I have an app that shows posts and comments on the home page.
My intention is to limit the number of posts shown (e.g. 10 posts) and...
Limit the number of comments shown per post (e.g. 2 comments).
Show the total number of comments in the front end (e.g. "read all 10 comments").
MySQL:
(SELECT *
FROM (SELECT *
FROM post
ORDER BY post_timestamp DESC
LIMIT 0, 10) AS p
JOIN user_profiles
ON user_id = p.post_author_id
LEFT JOIN (SELECT *
FROM data
JOIN pts
ON pts_id = pts_id_fk) AS d
ON d.data_id = p.data_id_fk
LEFT JOIN (SELECT *
FROM comment
JOIN user_profiles
ON user_id = comment_author_id
ORDER BY comment_id ASC) AS c
ON p.post_id = c.post_id_fk)
I haven't managed to add LIMIT and COUNT to this query to get what I want. Any suggestions? I'll gladly post more info if needed.
If I'm understanding you correctly, you want no more than 10 posts (and 2 comments) to come back for each unique user in the returned result set.
This is very easy in SQL Server / Oracle / PostgreSQL using ROW_NUMBER() OVER (PARTITION BY ...).
Unfortunately, MySQL versions before 8.0 have no such function. A similar question has been asked here:
ROW_NUMBER() in MySQL
I'm sorry I can't offer a more specific solution for MySQL. Do look further into "row number partition by" equivalents for MySQL.
The essence of what this does:
You choose a set of columns that defines a group, say user_id for the sake of example (this is the "partition"). A "row number" column is then added to each row; it counts up within the partition and starts over at 1 when the partition value changes.
This should illustrate:
user_id row_number
1 1
1 2
1 3
2 1
2 2
You can then add an outer query that selects only rows where row_number <= 10, which in your case limits the result to no more than 10 posts. The maximum row_number for that user then gives you the "read all 10 comments" part.
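On MySQL 8.0+ (or SQLite 3.25+, used here through Python's sqlite3 as a stand-in), both requirements fit in one pass: ROW_NUMBER() limits the comments shown per post, and COUNT(*) OVER the same partition carries the total for the "read all N comments" link. This sketch partitions by post rather than by user, to match the question's per-post limit; the comment table and post_id_fk column follow the question, the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, post_id_fk INTEGER, body TEXT);
INSERT INTO comment VALUES
    (1, 100, 'c1'), (2, 100, 'c2'), (3, 100, 'c3'), (4, 200, 'd1');
""")

FIRST_TWO_WITH_TOTAL = """
SELECT post_id_fk, comment_id, body, total_comments
FROM (SELECT c.*,
             -- position of the comment within its post (oldest first)
             ROW_NUMBER() OVER (PARTITION BY post_id_fk ORDER BY comment_id) AS rn,
             -- total comments on the post, repeated on every row of that post
             COUNT(*) OVER (PARTITION BY post_id_fk) AS total_comments
      FROM comment c) x
WHERE rn <= 2
ORDER BY post_id_fk, comment_id
"""

rows = conn.execute(FIRST_TWO_WITH_TOTAL).fetchall()
print(rows)  # → [(100, 1, 'c1', 3), (100, 2, 'c2', 3), (200, 4, 'd1', 1)]
```

The window COUNT(*) is computed before the outer WHERE filters rows, which is why post 100 still reports 3 comments even though only 2 are returned.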
Good luck!
This is the skeleton of the query you're looking for:
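select * from (
-- rank = number of posts at least as recent as this one (itself included)
select p1.id, p1.post_timestamp from posts p1
join posts p2 on p2.post_timestamp >= p1.post_timestamp
group by p1.id, p1.post_timestamp
having count(*) <= 3
) p left join (
-- same ranking for comments, counted within each post
select c1.id, c1.post_id, c1.comment_timestamp from comments c1
join comments c2 on c2.post_id = c1.post_id
and c2.comment_timestamp >= c1.comment_timestamp
group by c1.id, c1.post_id, c1.comment_timestamp
having count(*) <= 2
) c
on p.id = c.post_id
order by p.post_timestamp desc, c.comment_timestamp desc
It gets the 3 most recent posts by timestamp, joined with the 2 most recent comments of each post, ordered by descending timestamp. Each row is ranked by counting how many rows (itself included) are at least as recent, so ties on a timestamp can make it return fewer rows than requested. Just change the column names and it will work :)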
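To sanity-check the self-join ranking used for pre-8.0 MySQL, this sketch compares it with ROW_NUMBER() on the same data, using SQLite via Python's sqlite3 (3.25+ for the window version). The comments table and timestamps are invented; timestamps are unique per post here, since ties make the two techniques diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER,
                       comment_timestamp INTEGER);
INSERT INTO comments VALUES
    (1, 10, 100), (2, 10, 300), (3, 10, 200),
    (4, 20, 50), (5, 20, 60), (6, 20, 70);
""")

# Pre-window-function style: rank each comment by counting the comments
# in the same post that are at least as recent (itself included).
SELF_JOIN_TOP2 = """
SELECT c1.post_id, c1.id
FROM comments c1
JOIN comments c2 ON c2.post_id = c1.post_id
                AND c2.comment_timestamp >= c1.comment_timestamp
GROUP BY c1.post_id, c1.id
HAVING COUNT(*) <= 2
ORDER BY c1.post_id, c1.id
"""

# Same result with a window function, for comparison.
WINDOW_TOP2 = """
SELECT post_id, id FROM (
    SELECT post_id, id,
           ROW_NUMBER() OVER (PARTITION BY post_id
                              ORDER BY comment_timestamp DESC) AS rn
    FROM comments) x
WHERE rn <= 2
ORDER BY post_id, id
"""

self_join = conn.execute(SELF_JOIN_TOP2).fetchall()
window = conn.execute(WINDOW_TOP2).fetchall()
print(self_join)  # → [(10, 2), (10, 3), (20, 5), (20, 6)]
```

The self-join compares every pair of comments within a post, so its cost grows roughly with the square of the comments per post; on large tables the window-function form is usually the better choice where available.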

MySQL GROUP BY and SORT BY with JOINS

I have 3 tables:
items (item_id, timestamp)
items_terms (item_id, term_id)
terms (term_id, term_name)
I need to find 5 most recent terms (term_id, term_name) based on item timestamp. I was trying to solve it like this:
SELECT t.term_id, t.term_name
FROM items i
INNER JOIN items_terms it USING(item_id)
INNER JOIN terms t USING (term_id)
GROUP BY t.term_id
ORDER BY i.timestamp DESC
LIMIT 5
But the problem is that MySQL groups the rows first (taking the first row it finds for each term_id), so the ORDER BY has no effect on which timestamp is used.
I also thought about filtering on the PHP side by removing the GROUP BY and selecting more than 5 items, but this query needs to support pagination without duplicates across consecutive pages.
I'd be glad to see any suggestions.
How about including the timestamp in the select statement:
SELECT t.term_id, t.term_name, MAX(i.timestamp)
FROM items i
INNER JOIN items_terms it USING(item_id)
INNER JOIN terms t USING (term_id)
GROUP BY t.term_id, t.term_name
ORDER BY MAX(i.timestamp) DESC
LIMIT 5
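Because each term appears once per group, this form also paginates cleanly. A sketch in Python with sqlite3 (SQLite standing in for MySQL), using the question's three tables with invented rows; the `recent_terms` helper and its page/per_page parameters are hypothetical. The extra `t.term_id` tiebreaker is an addition that keeps page boundaries stable when two terms share the same latest timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, timestamp INTEGER);
CREATE TABLE items_terms (item_id INTEGER, term_id INTEGER);
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, term_name TEXT);
INSERT INTO items VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO terms VALUES (1, 'red'), (2, 'green'), (3, 'blue');
-- 'red' is on the oldest and the newest item, 'green' on item 2, 'blue' on item 1
INSERT INTO items_terms VALUES (1, 1), (3, 1), (2, 2), (1, 3);
""")

RECENT_TERMS = """
SELECT t.term_id, t.term_name, MAX(i.timestamp) AS latest
FROM items i
INNER JOIN items_terms it USING (item_id)
INNER JOIN terms t USING (term_id)
GROUP BY t.term_id, t.term_name
ORDER BY latest DESC, t.term_id   -- tiebreaker keeps pages deterministic
LIMIT ? OFFSET ?
"""


def recent_terms(page, per_page=5):
    """Return one page (0-based) of terms ordered by their most recent item."""
    return conn.execute(RECENT_TERMS, (per_page, page * per_page)).fetchall()


print(recent_terms(0))  # → [(1, 'red', 300), (2, 'green', 200), (3, 'blue', 100)]
```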
I would suggest reading this article, as MySQL offers several techniques for limiting the rows taken from each group in a GROUP BY query, and a few might suit your needs. Generally, using a HAVING clause with query-level ("global") user variables should be preferred, as it operates on the already grouped result set, which helps performance.
EDIT: Solution would be:
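SELECT DISTINCT
t.term_id,
t.term_name
FROM
items i
INNER JOIN items_terms it USING(item_id)
INNER JOIN terms t USING (term_id)
-- caution: i.timestamp is not in the DISTINCT select list; MySQL 5.7.5 and later
-- reject this, and older versions may sort by an arbitrary timestamp per term
ORDER BY
i.timestamp DESC
LIMIT 5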
SELECT term_id, term_name
FROM (SELECT t.term_id, t.term_name, MAX(i.timestamp) AS latest
FROM items i
INNER JOIN items_terms it
ON i.item_id = it.item_id
INNER JOIN terms t
ON it.term_id = t.term_id
GROUP BY t.term_id, t.term_name
) AS recent_terms
ORDER BY latest DESC
LIMIT 5