Get top N rows of each group in MySQL


Given a MySQL table of the form
Name | Type
-------+-----
Bill | A
Hill | B
Jill | C
Hans | A
George | C
Sophie | B
Hannah | B
Nancy | C
Phil | A
... | ...
I would like to produce a MySQL query which provides me with the top N rows grouped by their type. By 'top' I mean with respect to a given ordering. In this example, it could be the order given by ordering the type parameters alphabetically (or by date, if all type parameters are dates). For instance, if N = 2, then the resulting table could be:
Name | Type
-------+-----
Bill | A
Hill | B
Jill | C
Hans | A
George | C
Sophie | B
... | ...
That is, the entries may be grouped by type in the resulting table, but that is not strictly required. I am running MySQL 8.x.

If you want n rows per group, use row_number(). If you then want them interleaved, use order by:
select t.*
from (select t.*,
row_number() over (partition by type order by name) as seqnum
from t
) t
where seqnum <= 2
order by seqnum, type;
This assumes that "top" is alphabetically by name. If you have another definition, use that for the order by for row_number().
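To check the query's behaviour on the question's sample rows, here is a small sketch that runs the same ROW_NUMBER() query in SQLite through Python's sqlite3 module. SQLite is only a stand-in for MySQL 8.x here (its window functions need SQLite 3.25 or later), the table name `t` comes from the answer, and "top" is taken to mean alphabetical by name, as the answer assumes.

```python
import sqlite3

# Rows from the question's example table (the trailing "..." rows are unknown and omitted).
ROWS = [
    ("Bill", "A"), ("Hill", "B"), ("Jill", "C"),
    ("Hans", "A"), ("George", "C"), ("Sophie", "B"),
    ("Hannah", "B"), ("Nancy", "C"), ("Phil", "A"),
]

# Same shape as the answer's query: number the rows within each type,
# keep the first N, then order by that number so the types interleave.
TOP_N_SQL = """
SELECT name, type
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) AS seqnum
      FROM t) ranked
WHERE seqnum <= ?
ORDER BY seqnum, type
"""


def top_n_per_group(n):
    """Return the first n names per type (alphabetical by name) as (name, type) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (name TEXT, type TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
        return conn.execute(TOP_N_SQL, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, type_ in top_n_per_group(2):
        print(name, type_)
```

For N = 2 this yields Bill A, Hannah B, George C, Hans A, Hill B, Jill C. Dropping `seqnum` from the outer ORDER BY (ordering by `type, seqnum` instead) would keep each type's rows together.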

Related

MySQL: Error: Operand should contain 1 column(s). What's wrong with my use of WHERE...NOT IN(SELECT...)?

Credit: LeetCode 1355, Activity Participants
Question:
Write an SQL query to find the names of all the activities with neither maximum, nor minimum number of participants.
Return the result table in any order. Each activity in table Activities is performed by any person in the table Friends.
Friends table:
+------+--------------+---------------+
| id | name | activity |
+------+--------------+---------------+
| 1 | Jonathan D. | Eating |
| 2 | Jade W. | Singing |
| 3 | Victor J. | Singing |
| 4 | Elvis Q. | Eating |
| 5 | Daniel A. | Eating |
| 6 | Bob B. | Horse Riding |
+------+--------------+---------------+
Activities table:
+------+--------------+
| id | name |
+------+--------------+
| 1 | Eating |
| 2 | Singing |
| 3 | Horse Riding |
+------+--------------+
Result table:
+--------------+
| activity |
+--------------+
| Singing |
+--------------+
My code is as follows:
WITH a AS(
SELECT activity, COUNT(1) AS n
FROM Friends
GROUP BY activity
)
SELECT activity
FROM a
WHERE n NOT IN (SELECT MAX(n),MIN(n) FROM a)
I have seen that using n != (select min(n) from a) and n != (select max(n) from a) works, but I don't know why my code goes wrong. My guess is that 'SELECT MAX(n), MIN(n) FROM a' generates two columns rather than two rows, but I still don't know the exact reason.
Hope someone can help me out! Thank you so much!

You are close, but NOT IN does not work that way: the subquery returns multiple columns, and you are comparing them to only one value. Instead, use two separate comparisons:
SELECT activity
FROM a
WHERE n <> (SELECT MAX(n) FROM a) AND
n <> (SELECT MIN(n) FROM a) ;
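The difference between the two queries can be reproduced outside MySQL. This sketch uses SQLite through Python's sqlite3 module as a stand-in: SQLite rejects the question's query with an error of its own (not MySQL's "Operand should contain 1 column(s)" wording, but for the same reason), while the two-comparison version returns Singing for the question's data.

```python
import sqlite3

# The Friends rows from the question.
FRIENDS = [
    (1, "Jonathan D.", "Eating"), (2, "Jade W.", "Singing"),
    (3, "Victor J.", "Singing"), (4, "Elvis Q.", "Eating"),
    (5, "Daniel A.", "Eating"), (6, "Bob B.", "Horse Riding"),
]

CTE = "WITH a AS (SELECT activity, COUNT(1) AS n FROM Friends GROUP BY activity) "

# Question's version: the subquery yields one row with TWO columns,
# but NOT IN against a single value needs a ONE-column subquery.
BROKEN_SQL = CTE + "SELECT activity FROM a WHERE n NOT IN (SELECT MAX(n), MIN(n) FROM a)"

# Answer's version: two scalar subqueries, each returning a single value.
FIXED_SQL = CTE + """
SELECT activity FROM a
WHERE n <> (SELECT MAX(n) FROM a)
  AND n <> (SELECT MIN(n) FROM a)
"""


def run(sql):
    """Run sql against a fresh in-memory copy of Friends; return the first column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Friends (id INTEGER, name TEXT, activity TEXT)")
        conn.executemany("INSERT INTO Friends VALUES (?, ?, ?)", FRIENDS)
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        run(BROKEN_SQL)
    except sqlite3.Error as exc:
        print("broken query rejected:", exc)
    print("fixed query:", run(FIXED_SQL))
```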

My guess is that it's because SELECT MAX(n), MIN(n) FROM a will generate two columns, rather than two rows.
Yes, that's the point. Other than using two subqueries (which you already found out by yourself), you can also take advantage of window functions here (the fact that you use a with clause indicates that you are running MySQL 8.0, which supports window functions):
select activity
from (
select
activity,
rank() over(order by count(*) asc) rn_asc,
rank() over(order by count(*) desc) rn_desc
from friends
group by activity
) t
where 1 not in (rn_asc, rn_desc)
I suspect that this performs better than a with clause and two subqueries.
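Ties matter for the window-function approach. With ROW_NUMBER(), two activities that share the maximum count are numbered 1 and 2, so one of them slips through the `1 not in (...)` filter; RANK() gives both a 1. The sketch below shows this in SQLite via Python's sqlite3 module (a stand-in for MySQL 8.0), on hypothetical data where Eating and Singing tie for the maximum. It also computes the counts in a CTE first and applies the window functions to that, which is equivalent to ordering by `count(*)` directly.

```python
import sqlite3

# Hypothetical data: Eating and Singing tie for the maximum (3 each),
# Horse Riding is the minimum (1), and Chess (2) is in between.
ROWS = ["Eating"] * 3 + ["Singing"] * 3 + ["Chess"] * 2 + ["Horse Riding"]


def middle_activities(rank_fn="RANK"):
    """Return activities that are neither max nor min, using RANK or ROW_NUMBER."""
    # Only these two function names are ever interpolated into the SQL text.
    assert rank_fn in ("RANK", "ROW_NUMBER")
    sql = f"""
    WITH counts AS (
        SELECT activity, COUNT(*) AS n FROM Friends GROUP BY activity
    )
    SELECT activity FROM (
        SELECT activity,
               {rank_fn}() OVER (ORDER BY n ASC)  AS rn_asc,
               {rank_fn}() OVER (ORDER BY n DESC) AS rn_desc
        FROM counts
    ) t
    WHERE 1 NOT IN (rn_asc, rn_desc)
    ORDER BY activity
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Friends (activity TEXT)")
        conn.executemany("INSERT INTO Friends VALUES (?)", [(a,) for a in ROWS])
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()
```

`middle_activities("RANK")` returns only Chess, while `middle_activities("ROW_NUMBER")` also returns whichever of Eating or Singing happened to be numbered 2.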

Instead of using the subquery in WHERE, you can join with the subquery.
WITH a AS(
SELECT activity, COUNT(1) AS n
FROM Friends
GROUP BY activity
)
SELECT activity
FROM a AS a1
JOIN (SELECT MAX(n) AS maxn, MIN(n) AS minn FROM a) AS a2
ON a1.n NOT IN (a2.maxn, a2.minn)
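A quick way to sanity-check the join version is to run it in SQLite through Python's sqlite3 module (a stand-in for MySQL 8.0). The derived table `a2` is a single row holding both extremes, and it needs `FROM a` so that `MAX(n)` and `MIN(n)` have something to aggregate.

```python
import sqlite3

# Only the activity column matters here; values are the question's Friends rows.
ACTIVITIES = ["Eating", "Singing", "Singing", "Eating", "Eating", "Horse Riding"]

# a2 is a one-row derived table; joining on NOT IN keeps only the
# activities whose count matches neither extreme.
JOIN_SQL = """
WITH a AS (
    SELECT activity, COUNT(1) AS n FROM Friends GROUP BY activity
)
SELECT a1.activity
FROM a AS a1
JOIN (SELECT MAX(n) AS maxn, MIN(n) AS minn FROM a) AS a2
  ON a1.n NOT IN (a2.maxn, a2.minn)
"""


def middle_activities():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Friends (activity TEXT)")
        conn.executemany("INSERT INTO Friends VALUES (?)", [(a,) for a in ACTIVITIES])
        return [row[0] for row in conn.execute(JOIN_SQL)]
    finally:
        conn.close()
```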

You can use MIN() and MAX() window functions:
WITH cte AS (
SELECT activity,
COUNT(*) AS n,
MIN(COUNT(*)) OVER () min_n,
MAX(COUNT(*)) OVER () max_n
FROM Friends
GROUP BY activity
)
SELECT activity, n
FROM cte
WHERE n NOT IN (min_n, max_n)
Results:
| activity | n |
| -------- | --- |
| Singing | 2 |

MySQL SUM by field and then GROUP BY user?

I have this table:
What's the correct query to get this result:
ID | Type | Total
1 | A | 300
1 | B | 100
2 | A | 30
2 | B | 40
In other words, how do I sum by type first and then group by user ID?

Your aggregate functions, such as SUM, are computed per group, so there is no "SUM by type first, then group by user_id": you want to group by both user_id and type.
Like so: GROUP BY user_id, type
If you want to guarantee that ordering, also add an ORDER BY user_id, type clause. Older MySQL versions sorted implicitly by the GROUP BY columns, but relying on that was deprecated in MySQL 5.7, and MySQL 8.0 no longer sorts implicitly.
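The question's source table did not survive, so this sketch uses hypothetical rows (table `payments`, columns `user_id`, `type`, `amount`, all assumed names) chosen only so that the sums match the desired output. It runs in SQLite through Python's sqlite3 module as a stand-in for MySQL and shows that one GROUP BY over both columns produces the per-user, per-type totals.

```python
import sqlite3

# Hypothetical rows; the names payments/user_id/type/amount are assumptions,
# picked so the per-(user, type) sums match the desired output.
PAYMENTS = [
    (1, "A", 100), (1, "A", 200), (1, "B", 100),
    (2, "A", 10), (2, "A", 20), (2, "B", 40),
]

# One group per (user_id, type) pair; SUM runs once per group.
# ORDER BY is explicit because GROUP BY alone does not promise an order.
TOTALS_SQL = """
SELECT user_id AS ID, type AS Type, SUM(amount) AS Total
FROM payments
GROUP BY user_id, type
ORDER BY user_id, type
"""


def totals():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE payments (user_id INTEGER, type TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", PAYMENTS)
        return conn.execute(TOTALS_SQL).fetchall()
    finally:
        conn.close()
```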

MySQL Trying to Return Staggered Results

For example I have a table with three fields:
id (int)
name (varchar)
company (int)
Let's say that I have the following data (example only)
id --- name --- company
---------------------------------------
1 --- John Baker --- 1
2 --- Ann Johnson --- 1
3 --- John Wu --- 1
4 --- Mike Johns --- 2
5 --- John John --- 2
6 --- Johnny Boy --- 2
I would like to perform a search on name and return the data staggered by company. So if I search with LIKE '%John%', I want the data returned sorted by company like 1, 2, 1, 2, 1, 2, while keeping as much of the original relevance order for the search term as possible.
I have no idea how to return the data in this staggered way, and I have thought about it for hours. If somebody can please help me I'd love to hear their ideas!

This would be rather easy if we could use the SQL-standard ROW_NUMBER function to number each name PARTITIONed BY company. (ROW_NUMBER gives us a "naive ranking" or simple numbering, with no ties and no gaps.) This is essentially @zebediah49's proposal in his comment. MySQL versions before 8.0, sadly, cannot do this; window functions such as ROW_NUMBER() arrived in MySQL 8.0.
@GordonLinoff's answer simulates this functionality in MySQL with a self-join technique. Here's another way to do the same.
First we group every person by company and then globally naively rank them, using a user variable. So, the three people in company A become #1, #2, and #3, and the two people in company B become #4 and #5, and so on:
company | name
--------+---------
A       | Alice
B       | Bob
A       | Charlie
A       | Deborah
B       | Erwin

After grouping by `company` and numbering with ROW_NUMBER():

company | name    | rank
--------+---------+------
A       | Alice   | 1
A       | Charlie | 2
A       | Deborah | 3
B       | Bob     | 4
B       | Erwin   | 5
The user-variable technique for simulating ROW_NUMBER in MySQL is easy to search for, and other SO answers show compact demonstrations of it.
Now if we MOD a global naive rank by the number of people within a company, we get a "partitioned rank", a relative rank within the company:
company | name | rank | npeople | rank % npeople
--------+----------+------+---------+----------------
A | Alice | 1 | 3 | 1
A | Charlie | 2 | 3 | 2
A | Deborah | 3 | 3 | 0
B | Bob | 4 | 2 | 0
B | Erwin | 5 | 2 | 1
Putting it all together, JOINing against a query to count the number of people in each company, we get:
SELECT id, name, ranked.company
FROM ( SELECT tbl.id, tbl.name, tbl.company, (@rn := @rn + 1) AS rn
       FROM tbl
       JOIN (SELECT @rn := 0) vars
       WHERE tbl.name LIKE '%John%'
       ORDER BY company) ranked
JOIN (SELECT company, COUNT(id) AS npeople
      FROM tbl
      WHERE name LIKE '%John%'  -- count only the rows that match the search
      GROUP BY company) companies
  ON ranked.company = companies.company
ORDER BY rn MOD companies.npeople, ranked.company
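On MySQL 8.0 or later, the ROW_NUMBER() ... PARTITION BY approach mentioned at the start of this answer can be written directly. This sketch runs it in SQLite through Python's sqlite3 module (a stand-in for MySQL 8.0) on the question's sample rows; the table name `tbl` follows the answer, and ordering by `id` within a company is only an assumed proxy for "relevance".

```python
import sqlite3

# The question's sample rows: (id, name, company).
PEOPLE = [
    (1, "John Baker", 1), (2, "Ann Johnson", 1), (3, "John Wu", 1),
    (4, "Mike Johns", 2), (5, "John John", 2), (6, "Johnny Boy", 2),
]

# Number the matching rows within each company (by id, standing in for
# relevance), then sort by that number first so companies alternate.
STAGGER_SQL = """
SELECT id, name, company
FROM (SELECT id, name, company,
             ROW_NUMBER() OVER (PARTITION BY company ORDER BY id) AS rn
      FROM tbl
      WHERE name LIKE '%John%') ranked
ORDER BY rn, company
"""


def staggered():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, company INTEGER)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", PEOPLE)
        return conn.execute(STAGGER_SQL).fetchall()
    finally:
        conn.close()
```

All six sample names match '%John%', and the companies come back as 1, 2, 1, 2, 1, 2.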

If you want it sorted by company:
select *
from t
where . . .
order by company, id;
If you want it interleaved, then a counter within a company helps. Here is one way:
select t.*
from (select *,
(select count(*) from t t2 where <where clause on t2 here> and t2.company = t.company and t2.id < t.id) as seqnum
from t
where . . .
) t
order by seqnum, company
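The counter approach needs no window functions, so it also fits MySQL versions before 8.0. Here is a runnable sketch using SQLite through Python's sqlite3 module as a stand-in, with the question's rows in a table called `tbl` (an assumed name) and `name LIKE '%John%'` filled in as the where clause on both the outer query and the correlated subquery.

```python
import sqlite3

# The question's sample rows: (id, name, company).
PEOPLE = [
    (1, "John Baker", 1), (2, "Ann Johnson", 1), (3, "John Wu", 1),
    (4, "Mike Johns", 2), (5, "John John", 2), (6, "Johnny Boy", 2),
]

# For each matching row, count the matching rows of the same company with a
# smaller id: 0 for the first, 1 for the second, and so on.
COUNTER_SQL = """
SELECT id, name, company
FROM (SELECT t.*,
             (SELECT COUNT(*)
              FROM tbl t2
              WHERE t2.name LIKE '%John%'
                AND t2.company = t.company
                AND t2.id < t.id) AS seqnum
      FROM tbl t
      WHERE t.name LIKE '%John%') numbered
ORDER BY seqnum, company
"""


def staggered_by_counter():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, company INTEGER)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", PEOPLE)
        return conn.execute(COUNTER_SQL).fetchall()
    finally:
        conn.close()
```

The correlated subquery runs once per matching row, so on large result sets it can be much slower than a window function.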

One possible solution:
select *
from yourTable
where ...
order by ((company*1000) + id);
Add as many zeros as you need: the multiplier must be larger than the largest id, so at a minimum use this number:
select pow(10,length(max(id))) from yourTable;
The ordering may be quite slow if you pull a lot of records in this query, so I suggest you use an optimal where condition.
Not the most elegant solution, but it may work.

Show all grouped results and sort

I have a table like this one:
| B | 1 |
| C | 2 |
| B | 2 |
| A | 2 |
| C | 3 |
| A | 2 |
I would like to fetch it, but sorted and grouped. That is, I would like it grouped by the letter, but sorted by the highest sum of the group. Also, I want to show all entries within the group:
| C | 3 |
| C | 2 |
| A | 2 |
| A | 2 |
| B | 2 |
| B | 1 |
The order is that way because C has 3 and 2. 3+2=5, which is higher than 2+2=4 for A which in turn is higher than 2+1=3 for B.
I need to show all "grouped" letters because there are other columns that are distinct all of which I need shown.
EDIT:
Thanks for the quick reply. I have the audacity, however, to inquire further.
I have this query:
SELECT * FROM `ip_log` WHERE `IP` IN
(SELECT `IP` FROM `ip_log` GROUP BY `IP` HAVING COUNT(DISTINCT `uid`) > 1)
GROUP BY `uid` ORDER BY `IP`
The letters in the description above are IP addresses (I need the rows grouped by IP) and the numbers are timestamps (I need the groups sorted by their sum, or at least the timestamps used as the sort key). Should I create a temporary table and then use the solution below?

select t.Letter, t.Value
from MyTable t
inner join (
select Letter, sum(Value) as ValueSum
from MyTable
group by Letter
) ts on t.Letter = ts.Letter
order by ts.ValueSum desc, t.Letter, t.Value desc
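To see that the join produces exactly the requested order, here is the same query run in SQLite through Python's sqlite3 module (a stand-in for MySQL) on the question's example rows, with the answer's names `MyTable`, `Letter` and `Value`.

```python
import sqlite3

# The question's example rows as (Letter, Value).
ROWS = [("B", 1), ("C", 2), ("B", 2), ("A", 2), ("C", 3), ("A", 2)]

# Join every row to its group's total, then sort by that total first so
# whole groups stay together, largest sum first.
SORTED_SQL = """
SELECT t.Letter, t.Value
FROM MyTable t
INNER JOIN (
    SELECT Letter, SUM(Value) AS ValueSum
    FROM MyTable
    GROUP BY Letter
) ts ON t.Letter = ts.Letter
ORDER BY ts.ValueSum DESC, t.Letter, t.Value DESC
"""


def sorted_groups():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MyTable (Letter TEXT, Value INTEGER)")
        conn.executemany("INSERT INTO MyTable VALUES (?, ?)", ROWS)
        return conn.execute(SORTED_SQL).fetchall()
    finally:
        conn.close()
```

The `t.Letter` tie-breaker keeps two groups with equal sums from interleaving.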

If your table's columns are letter and number, this is how I would go about it:
SELECT
letter,
GROUP_CONCAT(number ORDER BY number DESC),
SUM(number) AS total
FROM `table`
GROUP BY letter
ORDER BY total desc
Based on your example, you will get the following:
| C | 3,2 | 5
| A | 2,2 | 4
| B | 2,1 | 3
You can then process that data to get the actual information you want/need.
If you still want the data in the format you originally requested, it is not possible without a subquery. The reason is that you can't sort by aggregated data (the SUM of the number column) that the same query does not calculate. So you need a subquery that calculates it, joined back to the original table (disclaimer: untested query):
SELECT
letter,
number
FROM `table`
JOIN (SELECT letter AS ltr, SUM(number) AS total FROM `table` GROUP BY letter) AS totals
ON `table`.letter = totals.ltr
ORDER BY totals.total desc, letter desc, number desc

How to retrieve the corresponding value of a field with MAX on other fields in MySQL 4.0.x

To start with, I am using MySQL 4.0.27 and need a solution for this version only.
I am using MAX() in a SELECT statement together with other fields and need to retrieve the values of the other fields that correspond to the MAX value.
Assume below data from table Orders:
--------------------------------------------------------------
Product | CategoryID | Date | OrderBy
--------------------------------------------------------------
TV | 1 | 2011-11-27 | John
Pen | 1 | 2011-11-29 | David
Mouse | 2 | 2011-11-30 | Mike
Printer | 1 | 2011-10-19 | Rozi
HDD | 2 | 2011-11-02 | Peter
----------------------------------------------------------------
My requirement is to retrieve the count of orders in each category, along with the name of the person who placed the most recent order, which means I need the following result:
--------------------------------------------------------------------------
CategoryID | OrderBy | Order_Count | Date
-------------------------------------------------------------------------
1 | John | 3 | 2011-11-29
2 | Peter | 2 | 2011-11-30
If I use below SQL:
SELECT CategoryID, OrderBy, COUNT(OrderID) AS Order_count, MAX(Date)
FROM Orders
GROUP BY CategoryID
I am not getting the desired result: OrderBy contains some other name instead of the name from the row with the extracted date.
Can anyone suggest how to achieve this in MySQL 4.0.x, where I am limited to not using subqueries or functions like GROUP_CONCAT?
Thanks in advance.

Try:
SELECT CategoryID, OrderBy, COUNT(OrderID) AS Order_count, Date
FROM Orders
GROUP BY CategoryID
ORDER BY Date Desc
- assuming you want the OrderBy value corresponding to the maximum date.

Try:
SELECT a.CategoryID, b.OrderBy, COUNT(DISTINCT a.id), MAX(a.Date)
FROM Orders a
INNER JOIN Orders b ON a.CategoryID = b.CategoryID
GROUP BY a.CategoryID
ORDER BY a.Date DESC,b.Date ASC

I always find it a great help to check in the manual (although, despite what it says there, if the value to find the MAX for is not indexed, then this is more efficient than a sub-select)
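A subquery-free way to pick the row holding each group's maximum is to LEFT JOIN the table to itself on "same category, later date" and keep only the rows with no later match (`o2.CategoryID IS NULL`); the MySQL manual's section on rows holding the group-wise maximum of a column describes this LEFT JOIN ... IS NULL pattern. A third self-join then counts the orders per category. The sketch below checks the idea in SQLite through Python's sqlite3 module; whether MySQL 4.0.27 accepts exactly this statement is an untested assumption, though it uses only joins.

```python
import sqlite3

# The question's Orders rows: (Product, CategoryID, Date, OrderBy).
ORDERS = [
    ("TV", 1, "2011-11-27", "John"),
    ("Pen", 1, "2011-11-29", "David"),
    ("Mouse", 2, "2011-11-30", "Mike"),
    ("Printer", 1, "2011-10-19", "Rozi"),
    ("HDD", 2, "2011-11-02", "Peter"),
]

# o1: candidate "latest" row; o2: any later row in the same category
# (none must exist); o3: every row in the category, for the count.
LATEST_ORDER_SQL = """
SELECT o1.CategoryID, o1.OrderBy, COUNT(o3.Product) AS Order_Count, o1.Date
FROM Orders o1
LEFT JOIN Orders o2
       ON o2.CategoryID = o1.CategoryID AND o2.Date > o1.Date
INNER JOIN Orders o3
       ON o3.CategoryID = o1.CategoryID
WHERE o2.CategoryID IS NULL
GROUP BY o1.CategoryID, o1.OrderBy, o1.Date
ORDER BY o1.CategoryID
"""


def latest_per_category():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Orders (Product TEXT, CategoryID INTEGER, Date TEXT, OrderBy TEXT)")
        conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", ORDERS)
        return conn.execute(LATEST_ORDER_SQL).fetchall()
    finally:
        conn.close()
```

With the sample rows as given, the most recent orders are David's (category 1, 3 orders, 2011-11-29) and Mike's (category 2, 2 orders, 2011-11-30). If two orders in a category share the latest date, both rows are returned.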
