Get top n counted rows in table within group [MySQL]

I'm trying to get just the top 3 selling products within each category (the top 3 products by number of transactions, i.e. count(id), per category). I searched a lot for a possible solution but found nothing. It looks a bit tricky in MySQL, since one can't simply use a top() function. Sample data structure below:
+----+-------------+------------+
| id | category_id | product_id |
+----+-------------+------------+
|  1 |          10 |         32 |
|  2 |          10 |         34 |
|  3 |          10 |         32 |
|  4 |          10 |         21 |
|  5 |          10 |        100 |
|  6 |           7 |        101 |
|  7 |           7 |         39 |
|  8 |           7 |         41 |
|  9 |           7 |         39 |
+----+-------------+------------+

In earlier versions of MySQL, I would recommend using variables:
select cp.*
from (select cp.*,
             (@rn := if(@c = category_id, @rn + 1,
                        if(@c := category_id, 1, 1)
                       )
             ) as rn
      from (select category_id, product_id, count(*) as cnt
            from mytable
            group by category_id, product_id
            order by category_id, count(*) desc
           ) cp cross join
           (select @c := -1, @rn := 0) params
     ) cp
where rn <= 3;

If you are running MySQL 8.0, you can use window function rank() for this:
select *
from (
select
category_id,
product_id,
count(*) cnt,
rank() over(partition by category_id order by count(*) desc) rn
from mytable
group by category_id, product_id
) t
where rn <= 3
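One detail the rank() query leaves implicit is how ties behave: rank() gives tied products the same number, so a category can return more than three rows, while row_number() always cuts off at three. The sketch below runs the same idea against the sample data. It makes two assumptions that are not in the answer: Python's built-in sqlite3 (SQLite 3.25 or later) stands in for MySQL 8.0, and the aggregation is moved into a CTE so the window function orders by a plain cnt column. The helper name top_products is hypothetical.

```python
import sqlite3

# Sample transactions from the question: (id, category_id, product_id).
ROWS = [
    (1, 10, 32), (2, 10, 34), (3, 10, 32), (4, 10, 21), (5, 10, 100),
    (6, 7, 101), (7, 7, 39), (8, 7, 41), (9, 7, 39),
]


def top_products(limit=3, ranking="rank"):
    """Return (category_id, product_id, cnt, rn) rows with rn <= limit.

    Assumption: SQLite stands in for MySQL 8.0; both accept this syntax.
    """
    # Only these two window functions may be interpolated into the SQL text.
    assert ranking in ("rank", "row_number")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (id integer, category_id integer, product_id integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    sql = f"""
        with counts as (
            select category_id, product_id, count(*) as cnt
            from mytable
            group by category_id, product_id
        ),
        ranked as (
            select category_id, product_id, cnt,
                   {ranking}() over (partition by category_id
                                     order by cnt desc) as rn
            from counts
        )
        select category_id, product_id, cnt, rn
        from ranked
        where rn <= ?
        order by category_id, rn, product_id
    """
    rows = conn.execute(sql, (limit,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_products(3, "rank"):
        print(row)
```

On the sample data, rank() keeps four rows for category 10, because products 21, 34 and 100 all tie at one sale. Passing "row_number" instead returns exactly three rows per category, but which of the tied products survive is arbitrary unless a tie-breaker is added to the order by.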
In earlier versions, one option is to filter with a correlated subquery that finds the third-highest count within the category (coalesce covers categories with fewer than three products):
select
    category_id,
    product_id,
    count(*) cnt
from mytable t
group by category_id, product_id
having count(*) >= coalesce((
    select count(*)
    from mytable t1
    where t1.category_id = t.category_id
    group by t1.product_id
    order by count(*) desc
    limit 2, 1
), 0)

Related

Count Occurrence of Field whilst being Group By 2

I'm struggling to try to have the count of order id on an item_id row, any help is greatly appreciated!
Data
item_id | order_id
1 | Order_1
2 | Order_1
3 | Order_2
4 | Order_3
Desired Result
item_id | order_id | items_in_order
1 | Order_1 | 2
2 | Order_1 | 2
3 | Order_2 | 1
4 | Order_3 | 1
SELECT S.item_id, S.`order_id`, S.order_total, C.cnt as items_in_order,
`order_discount` / C.cnt as item_discount,
`order_total` / C.cnt as item_price
FROM `orders` S
LEFT JOIN (SELECT `item_id`, `order_id`, count(`order_id`) as cnt FROM `supplier_orders` GROUP BY `order_id`)
C ON S.`order_id` = C.`order_id` AND S.id = C.item_id
This would produce this with null values
item_id | order_id | items_in_order | item_discount | item_price
3009117 | 3029511 | 2 | 0 | 25
3009118 | 3029511 | null | null | null
UPDATE, this now seems to work as intended
SELECT S.`item_id`, S.`order_id`, S.order_total, C.cnt as items_in_order,
`order_discount` / C.cnt as item_discount,
`order_total` / C.cnt as item_price
FROM `orders` S
INNER JOIN (SELECT `item_id`, `order_id`, count(`order_id`) as cnt FROM `orders` GROUP BY `order_id`)
C ON S.`order_id` = C.`order_id`
GROUP BY S.`item_id`
Your query does not relate to your sample data; however you seem to want aggregation and ranking. In MySQL 8.0, you would do:
select
row_number() over(order by count(*) desc) rn,
order_id,
count(*) items_in_order
from data
group by order_id
order by rn
I named the first column rn (for rank): I find id confusing here, since you already have a column with that name in the table.
In earlier versions, one option uses a session variable instead of row_number():
select @rn := @rn + 1 rn, order_id, items_in_order
from (
    select order_id, count(*) items_in_order
    from data
    group by order_id
    order by items_in_order desc
) t
cross join (select @rn := 0) r
order by items_in_order desc
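If the goal is the "Desired Result" table from the question (one row per item, each carrying the size of its order), a window count is another possibility. This is a sketch under assumptions: it is not one of the posted answers, SQLite via Python's sqlite3 stands in for MySQL 8.0, the table name data is borrowed from the answer above, and the helper name items_per_order is hypothetical.

```python
import sqlite3


def items_per_order(rows):
    """rows: iterable of (item_id, order_id) pairs.

    Returns (item_id, order_id, items_in_order) without collapsing rows,
    because count(*) is evaluated as a window function per order_id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table data (item_id integer, order_id text)")
    conn.executemany("insert into data values (?, ?)", rows)
    sql = """
        select item_id,
               order_id,
               count(*) over (partition by order_id) as items_in_order
        from data
        order by item_id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, "Order_1"), (2, "Order_1"), (3, "Order_2"), (4, "Order_3")]
    for row in items_per_order(sample):
        print(row)
```

Unlike the group by versions, this keeps every item row, so per-item values such as order_total / items_in_order can be computed in the same select.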

How to reference generated/aliased table in same query?

I want to find a user's position in a leaderboard and return the 4 users above and 4 users below their position.
My table, 'predictions', looks something like this:
+----+---------+--------+-------+---------+
| id | userId | score | rank | gameId |
+----+---------+--------+-------+---------+
| 1 | 12 | 11 | 1 | 18 |
| 2 | 1 | 6 | 4 | 18 |
| 3 | 43 | 7 | 3 | 12 |
| 4 | 4 | 9 | 2 | 18 |
| 5 | 98 | 2 | 5 | 19 |
| 6 | 3 | 0 | 6 | 18 |
+----+---------+--------+-------+---------+
Obviously this isn't properly ordered, so I run this:
SELECT l.userId,
       l.rank,
       l.score,
       l.createdAt,
       @curRow := @curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT @curRow := 0) r
ORDER BY rank ASC
which gets me a nice table with each entry numbered.
I then want to search this generated table, find the row_number where userId = X, and then return the values 'around' that result.
I think I have the logic of the query down, I just can't work out how to reference the table 'generated' by the above query.
It would be something like this:
SELECT *
FROM (
    SELECT l.userId,
           l.rank,
           l.score,
           l.createdAt,
           @curRow := @curRow + 1 AS row_number
    FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
    JOIN (SELECT @curRow := 0) r
    ORDER BY rank ASC) generated_ordered_table
WHERE row_number < (SELECT row_number FROM generated_ordered_table WHERE userId = 1)
ORDER BY row_number DESC
LIMIT 0,5
This fails. What I'm trying to do is to generate my first table with the correct query, give it an alias of generated_ordered_table, and then reference this 'table' later on in this query.
How do I do this?
MySQL 8+ supports window functions and Common Table Expressions (CTEs), which would have simplified the query quite a bit.
Now, in the older versions (your case), the "Generated Rank Table" (Derived Table) cannot be referenced again in a subquery inside the WHERE clause. One way would be to do the same thing twice (select clause to get generated table) again inside the subquery, but that would be relatively inefficient.
So, another approach can be to use Temporary Tables. We create a temp table first storing the ranks. And, then reference that temp table to get results accordingly:
CREATE TEMPORARY TABLE IF NOT EXISTS gen_rank_tbl AS
(SELECT l.userId,
        l.rank,
        l.score,
        l.createdAt,
        @curRow := @curRow + 1 AS row_number
 FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
 JOIN (SELECT @curRow := 0) r
 ORDER BY rank ASC)
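Now, you can reference this temp table to get the desired results. MySQL does not let a TEMPORARY table be referenced more than once in the same query, so read the user's row number into a variable first:
SELECT row_number INTO @user_row FROM gen_rank_tbl WHERE userId = 1;
SELECT *
FROM gen_rank_tbl
WHERE row_number < @user_row
ORDER BY row_number DESC
LIMIT 0,5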
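The answer above notes that window functions and CTEs would simplify this on MySQL 8+. A minimal sketch of that idea follows, with these assumptions: SQLite (through Python's sqlite3) stands in for MySQL 8.0, the CTE names ordered and target and the helper neighbours are hypothetical, and the rank column is double-quoted for SQLite (in MySQL 8.0, where RANK is a reserved word, it would be written with backticks).

```python
import sqlite3

# (id, userId, score, rank, gameId) rows from the question's sample table.
ROWS = [
    (1, 12, 11, 1, 18), (2, 1, 6, 4, 18), (3, 43, 7, 3, 12),
    (4, 4, 9, 2, 18), (5, 98, 2, 5, 19), (6, 3, 0, 6, 18),
]


def neighbours(user_id, game_id, span=4):
    """Return (userId, score, rn) for the user and up to `span` rows either side."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'create table predictions (id integer, userId integer, score integer, '
        '"rank" integer, gameId integer)'
    )
    conn.executemany("insert into predictions values (?, ?, ?, ?, ?)", ROWS)
    sql = """
        with ordered as (
            -- number the rows of one game by their stored rank
            select userId, score,
                   row_number() over (order by "rank") as rn
            from predictions
            where gameId = :game
        ),
        target as (
            -- the CTE can be referenced a second time, unlike a derived table
            select rn from ordered where userId = :user
        )
        select o.userId, o.score, o.rn
        from ordered o
        join target t on o.rn between t.rn - :span and t.rn + :span
        order by o.rn
    """
    params = {"game": game_id, "user": user_id, "span": span}
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(neighbours(user_id=1, game_id=18, span=1))
```

Because the CTE is named once and read twice, there is no need for a temporary table or for repeating the numbering subquery.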
You could use a bunch of unions
select userid,rank,'eq'
from t where gameid = 18 and userid = 1
union
(
select userid,rank,'lt'
from t
where gameid = 18 and rank < (select rank from t t1 where t1.userid = 1 and t1.gameid = t.gameid)
order by rank desc limit 4
)
union
(
select userid,rank,'gt'
from t
where gameid = 18 and rank > (select rank from t t1 where t1.userid = 1 and t1.gameid = t.gameid)
order by rank asc limit 4
);
+--------+------+----+
| userid | rank | eq |
+--------+------+----+
| 1 | 4 | eq |
| 4 | 2 | lt |
| 12 | 1 | lt |
| 3 | 6 | gt |
+--------+------+----+
4 rows in set (0.04 sec)
But it's not pretty
You can use two derived tables:
SELECT p.*,
       (@user_curRow := CASE WHEN userId = @x THEN rn ELSE @user_curRow END) as user_rn
FROM (SELECT p.*, @curRow := @curRow + 1 AS rn
      FROM (SELECT p.*
            FROM predictions p
            WHERE p.gameId = 18
            ORDER BY rank ASC
           ) p CROSS JOIN
           (SELECT @curRow := 0, @user_curRow := -1) params
     ) p
HAVING rn BETWEEN @user_curRow - 4 AND @user_curRow + 4;

Get user's highest score from a table

I have a feeling this is a very simple question, but maybe I'm having a brain fart right now and just can't seem to figure out how to go about it.
I have a MySQL table structure like below
+---------------------------------------------------+
| id | date | score | speed | user_id |
+---------------------------------------------------+
| 1 | 2016-11-17 | 2 | 133291 | 17 |
| 2 | 2016-11-17 | 6 | 82247 | 17 |
| 3 | 2016-11-17 | 6 | 21852 | 17 |
| 4 | 2016-11-17 | 1 | 109338 | 17 |
| 5 | 2016-11-17 | 7 | 64762 | 61 |
| 6 | 2016-11-17 | 8 | 49434 | 61 |
Now I can get a particular user's best performance by doing this
SELECT *
FROM performance
WHERE user_id = 17 AND date = '2016-11-17'
ORDER BY score desc,speed asc LIMIT 1
This should return the row with ID = 3. What I want is a single query that returns one such row for each unique user_id in the table. The result would be something like this
+---------------------------------------------------+
| id | date | score | speed | user_id |
+---------------------------------------------------+
| 3 | 2016-11-17 | 6 | 21852 | 17 |
| 6 | 2016-11-17 | 8 | 49434 | 61 |
Furthermore, can the same query also sort the final result by the same criteria (score desc, speed asc)? Thanks
A simple method uses a correlated subquery:
select p.*
from performance p
where p.date = '2016-11-17' and
p.id = (select p2.id
from performance p2
where p2.user_id = p.user_id and p2.date = p.date
order by score desc, speed asc
limit 1
);
This should be able to take advantage of an index on performance(date, user_id, score, speed).
This is easy using variables to emulate row_number() over (partition by ... order by ...).
Explanation:
First create two variables in the subquery.
Order by user_id, so that @rn resets to 1 when the user changes.
Order by score desc, speed asc, so each row gets a row number and the row you want always has rn = 1.
@rn := assigns a new value to @rn for each row:
if the row has a new user_id, @rn is set to 1;
otherwise @rn is set to @rn + 1.
SELECT `id`, `date`, `score`, `speed`, `user_id`
FROM (
    SELECT *,
           @rn := if(@user_id = `user_id`,
                     @rn + 1,
                     if(@user_id := `user_id`, 1, 1)
                  ) as rn
    FROM Table1
    CROSS JOIN (SELECT @user_id := 0, @rn := 0) as param
    WHERE date = '2016-11-17'
    ORDER BY `user_id`, `score` desc, `speed` asc
) T
where T.rn = 1
For MySQL, you can try a double subselect with group by. The best row has the highest score and, among those, the lowest speed:
select * from performance
where (user_id, score, speed) in (
    SELECT user_id, score, min(speed)
    FROM performance
    WHERE (user_id, score) in (select user_id, max(score) max_score
                               from performance
                               group by user_id)
    group by user_id, score
);
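For comparison, the row_number() over (partition by ... order by ...) that the variable-based answer emulates can be written directly where window functions are available. The sketch below is an assumption-laden illustration, not one of the posted answers: SQLite via Python's sqlite3 stands in for MySQL 8.0, and the helper name best_per_user is hypothetical. The outer order by also covers the follow-up request to sort the final result by the same criteria.

```python
import sqlite3

# (id, date, score, speed, user_id) rows from the question.
ROWS = [
    (1, "2016-11-17", 2, 133291, 17),
    (2, "2016-11-17", 6, 82247, 17),
    (3, "2016-11-17", 6, 21852, 17),
    (4, "2016-11-17", 1, 109338, 17),
    (5, "2016-11-17", 7, 64762, 61),
    (6, "2016-11-17", 8, 49434, 61),
]


def best_per_user(day):
    """Return each user's best row for `day`: highest score, then lowest speed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table performance "
        "(id integer, date text, score integer, speed integer, user_id integer)"
    )
    conn.executemany("insert into performance values (?, ?, ?, ?, ?)", ROWS)
    sql = """
        select id, date, score, speed, user_id
        from (
            select p.*,
                   row_number() over (partition by user_id
                                      order by score desc, speed asc) as rn
            from performance p
            where date = ?
        ) ranked
        where rn = 1
        order by score desc, speed asc
    """
    rows = conn.execute(sql, (day,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in best_per_user("2016-11-17"):
        print(row)
```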

filter by row number id of specific item id

From MySQL - Get row number on select
I know how to get the row number / rank using this mysql query:
SELECT @rn:=@rn+1 AS rank, itemID
FROM (
    SELECT itemID
    FROM orders
    ORDER BY somecriteria DESC
) t1, (SELECT @rn:=0) t2;
The result returns something like this:
+--------+------+
| rank | itemID |
+--------+------+
| 1 | 265 |
| 2 | 135 |
| 3 | 36 |
| 4 | 145 |
| 5 | 123 |
| 6 | 342 |
| 7 | 111 |
+--------+------+
My question is: How can I get the result in 1 simple SINGLE QUERY that returns items having lower rank than itemID of 145, i.e.:
+--------+------+
| rank | itemID |
+--------+------+
| 5 | 123 |
| 6 | 342 |
| 7 | 111 |
+--------+------+
Oracle sql query is also welcomed. Thanks.
An Oracle solution (not sure if it meets your criteria of "one simple single query"):
WITH t AS
(SELECT item_id, row_number() OVER (ORDER BY some_criteria DESC) rn
FROM orders)
SELECT t2.rn, t2.item_id
FROM t t1 JOIN t t2 ON (t2.rn > t1.rn)
WHERE t1.item_id = 145;
My assumption is no repeating values of item_id.
Attempting to put this in MySQL terms, perhaps something like this might work:
SELECT t2.rank, t2.itemID
FROM (SELECT @rn1:=@rn1+1 AS rank, itemID
      FROM (SELECT itemID
            FROM orders
            ORDER BY somecriteria DESC) o1, (SELECT @rn1:=0) v1) t1 INNER JOIN
     (SELECT @rn2:=@rn2+1 AS rank, itemID
      FROM (SELECT itemID
            FROM orders
            ORDER BY somecriteria DESC) o2, (SELECT @rn2:=0) v2) t2 ON t2.rank > t1.rank
WHERE t1.itemID = 145;
Disclaimer: I don't work with MySQL much, and it's untested. The Oracle piece works.
SELECT r.rank, r.itemID
FROM (
    SELECT @rn:=@rn+1 AS rank, itemID
    FROM (
        SELECT itemID
        FROM orders
        ORDER BY somecriteria DESC
    ) t1, (SELECT @rn:=0) t2
) r
WHERE r.rank >
(
    SELECT x.rank FROM
    (
        SELECT @rn2:=@rn2+1 AS rank, itemID
        FROM
        (
            SELECT itemID
            FROM orders
            ORDER BY somecriteria DESC
        ) t3, (SELECT @rn2:=0) t4
    ) x WHERE x.itemID = 145
)

Top x rows and group by (again)

I know it's a frequent question, but I just can't figure it out and the examples I found didn't help. From what I've learned, the best strategy is to find the top and bottom values of the top range and then select the rest, but implementing it is a bit tricky.
Example table:
id | title | group_id | votes
I'd like to get the top 3 voted rows from the table, for each group.
I'm expecting this result:
91 | hello1 | 1 | 10
28 | hello2 | 1 | 9
73 | hello3 | 1 | 8
84 | hello4 | 2 | 456
58 | hello5 | 2 | 11
56 | hello6 | 2 | 0
17 | hello7 | 3 | 50
78 | hello8 | 3 | 9
99 | hello9 | 3 | 1
I've found complex queries and examples, but they didn't really help.
You can do it using variables:
SELECT
    id,
    title,
    group_id,
    votes
FROM (
    SELECT
        id,
        title,
        group_id,
        votes,
        @rn := CASE WHEN @prev = group_id THEN @rn + 1 ELSE 1 END AS rn,
        @prev := group_id
    FROM table1, (SELECT @prev := -1, @rn := 0) AS vars
    ORDER BY group_id DESC, votes DESC
) T1
WHERE rn <= 3
ORDER BY group_id, votes DESC
This is basically just the same as the following query in databases that support ROW_NUMBER:
SELECT
id,
title,
group_id,
votes
FROM (
SELECT
id,
title,
group_id,
votes,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY votes DESC) AS rn
FROM table1
) T1
WHERE rn <= 3
ORDER BY group_id, votes DESC
MySQL versions before 8.0 don't support ROW_NUMBER, so there you have to simulate it, and that's what the variables are for. The two queries are otherwise identical. Try to understand the second query first, and hopefully the first should make more sense.
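To make the role of the two variables concrete, here is the same scan written as a plain Python loop: prev plays the part of @prev and rn the part of @rn. It is only an illustration of the logic, not how MySQL executes the query; the function name top_n_per_group and the extra low-vote row in the usage example are hypothetical.

```python
def top_n_per_group(rows, n=3):
    """rows: iterable of (id, title, group_id, votes) tuples.

    Mirrors the variable-based query: sort by group, then votes descending,
    and number the rows within each group as the scan proceeds.
    """
    ordered = sorted(rows, key=lambda r: (r[2], -r[3]))
    prev = None  # like @prev: the group_id of the previous row
    rn = 0       # like @rn: position of the current row within its group
    kept = []
    for row in ordered:
        # Same rule as CASE WHEN @prev = group_id THEN @rn + 1 ELSE 1 END
        rn = rn + 1 if row[2] == prev else 1
        prev = row[2]
        if rn <= n:  # the outer WHERE rn <= 3
            kept.append(row)
    return kept


if __name__ == "__main__":
    sample = [
        (91, "hello1", 1, 10), (28, "hello2", 1, 9), (73, "hello3", 1, 8),
        (5, "extra", 1, 2),  # hypothetical fourth row that should be dropped
        (84, "hello4", 2, 456), (58, "hello5", 2, 11),
    ]
    for row in top_n_per_group(sample):
        print(row)
```

The loop only works because the rows are sorted first, which is also why the SQL version depends on its ORDER BY being applied before the variables are evaluated.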