Top x rows and group by (again) - mysql

I know it's a frequent question but I just can't figure it out and the examples I found didn't helped. What I learned, the best strategy is to try to find the top and bottom values of the top range and then select the rest, but implementing is a bit tricky.
Example table:
id | title | group_id | votes
I'd like to get the top 3 voted rows from the table, for each group.
I'm expecting this result:
91 | hello1 | 1 | 10
28 | hello2 | 1 | 9
73 | hello3 | 1 | 8
84 | hello4 | 2 | 456
58 | hello5 | 2 | 11
56 | hello6 | 2 | 0
17 | hello7 | 3 | 50
78 | hello8 | 3 | 9
99 | hello9 | 3 | 1
I've fond complex queries and examples, but they didn't really helped.

You can do it using variables:
SELECT
id,
title,
group_id,
votes
FROM (
SELECT
id,
title,
group_id,
votes,
#rn := CASE WHEN #prev = group_id THEN #rn + 1 ELSE 1 END AS rn,
#prev := group_id
FROM table1, (SELECT #prev := -1, #rn := 0) AS vars
ORDER BY group_id DESC, votes DESC
) T1
WHERE rn <= 3
ORDER BY group_id, votes DESC
This is basically just the same as the following query in databases that support ROW_NUMBER:
SELECT
id,
title,
group_id,
votes
FROM (
SELECT
id,
title,
group_id,
votes,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY votes DESC) AS rn
FROM student
) T1
WHERE rn <= 3
ORDER BY group_id, votes DESC
But since MySQL doesn't support ROW_NUMBER yet you have to simulate it, and that's what the variables are for. The two queries are otherwise identical. Try to understand the second query first, and hopefully the first should make more sense.

Related

Get top n counted rows in table within group [MySQL]

I'm trying to get just top 3 selling products grouped within categories (just top 3 products by occurrence in transactions (id) count(id) by each category). I was searching a lot for possible solution but with no result. It looks like it is a bit tricky in MySQL since one can't simply use top() function and so on. Sample data structure bellow:
+--------+------------+-----------+
| id |category_id | product_id|
+--------+------------+-----------+
| 1 | 10 | 32 |
| 2 | 10 | 34 |
| 3 | 10 | 32 |
| 4 | 10 | 21 |
| 5 | 10 | 100 |
| 6 | 7 | 101 |
| 7 | 7 | 39 |
| 8 | 7 | 41 |
| 9 | 7 | 39 |
+--------+------------+-----------+
In earlier versions of MySQL, I would recommend using variables:
select cp.*
from (select cp.*,
(#rn := if(#c = category_id, #rn + 1,
if(#c := category_id, 1, 1)
)
) as rn
from (select category_id, product_id, count(*) as cnt
from mytable
group by category_id, product_id
order by category_id, count(*) desc
) cp cross join
(select #c := -1, #rn := 0) params
) cp
where rn <= 3;
If you are running MySQL 8.0, you can use window function rank() for this:
select *
from (
select
category_id,
product_id,
count(*) cnt,
rank() over(partition by category_id order by count(*) desc) rn
from mytable
group by category_id, product_id
) t
where rn <= 3
In earlier versions, one option is to filter with a correlated subquery:
select
category_id,
product_id,
count(*) cnt
from mytable t
group by category_id, product_id
having count(*) >= (
select count(*)
from mytable t1
where t1.category_id = t.category_id and t1.product_id = t.product_id
order by count(*) desc
limit 3, 1
)

How to reference generated/aliased table in same query?

I want to find a user's position in a leaderboard and return the 4 users above and 4 users below their position.
My table, 'predictions', looks something like this:
+----+---------+--------+-------+---------+
| id | userId | score | rank | gameId |
+----+---------+--------+-------+---------+
| 1 | 12 | 11 | 1 | 18 |
| 2 | 1 | 6 | 4 | 18 |
| 3 | 43 | 7 | 3 | 12 |
| 4 | 4 | 9 | 2 | 18 |
| 5 | 98 | 2 | 5 | 19 |
| 6 | 3 | 0 | 6 | 18 |
+----+---------+--------+-------+---------+
Obviously this isn't properly ordered, so I run this:
SELECT l.userId,
l.rank,
l.score,
l.createdAt,
#curRow := #curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT #curRow := 0) r
ORDER BY rank ASC
which gets me a nice table with each entry numbered.
I then want to search this generated table, find the row_number where userId = X, and then return the values 'around' that result.
I think I have the logic of the query down, I just can't work out how to reference the table 'generated' by the above query.
It would be something like this:
SELECT *
FROM (
SELECT l.userId,
l.rank,
l.score,
l.createdAt,
#curRow := #curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT #curRow := 0) r
ORDER BY rank ASC) generated_ordered_table
WHERE row_number < (SELECT row_number FROM generated_ordered_table WHERE userId = 1)
ORDER BY row_number DESC
LIMIT 0,5
This fails. What I'm trying to do is to generate my first table with the correct query, give it an alias of generated_ordered_table, and then reference this 'table' later on in this query.
How do I do this?
MySQL version 8+ could have allowed the usage of Window functions, and Common Table Expressions (CTEs); which would have simplified the query quite a bit.
Now, in the older versions (your case), the "Generated Rank Table" (Derived Table) cannot be referenced again in a subquery inside the WHERE clause. One way would be to do the same thing twice (select clause to get generated table) again inside the subquery, but that would be relatively inefficient.
So, another approach can be to use Temporary Tables. We create a temp table first storing the ranks. And, then reference that temp table to get results accordingly:
CREATE TEMPORARY TABLE IF NOT EXISTS gen_rank_tbl AS
(SELECT l.userId,
l.rank,
l.score,
l.createdAt,
#curRow := #curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT #curRow := 0) r
ORDER BY rank ASC)
Now, you can reference this temp table to get the desired results:
SELECT *
FROM gen_rank_tbl
WHERE row_number < (SELECT row_number FROM gen_rank_tbl WHERE userId = 1)
ORDER BY row_number DESC
LIMIT 0,5
You could use a bunch of unions
select userid,rank,'eq'
from t where gameid = 18 and userid = 1
union
(
select userid,rank,'lt'
from t
where gameid = 18 and rank < (select rank from t t1 where t1.userid = 1 and t1.gameid = t.gameid)
order by rank desc limit 4
)
union
(
select userid,rank,'gt'
from t
where gameid = 18 and rank > (select rank from t t1 where t1.userid = 1 and t1.gameid = t.gameid)
order by rank desc limit 4
);
+--------+------+----+
| userid | rank | eq |
+--------+------+----+
| 1 | 4 | eq |
| 4 | 2 | lt |
| 12 | 1 | lt |
| 3 | 6 | gt |
+--------+------+----+
4 rows in set (0.04 sec)
But it's not pretty
You can use two derived tables:
SELECT p.*,
(#user_curRow = CASE WHEN user_id = #x THEN rn END) as user_rn
FROM (SELECT p.*, #curRow := #curRow + 1 AS rn
FROM (SELECT p.*
FROM predictions p
WHERE p.gameId = 18
ORDER BY rank ASC
) p CROSS JOIN
(SELECT #curRow := 0, #user_curRow := -1) params
) p
HAVING rn BETWEEN #user_curRow - 4 AND #user_currow + 4;

Get user's highest score from a table

I have a feeling this is a very simple question but maybe i'm having brain fart right now and just can't seem to figure out how to go about it.
I have a MySQL table structure like below
+---------------------------------------------------+
| id | date | score | speed | user_id |
+---------------------------------------------------+
| 1 | 2016-11-17 | 2 | 133291 | 17 |
| 2 | 2016-11-17 | 6 | 82247 | 17 |
| 3 | 2016-11-17 | 6 | 21852 | 17 |
| 4 | 2016-11-17 | 1 | 109338 | 17 |
| 5 | 2016-11-17 | 7 | 64762 | 61 |
| 6 | 2016-11-17 | 8 | 49434 | 61 |
Now i can get a particular user's best performance by doing this
SELECT *
FROM performance
WHERE user_id = 17 AND date = '2016-11-17'
ORDER BY score desc,speed asc LIMIT 1
This should return the row with ID = 3. Now what I want is a single query to run to be able to return that 1 such row for each unique user_id in the table. So the resulting result would be something like this
+---------------------------------------------------+
| id | date | score | speed | user_id |
+---------------------------------------------------+
| 3 | 2016-11-17 | 6 | 21852 | 17 |
| 6 | 2016-11-17 | 8 | 49434 | 61 |
Also further more, can I have another question within this same query that would further sort this eventual resulting table by the same criteria of sort (score desc, speed asc). Thanks
A simple method uses a correlated subquery:
select p.*
from performance p
where p.date = '2016-11-17' and
p.id = (select p2.id
from performance p2
where p2.user_id = p.user_id and p2.date = p.date
order by score desc, speed asc
limit 1
);
This should be able to take advantage of an index on performance(date, user_id, score, speed).
Is easy using variable to emulate row_number() over (partition by Order by)
Explanation:
First create two variables in the subquery.
Order by user_id so when user change the #rn reset to 1
Order by score desc, speed asc so each row will have a row_number, and the one you want always will have rn = 1
#rn := you change #rn for each row
if you have a new user_id then #rn is set to 1
otherwise #rn is set to #rn+1
SQL Fiddle Demo
SELECT `id`, `date`, `score`, `speed`, `user_id`
FROM (
SELECT *,
#rn := if(#user_id = `user_id`,
#rn + 1 ,
if(#user_id := `user_id`,1,1)
) as rn
FROM Table1
CROSS JOIN (SELECT #user_id := 0, #rn := 0) as param
WHERE date = '2016-11-17'
ORDER BY `user_id`, `score` desc, `speed` asc
) T
where T.rn =1
OUTPUT
For mysql
You can try with a double in subselect and group by
select * from performance
where (user_id, score,speed ) in (
SELECT user_id, max_score, max(speed)
FROM performance
WHERE (user_id, score) in (select user_id, max(score) max_score
from performance
group by user_id)
group by user_id, max_score
);

MySQL top 2 records per group

Basically I need to get only the last 2 records for each user, considering the last created_datetime:
id | user_id | created_datetime
1 | 34 | '2015-09-10'
2 | 34 | '2015-10-11'
3 | 34 | '2015-05-23'
4 | 34 | '2015-09-13'
5 | 159 | '2015-10-01'
6 | 159 | '2015-10-02'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
Returns (expected output):
2 | 34 | '2015-10-11'
1 | 34 | '2015-09-10'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
I was trying with this idea:
select user_id, created_datetime,
$num := if($user_id = user_id, $num + 1, 1) as row_number,
$id := user_id as dummy
from logs group by user_id
having row_number <= 2
The idea is keep only these top 2 rows and remove all the others.
Any ideas?
Your idea is close. I think this will work better:
select u.*
from (select user_id, created_datetime,
$num := if(#user_id = user_id, #num + 1,
if(#user_id := id, 1, 1)
) as row_number
from logs cross join
(select #user_id := 0, #num := 0) params
order by user_id
) u
where row_number <= 2 ;
Here are the changes:
The variables are set in only one expression. MySQL does not guarantee the order of evaluation of expressions, so this is important.
The work is done in a subquery, which is then processed in the outer query.
The subquery uses order by, not group by.
The outer query uses where instead of having (actually, in MySQL having would work, but where is more appropriate).

SQL Find Position in table

I have a table in mySql which has the users ID and scores.
What I would like to do is organise the table by scores (simple) but then find where a certain user ID sits in the table.
So far I would have:
SELECT * FROM table_score
ORDER BY Score DESC
How would I find where userID = '1234' is (i.e entry 10 of 12)
The following query will give you a new column UserRank, which specify the user rank:
SELECT
UserID,
Score,
(#rownum := #rownum + 1) UserRank
FROM table_score, (SELECT #rownum := 0) t
ORDER BY Score DESC;
SQL Fiddle Demo
This will give you something like:
| USERID | SCORE | USERRANK |
-----------------------------
| 4 | 100 | 1 |
| 10 | 70 | 2 |
| 2 | 55 | 3 |
| 1234 | 50 | 4 |
| 1 | 36 | 5 |
| 20 | 33 | 6 |
| 8 | 25 | 7 |
Then you can put this query inside a subquery and filter with a userId to get that user rank. Something like:
SELECT
t.UserRank
FROM
(
SELECT *, (#rownum := #rownum + 1) UserRank
FROM table_score, (SELECT #rownum := 0) t
ORDER BY Score DESC
) t
WHERE userID = '1234';
SQL Fiddle Demo
For a given user id, you can do this with a simple query:
select sum(case when ts.score >= thescore.score then 1 else 0 end) as NumAbove,
count(*) as Total
from table_score ts cross join
(select ts.score from table_score ts where userId = '1234') thescore
If you have indexes on score and userid, this will be quite efficient.