SQL query performance improvement advice - MySQL

I'm posting the problem statement and the code I'm currently using. Are there any smart ideas for improving the query's performance? I'm using MySQL. Thanks.
Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
For example, given the above Scores table, your query should generate the following report (order by highest score):
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
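The requirement above is what SQL calls a "dense" rank. The minimal Python sketch below is a plain reference for the expected output, not MySQL code, and the function name `dense_rank` is only illustrative. It ranks each distinct score once and then looks the rank up for every row. Ranking the distinct values once and joining them back to the rows is also the shape of the set-based queries in this thread.

```python
def dense_rank(scores):
    """Return (score, rank) pairs ordered by score descending.

    Tied scores share a rank, and the next rank is the next consecutive
    integer (no "holes").
    """
    # Rank each distinct score exactly once: the highest score gets rank 1.
    distinct_desc = sorted(set(scores), reverse=True)
    rank_of = {score: i + 1 for i, score in enumerate(distinct_desc)}
    # Every row, including duplicates, looks up the rank of its score.
    return [(s, rank_of[s]) for s in sorted(scores, reverse=True)]


for score, rank in dense_rank([3.50, 3.65, 4.00, 3.85, 4.00, 3.65]):
    print(f"{score:.2f} {rank}")
```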
SELECT
    s.score, scores_and_ranks.`rank`
FROM
    Scores s
JOIN
    (
        SELECT
            score_primary.score, COUNT(DISTINCT score_higher.score) + 1 AS `rank`
        FROM
            Scores score_primary
        LEFT JOIN Scores score_higher
            ON score_higher.score > score_primary.score
        GROUP BY score_primary.score
    ) scores_and_ranks
    ON s.score = scores_and_ranks.score
-- `rank` is backquoted because RANK is a reserved word in MySQL 8.0
ORDER BY `rank` ASC;
BTW, here is the issue I hit with Gordon's code.
BTW, I also tried sgeddes's code, but ran into new issues.
New issue from Gordon's code:
Thanks in advance,
Lin

User-defined variables are probably faster than what you are doing. However, you need to be careful when using them. In particular, you should not assign a variable in one expression and use it in another. I mean, you can, but MySQL does not guarantee the order in which the expressions are evaluated, so your code may not do what you intend.
So, you need to do all the work in a single expression:
select s.*,
       (@rn := if(@s = score, @rn,
                  if(@s := score, @rn + 1, @rn + 1)
                 )
       ) as `rank`
from scores s cross join
     (select @rn := 0, @s := 0) params
order by score desc;
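To see what the single expression computes row by row, here is a Python emulation of the same logic. It is a sketch of the idea, not of how MySQL actually evaluates the query. The local names `rn` and `prev_score` stand in for `@rn` and `@s`. Rows are visited in `score desc` order, and the "previous score" and the "current rank" are updated together in one step.

```python
def rank_with_state(scores):
    """Emulate the single-expression user-variable ranking in plain Python."""
    rn = 0          # stands in for @rn, initialised to 0 by the params subquery
    prev_score = 0  # stands in for @s, also initialised to 0
    result = []
    for score in sorted(scores, reverse=True):  # order by score desc
        # Same decision as the nested if(): on a tie keep the rank; otherwise
        # remember the new score and move to the next consecutive rank.
        # In the SQL this all happens inside one expression, which is what
        # pins down the order of the read and the writes.
        if score != prev_score:
            prev_score = score
            rn += 1
        result.append((score, rn))
    return result


print(rank_with_state([3.50, 3.65, 4.00, 3.85, 4.00, 3.65]))
```

In both the SQL and this emulation, a first score equal to the initial value of 0 would be treated as a tie and get rank 0. Initialising with a value that can never occur avoids this; a later answer in this thread initialises `@s` to -1.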

One option is to use user-defined variables:
select score,
       @rnk := if(@prevScore = score, @rnk, @rnk + 1) rnk,
       @prevScore := score
from scores
join (select @rnk := 0, @prevScore := 0) t
order by score desc;

Related

Score table with rank variable order and keep rankings

Let's say we have a score table from a sports competition:
-----------------------------------------
nickname | challenge | score | rank
-----------------------------------------
Sporty | 3 | 37283 | 1
Performer | 2 | 32319 | 2
John | 5 | 21021 | 3
Sandra | 3 | 12320 | 4
The query I use:
SELECT nickname,
       challenge,
       score,
       @rank := @rank + 1 AS rank
FROM rankings,
     (SELECT @rank := 0) r
ORDER BY score DESC
I want to reorder the rows but keep the rankings by score. For example,
the table should be ordered by nickname like this:
-----------------------------------------
nickname | challenge | score | rank
-----------------------------------------
John | 5 | 21021 | 3
Performer | 2 | 32319 | 2
Sandra | 3 | 12320 | 4
Sporty | 3 | 37283 | 1
I'm using MySQL 5.7, so I cannot use the ranking (window) functions in MySQL 8.
How can I achieve this?
Use a subquery:
SELECT nickname, challenge, score, rnk
FROM
(
    SELECT nickname, challenge, score,
           @rank := @rank + 1 AS rnk
    FROM rankings, (SELECT @rank := 0) r
    ORDER BY score DESC
) t
ORDER BY nickname;
The idea here is to first materialize the ranking column inside the subquery. Then we can order that result by some other column in the outer query. Note that I avoid using the alias rank, because starting in MySQL 8, RANK is the name of an analytic (window) function and a reserved word.
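The same two-level idea can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory copy of the sample data. SQLite has no `@` variables, so the sketch swaps in a different technique: the rank is computed with a correlated COUNT instead of a session variable. With that substitution, tied scores would share a rank and the next rank would skip. That makes no difference for this data, because all the scores differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rankings (nickname TEXT, challenge INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO rankings VALUES (?, ?, ?)",
    [("Sporty", 3, 37283), ("Performer", 2, 32319), ("John", 5, 21021), ("Sandra", 3, 12320)],
)

# Inner query: rank each player by score (1 + number of strictly higher scores).
# Outer query: reorder the already-ranked rows by nickname.
rows = conn.execute(
    """
    SELECT nickname, challenge, score, rnk
    FROM (
        SELECT r.nickname, r.challenge, r.score,
               (SELECT COUNT(*) FROM rankings h WHERE h.score > r.score) + 1 AS rnk
        FROM rankings r
    ) t
    ORDER BY nickname
    """
).fetchall()

for row in rows:
    print(row)
```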

Setting rank based on query

I have a table items listing different elements with id, popularity and rank columns.
The popularity column contains an int that allows sorting elements by popularity.
I've made a query to sort by popularity and set a rank for each entry:
SELECT id,
       @curRank := @curRank + 1 AS rank
FROM items, (SELECT @curRank := 0) r
ORDER BY popularity DESC
This query works perfectly and gives me a result with id and rank, where the rank value is as expected and respects the order by popularity.
What I'm trying to achieve is to set rank value for each entry, and I tried it this way:
UPDATE items A
JOIN (
    SELECT id,
           @curRank := @curRank + 1 AS rank
    FROM items,
         (SELECT @curRank := 0) r
    ORDER BY popularity DESC
) AS ranks
SET A.rank = ranks.rank
WHERE A.id = ranks.id
A rank value is set for each row, but it doesn't respect the ORDER BY popularity DESC. Instead, the rank value seems to be set in id order (id 1 has rank 1, id 2 has rank 2, etc.).
What am I doing wrong?
Regards,
I think you're making this harder than it should be.
SET @curRank = 0;
UPDATE items
SET `rank` = (@curRank := @curRank + 1)
ORDER BY popularity DESC;
I just set the @curRank variable in a SET statement before the UPDATE. When you try to combine them, it just makes readers of your code wonder what it means.
You don't need to make them part of the same statement. The session variable will keep its value as long as you execute both statements in the same database session.
There's no need for subqueries or joins. Just use UPDATE ... ORDER BY (although UPDATE with ORDER BY doesn't work in MySQL if you do need to do a JOIN).
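If the ranking is easier to reason about in application code, the same effect as UPDATE ... ORDER BY can be achieved client-side. This is an alternative technique, not what the answer above proposes. The sketch below uses Python's built-in sqlite3 module with an in-memory table as a stand-in for MySQL, because SQLite does not accept UPDATE ... ORDER BY in its default build. It reuses the demo data from the DB Fiddle example in this thread. It also assumes that ties in popularity are broken by id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, popularity INTEGER, "rank" INTEGER)')
# Demo data: popularity 1..5, with rank initially equal to id.
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [(i, i, i) for i in range(1, 6)])

# Read the ids in the order the ranks should follow (most popular first) ...
ordered_ids = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY popularity DESC, id")]
# ... then write rank 1, 2, 3, ... back in that order, one UPDATE per row.
conn.executemany(
    'UPDATE items SET "rank" = ? WHERE id = ?',
    [(position, item_id) for position, item_id in enumerate(ordered_ids, start=1)],
)
conn.commit()

print(conn.execute('SELECT id, popularity, "rank" FROM items ORDER BY id').fetchall())
```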
MySQL has surprising behaviors when dealing with variables and ordering.
One thing you could try is to order earlier, by moving the ORDER BY on items into a subquery, as follows:
UPDATE items A
JOIN (
    SELECT id,
           @curRank := @curRank + 1 AS rank
    FROM
        (SELECT id FROM items ORDER BY popularity DESC) items,
        (SELECT @curRank := 0) r
) AS ranks
SET A.rank = ranks.rank
WHERE A.id = ranks.id
Demo on DB Fiddle:
Data:
| id | popularity | rank |
| --- | ---------- | ---- |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
After update:
| id | popularity | rank |
| --- | ---------- | ---- |
| 1 | 1 | 5 |
| 2 | 2 | 4 |
| 3 | 3 | 3 |
| 4 | 4 | 2 |
| 5 | 5 | 1 |

MySQL: identify the first row in each group

I have a set of data that needs to be sorted, with the value of column 1 shown on the first record of each group but not on the remaining records. I could add another boolean column to mark which record is first. The desired result is below.
+--------+------------+-------+
| type | variety | price |
+--------+------------+-------+
| apple | gala | 2.79 |
| | fuji | 0.24 |
| | limbertwig | 2.87 |
| orange | valencia | 3.59 |
| | navel | 9.36 |
| pear | bradford | 6.05 |
| | bartlett | 2.14 |
| cherry | bing | 2.55 |
| | chelan | 6.33 |
+--------+------------+-------+
SELECT `type`, `variety`, `price`
FROM (
    SELECT IF(@prev != t.`type`, t.`type`, '') AS `type`
         , t.`variety`, t.`price`
         , @prev := t.`type` AS actualType
    FROM theTable AS t
    CROSS JOIN (SELECT @prev := '') AS init
    ORDER BY t.`type`, t.`variety`, t.`price`
) AS subQ;
It's been a while since I did something like this, but this is the general idea.
The init subquery is just used to initialize the @prev session variable.
The IF uses the "last seen" type to determine whether to show the type.
The expression aliased as actualType updates the @prev session variable for the next row processed.
The ORDER BY is needed to order the rows so that @prev works out appropriately; in some cases I've had to put the ORDER BY in a deeper subquery (SELECT ... FROM theTable ORDER BY ...) AS t to make sure it is not applied after the expressions involving @prev.
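The row-by-row logic described above can be emulated in a few lines of Python. This is a sketch of what the @prev variable does, not of how MySQL executes the query. The function name `blank_repeated_types` is illustrative, and the sample rows are taken from the question. Because the rows are sorted by type, then variety, then price, the groups come out alphabetically rather than in the order shown in the question's desired output.

```python
def blank_repeated_types(rows):
    """rows: (type, variety, price) tuples.

    Returns the rows sorted by type, variety, price, with the type shown only
    on the first row of each run of equal types.
    """
    prev = ""  # plays the role of @prev, initialised by the init subquery
    out = []
    for fruit_type, variety, price in sorted(rows):  # ORDER BY type, variety, price
        shown = fruit_type if prev != fruit_type else ""  # IF(@prev != type, type, '')
        prev = fruit_type                                  # @prev := type
        out.append((shown, variety, price))
    return out


rows = [("apple", "gala", 2.79), ("apple", "fuji", 0.24), ("apple", "limbertwig", 2.87),
        ("orange", "valencia", 3.59), ("orange", "navel", 9.36)]
for row in blank_repeated_types(rows):
    print(row)
```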
As others have mentioned, this is best done in the client, since it is an issue of presentation. Technically, though, you can achieve what you are looking for using the ROW_NUMBER window function and a CASE expression. This requires MySQL 8.0 or later, because earlier versions support neither window functions nor WITH. I don't have a MySQL instance handy, but the following should work.
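WITH t AS (
    SELECT
        -- number the rows within each type; ordering by variety makes "first" well defined
        ROW_NUMBER() OVER (PARTITION BY type ORDER BY variety) rownum,
        type,
        variety,
        price
    FROM products )
SELECT
    CASE WHEN rownum = 1 THEN type ELSE '' END type,
    variety,
    price
FROM t
ORDER BY t.type, rownum;  -- keep each group's first row on top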
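To try the window-function version without a MySQL 8 server, the sketch below runs it through Python's sqlite3 module. SQLite has supported ROW_NUMBER() since version 3.25. The answer does not specify either ordering, so the sketch assumes two: the window is ordered by variety, and the outer query sorts by type and row number so that each group's first row comes first. The small `products` table is made up from the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (type TEXT, variety TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("apple", "gala", 2.79), ("apple", "fuji", 0.24),
     ("pear", "bradford", 6.05), ("pear", "bartlett", 2.14)],
)

rows = conn.execute(
    """
    WITH t AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY type ORDER BY variety) AS rownum,
               type, variety, price
        FROM products
    )
    SELECT CASE WHEN rownum = 1 THEN type ELSE '' END AS type, variety, price
    FROM t
    ORDER BY t.type, rownum  -- qualified t.type refers to the CTE column, not the alias
    """
).fetchall()

for row in rows:
    print(row)
```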

sql where score >= s.score

I have a question about SQL. I have a table that looks like this:
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
The table is called 'Scores', and after ranking the scores it should look like this:
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
Here is a sample answer but I am confused about the part after WHERE.
select
    s.Score,
    (select count(distinct Score) from Scores where Score >= s.Score)
        `Rank`
from Scores s
order by s.Score Desc;
This Score >= s.Score looks like the Score column being compared with itself. I'm totally confused by this part. How does it work? Thank you!
E.
One way to understand this is to just run the query for each row of your sample data. Starting with the first row, we see that the score is 4.00. The correlated subquery in the select clause:
(select count(distinct Score) from Scores where Score >= s.Score)
will return a count of 1, because there is only one distinct score greater than or equal to 4.00. This is also the case for the second record in your data, which has a score of 4.00 as well. For the score 3.85, the subquery would find a distinct count of 2, because there are two distinct scores greater than or equal to 3.85, namely 3.85 and 4.00. You can apply this logic across the whole table to convince yourself of how the query works.
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 | <-- 1 score >= 4.00
| 4.00 | 1 | <-- 1 score >= 4.00
| 3.85 | 2 | <-- 2 scores >= 3.85
| 3.65 | 3 | <-- 3 scores >= 3.65
| 3.65 | 3 | <-- 3 scores >= 3.65
| 3.50 | 4 | <-- 4 scores >= 3.50
+-------+------+
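To check the walkthrough against a real engine, the sketch below runs the same correlated subquery through Python's sqlite3 module on an in-memory copy of the Scores table. SQLite stands in for MySQL here. The alias `rnk` replaces `Rank` only so that the example does not depend on reserved-word rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# For each outer row s, count the distinct scores that are >= s.Score.
rows = conn.execute(
    """
    SELECT s.Score,
           (SELECT COUNT(DISTINCT other.Score)
            FROM Scores other
            WHERE other.Score >= s.Score) AS rnk
    FROM Scores s
    ORDER BY s.Score DESC
    """
).fetchall()

for score, rnk in rows:
    print(f"{score:.2f} {rnk}")
```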
This is known as a dependent (correlated) subquery, and it can be quite inefficient. A dependent subquery, which basically means it cannot be turned into a join because it "depends" on a specific value, is run for every result row in the output, using that row's "dependent" values. In this case, each result row already has a specific value of s.Score.
The unqualified Score in the dependent subquery refers to the Scores table in the subquery's own FROM clause, not to the outer query's s.
It may be clearer with an additional alias:
select
    s.Score,
    (select count(distinct other_scores.Score)
     from Scores other_scores
     where other_scores.Score >= s.Score) `Rank` -- value of s.Score is known
                                                 -- and placed directly into the dependent subquery
from Scores s
order by s.Score Desc;
"Modern" SQL dialects (including MySQL 8.0+) provide "RANK" and "DENSE_RANK" Window Functions to answer these sorts of queries. Window Functions, where applicable, are often much faster than dependent subqueries because the Query Planner can optimize at a higher level; these functions also have a tendency to tame otherwise gnarly SQL.
The MySQL 8+ SQL Syntax that ought to do the trick:
select
    s.Score,
    DENSE_RANK() over w AS `Rank`
from Scores s
window w as (order by Score desc)
order by s.Score desc;
There are also various workarounds to emulate ROW_NUMBER and other window functions in older versions of MySQL.
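The difference between RANK and DENSE_RANK is exactly the "no holes" requirement from the original problem. The sketch below computes both over the sample scores. It again uses Python's sqlite3 as a stand-in for MySQL 8 and needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# RANK() leaves a hole after a tie; DENSE_RANK() moves to the next consecutive integer.
rows = conn.execute(
    """
    SELECT Score,
           RANK()       OVER (ORDER BY Score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY Score DESC) AS dense_rnk
    FROM Scores
    ORDER BY Score DESC
    """
).fetchall()

for score, rnk, dense_rnk in rows:
    print(f"{score:.2f}  RANK={rnk}  DENSE_RANK={dense_rnk}")
```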
Because it is a dependent subquery, it has to be re-evaluated for each row of the outer query. If you are familiar with Python, you can think of it like this:
from collections import namedtuple

ScoreTuple = namedtuple('ScoreTuple', ['Id', 'Score'])
Scores = [ScoreTuple(1, 3.50),
          ScoreTuple(2, 3.65),
          ScoreTuple(3, 4.00),
          ScoreTuple(4, 3.85),
          ScoreTuple(5, 4.00),
          ScoreTuple(6, 3.65)]
Rank = []
for s in Scores:                                  # each row from the outer query
    rank = len({innerScore.Score                  # SELECT COUNT(DISTINCT Score)
                for innerScore in Scores          # FROM Scores
                if innerScore.Score >= s.Score})  # WHERE Score >= s.Score
    Rank.append(rank)

calculating ranking in SQL

I have a table with a float score column, and I want to rank the scores from largest to smallest, with equal scores getting the same rank. I am using MySQL/MySQL Workbench, and any good ideas are appreciated.
Here is a sample input and output:
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
I tried the following query, but it does not work, since it does not handle duplicates:
SELECT id, score,
       @curRank := @curRank + 1 AS rank
FROM TestRank tr, (SELECT @curRank := 0) r
ORDER BY score desc;
In the query above, users 3 and 5 have the same score value of 4, but are ranked differently.
I also tried the following query to rank just the distinct scores, and it returns very weird results:
set @curRank := 0;
SELECT distinct score, @curRank := @curRank + 1 as rank
FROM TestRank tr
ORDER BY score desc;
thanks in advance,
Lin
Check out this fiddle: http://sqlfiddle.com/#!9/17a49/3
Here's the query that will work for you:
SELECT
    s.score, scores_and_ranks.`rank`
FROM
    scores s
JOIN
    (
        SELECT
            score_primary.score, COUNT(DISTINCT score_higher.score) + 1 AS `rank`
        FROM
            scores score_primary
        LEFT JOIN scores score_higher ON score_higher.score > score_primary.score
        GROUP BY score_primary.score
    ) scores_and_ranks
    ON s.score = scores_and_ranks.score
ORDER BY `rank` ASC;
In the "scores_and_ranks" inner query, we total up the number of distinct scores that are better than the current score. The top score will have zero, so we add 1 to get the rank value you want.
The reason we have to join to that table (using table "s") is to make sure the duplicate score values (two rows with score=4, for example) are shown in distinct rows.
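The two steps can be observed separately. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with `rnk` in place of `rank`. It first runs the inner scores_and_ranks query on its own, which yields one row per distinct score. It then runs the full query, which joins back to restore one row per original row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

INNER = """
    SELECT p.score, COUNT(DISTINCT h.score) + 1 AS rnk
    FROM scores p
    LEFT JOIN scores h ON h.score > p.score   -- h = strictly higher scores
    GROUP BY p.score
"""

# Step 1: one row per distinct score (duplicates collapsed by GROUP BY).
distinct_ranks = conn.execute(INNER + " ORDER BY rnk").fetchall()

# Step 2: join back to the base table so every original row reappears.
all_rows = conn.execute(
    f"""
    SELECT s.score, r.rnk
    FROM scores s
    JOIN ({INNER}) r ON s.score = r.score
    ORDER BY r.rnk
    """
).fetchall()

print(distinct_ranks)
print(all_rows)
```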
You can do this by "remembering" the previous score:
SELECT id, score,
       (@curRank := if(@s = score, @curRank,
                       if(@s := score, @curRank + 1, @curRank + 1)
                      )
       ) as `rank`
FROM TestRank tr CROSS JOIN
     (SELECT @curRank := 0, @s := -1) r
ORDER BY score desc;