sql where score >= s.score - mysql

I have a question about SQL. I have a table that looks like this:
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
The table is called 'Scores', and after ranking the scores it should look like this:
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
Here is a sample answer but I am confused about the part after WHERE.
select
s.Score,
(select count(distinct Score) from Scores where Score >= s.Score)
Rank
from Scores s
order by s.Score Desc;
This Score >= s.Score looks like the Score column being compared with itself. I am totally confused about this part. How does it work? Thank you!
E.

One way to understand this is to just run the query for each row of your sample data. Starting with the first row, we see that the score is 4.00. The correlated subquery in the select clause:
(select count(distinct Score) from Scores where Score >= s.Score)
will return a count of 1, because there is only one distinct score greater than or equal to 4.00. This is also the case for the second record in your data, which also has a score of 4.00. For the score 3.85, the subquery finds a distinct count of 2, because two distinct scores are greater than or equal to 3.85, namely 3.85 and 4.00. You can apply this logic across the whole table to convince yourself of how the query works.
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 | <-- 1 score >= 4.00
| 4.00 | 1 | <-- 1 score >= 4.00
| 3.85 | 2 | <-- 2 scores >= 3.85
| 3.65 | 3 | <-- 3 scores >= 3.65
| 3.65 | 3 | <-- 3 scores >= 3.65
| 3.50 | 4 | <-- 4 scores >= 3.50
+-------+------+
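To check this row-by-row walkthrough, here is a minimal sketch that runs the same correlated subquery against the sample data. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the query itself is portable, and the alias rnk is an arbitrary name chosen for the sketch.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL 'Scores' table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
conn.executemany(
    "INSERT INTO Scores (Id, Score) VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# For each outer row s, count the distinct scores that are >= s.Score.
# The unqualified Score inside the subquery resolves to the inner Scores table.
rows = conn.execute(
    """
    SELECT s.Score,
           (SELECT COUNT(DISTINCT Score) FROM Scores WHERE Score >= s.Score) AS rnk
    FROM Scores s
    ORDER BY s.Score DESC
    """
).fetchall()

for score, rnk in rows:
    print(f"{score:.2f} {rnk}")
```

The printed ranks follow the annotated table: each row's rank is the number of distinct scores at or above it.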

This is known as a dependent (correlated) subquery, and it can be quite inefficient. A dependent subquery "depends" on a value from the outer query, so, at least conceptually, it is re-run for every row of the outer query using that row's value. In this case, each outer row supplies its own value of s.Score.
The unqualified 'Score' inside the dependent subquery refers to the Scores table in the subquery's own FROM clause, not to the outer query's s.
It may be clearer with an additional alias:
select
s.Score,
(select count(distinct other_scores.Score)
from Scores other_scores
where other_scores.Score >= s.Score) Rank -- value of s.Score is known
-- and placed directly into dependent subquery
from Scores s
order by s.Score Desc;
"Modern" SQL dialects (including MySQL 8.0+) provide "RANK" and "DENSE_RANK" Window Functions to answer these sorts of queries. Window Functions, where applicable, are often much faster than dependent queries because the Query Planner can optimize at a higher level: these functions also have a tendency to tame otherwise gnarly SQL.
The MySQL 8+ SQL Syntax that ought to do the trick:
select
s.Score,
DENSE_RANK() over w AS `Rank` -- RANK is a reserved word in MySQL 8.0, so the alias is quoted
from Scores s
window w as (order by Score desc)
order by s.Score desc;
There are also various workarounds to emulate ROW_NUMBER and other window functions in older versions of MySQL.
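As a quick sanity check that DENSE_RANK produces the same ranking as the dependent subquery, here is a minimal sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later, which supports window functions, as a stand-in for MySQL 8.0, and writes the window inline with OVER (...) instead of a named WINDOW clause.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window function support) stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
conn.executemany(
    "INSERT INTO Scores (Id, Score) VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# DENSE_RANK: tied scores share a rank and the next rank is consecutive (no gaps).
dense = conn.execute(
    "SELECT Score, DENSE_RANK() OVER (ORDER BY Score DESC) AS rnk "
    "FROM Scores ORDER BY Score DESC"
).fetchall()

# The dependent-subquery version from the question, for comparison.
correlated = conn.execute(
    "SELECT s.Score, "
    "(SELECT COUNT(DISTINCT Score) FROM Scores WHERE Score >= s.Score) AS rnk "
    "FROM Scores s ORDER BY s.Score DESC"
).fetchall()

print(dense == correlated)
```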

It works because it is a dependent subquery: the subquery is re-evaluated for each row of the outer query. If you are familiar with Python, you can think of it like this:
from collections import namedtuple
ScoreTuple = namedtuple('ScoreTuple', ['Id', 'Score'])
Scores = [ScoreTuple(1, 3.50),
          ScoreTuple(2, 3.65),
          ScoreTuple(3, 4.00),
          ScoreTuple(4, 3.85),
          ScoreTuple(5, 4.00),
          ScoreTuple(6, 3.65)]
Rank = []
for s in Scores:                                      # each row from the outer query
    rank = len(set(innerScore.Score                   # SELECT COUNT(DISTINCT Score)
                   for innerScore in Scores           # FROM Scores
                   if innerScore.Score >= s.Score))   # WHERE Score >= s.Score
    Rank.append(rank)
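The loop above rescans the whole list for every row, which mirrors why the dependent subquery does roughly O(n²) work. As a rough analogy only, not a description of how MySQL executes DENSE_RANK, the sketch below ranks each distinct score once and then looks the rank up. That brings the work down to about O(n log n). The names distinct_desc, rank_of and report are hypothetical.

```python
from collections import namedtuple

ScoreTuple = namedtuple('ScoreTuple', ['Id', 'Score'])
Scores = [ScoreTuple(1, 3.50), ScoreTuple(2, 3.65), ScoreTuple(3, 4.00),
          ScoreTuple(4, 3.85), ScoreTuple(5, 4.00), ScoreTuple(6, 3.65)]

# Rank each distinct score once: the highest score gets rank 1, with no gaps.
distinct_desc = sorted({s.Score for s in Scores}, reverse=True)
rank_of = {score: i + 1 for i, score in enumerate(distinct_desc)}

# Look up each row's rank, then order the report by score, highest first.
report = sorted(((s.Score, rank_of[s.Score]) for s in Scores), reverse=True)
for score, rank in report:
    print(f"{score:.2f} {rank}")
```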

Related

MySQL: GROUP BY + HAVING MAX(...) --- Why HAVING MAX(grade) will not return maximum grade?

Credit: LeetCode 1112. Highest Grade For Each Student
Requirement: Write a SQL query to find the highest grade with its corresponding course for each student. In case of a tie, you should find the course with the smallest course_id. The output must be sorted by increasing student_id.
The query result format is in the following example:
Enrollments table:
+------------+-----------+-------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 2 | 2 | 95 |
| 2 | 3 | 95 |
| 1 | 1 | 90 |
| 1 | 2 | 99 |
| 3 | 1 | 80 |
| 3 | 2 | 75 |
| 3 | 3 | 82 |
+------------+-----------+-------+
Result table:
+------------+-----------+-------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 1 | 2 | 99 |
| 2 | 2 | 95 |
| 3 | 3 | 82 |
+------------+-----------+-------+
Why doesn't this work?
select student_id, course_id, grade
from enrollments
group by student_id
having max(grade)
order by student_id
I thought the return should be {"headers": ["student_id", "course_id", "grade"], "values": [[1, 2, 99], [2, 2, 95], [3, 3, 82]]}; however, the actual return is {"headers": ["student_id", "course_id", "grade"], "values": [[1, 1, 90], [2, 2, 95], [3, 1, 80]]}.
Thank you so much if anyone can help me!
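Maybe you think that this condition:
having max(grade)
is an instruction that only the rows with the max grade for each student_id should be returned.
This is not what a HAVING clause does.
When used after a GROUP BY clause, it filters the aggregated data after the aggregation is done.
The HAVING clause accepts a boolean condition (possibly combining several expressions) that evaluates to TRUE or FALSE for each group.
So in this case max(grade) is not a boolean expression (although in MySQL any numeric expression can be used in place of a boolean one, and any nonzero value counts as TRUE). Since every student has a nonzero max grade, the condition keeps every group. And because course_id and grade are neither aggregated nor listed in the GROUP BY, MySQL (with ONLY_FULL_GROUP_BY disabled) returns their values from an indeterminate row of each group. That is why you got (1, 1, 90) instead of (1, 2, 99).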
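A rough Python analogy of why HAVING max(grade) filters nothing here. This is an illustration only, not how MySQL evaluates queries, and the names groups and kept are hypothetical: the condition is computed once per group, and any nonzero maximum counts as true, so every student's group survives.

```python
# Enrollments rows from the question: (student_id, course_id, grade)
enrollments = [(2, 2, 95), (2, 3, 95), (1, 1, 90), (1, 2, 99),
               (3, 1, 80), (3, 2, 75), (3, 3, 82)]

# GROUP BY student_id: collect the grades for each student.
groups = {}
for student_id, course_id, grade in enrollments:
    groups.setdefault(student_id, []).append(grade)

# HAVING max(grade): MySQL treats a nonzero number as TRUE,
# so the condition keeps every group whose maximum grade is not 0.
kept = sorted(sid for sid, grades in groups.items() if max(grades) != 0)
print(kept)
```

The condition never mentions which row to return, so it cannot pick out the course that holds the maximum grade.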
I understand that you want the rows with the max grade for each student_id.
This can be done efficiently with window functions in MySQL 8.0: ROW_NUMBER() (ordering by course_id as a tie-breaker, as the problem requires) or RANK() if you also want ties returned:
select e.student_id, e.course_id, e.grade
from (
select *, row_number() over (partition by student_id order by grade desc, course_id) rn
from Enrollments
) e
where e.rn = 1
order by e.student_id;
Results:
| student_id | course_id | grade |
| ---------- | --------- | ----- |
| 1 | 2 | 99 |
| 2 | 2 | 95 |
| 3 | 3 | 82 |
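A minimal sketch that runs the ROW_NUMBER approach against the sample Enrollments data. It uses Python's sqlite3 module as a stand-in for MySQL 8.0, which assumes SQLite 3.25 or later for ROW_NUMBER. The course_id tie-breaker is written out explicitly; without it, which of student 2's two 95 rows gets rn = 1 is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrollments (student_id INT, course_id INT, grade INT)")
conn.executemany(
    "INSERT INTO Enrollments VALUES (?, ?, ?)",
    [(2, 2, 95), (2, 3, 95), (1, 1, 90), (1, 2, 99), (3, 1, 80), (3, 2, 75), (3, 3, 82)],
)

# Number rows per student: highest grade first, smallest course_id breaks ties.
result = conn.execute(
    """
    SELECT e.student_id, e.course_id, e.grade
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY student_id ORDER BY grade DESC, course_id
                  ) AS rn
        FROM Enrollments
    ) e
    WHERE e.rn = 1
    ORDER BY e.student_id
    """
).fetchall()
print(result)
```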
This is a typical top-1-per-group problem. The key to solving it is that, since you want entire records, you should think in terms of filtering, not aggregation.
I would recommend a correlated subquery for this. It is a portable solution that works across most databases (including MySQL 5.x versions, which do not support window functions). With the right index in place, it is usually a very efficient approach.
select e.*
from enrollments e
where e.grade = (
select max(e1.grade)
from enrollments e1
where e1.student_id = e.student_id
)
The index you want here is (student_id, grade).
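One thing to watch with the sample data: student 2 has two courses graded 95, so this filter returns both of those rows. The sketch below uses SQLite via Python's sqlite3 as a stand-in for MySQL, an assumption, to show that. It also shows one possible tie-break with NOT EXISTS, which is an addition for illustration and not part of the answer; the index name idx_student_grade is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (student_id INT, course_id INT, grade INT)")
# The (student_id, grade) index suggested in the answer.
conn.execute("CREATE INDEX idx_student_grade ON enrollments (student_id, grade)")
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(2, 2, 95), (2, 3, 95), (1, 1, 90), (1, 2, 99), (3, 1, 80), (3, 2, 75), (3, 3, 82)],
)

# The answer's filter: keep rows whose grade equals that student's maximum.
best = conn.execute(
    """
    SELECT e.* FROM enrollments e
    WHERE e.grade = (SELECT MAX(e1.grade) FROM enrollments e1
                     WHERE e1.student_id = e.student_id)
    ORDER BY e.student_id, e.course_id
    """
).fetchall()

# Extra tie-break (not in the answer): drop a row if the same student has the
# same grade in a course with a smaller course_id.
best_one = conn.execute(
    """
    SELECT e.* FROM enrollments e
    WHERE e.grade = (SELECT MAX(e1.grade) FROM enrollments e1
                     WHERE e1.student_id = e.student_id)
      AND NOT EXISTS (SELECT 1 FROM enrollments e2
                      WHERE e2.student_id = e.student_id
                        AND e2.grade = e.grade
                        AND e2.course_id < e.course_id)
    ORDER BY e.student_id
    """
).fetchall()

print(best)
print(best_one)
```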
Looks like you need a subquery in the FROM clause to handle the double GROUP BY.
In the query below, the subquery gets the max grade per student, and the outer query joins enrollments to it on both student_id and grade. The outer query then takes the smallest course_id per student, which handles ties.
SELECT e.student_id, min(e.course_id) course_id, e.grade
FROM enrollments e
JOIN (
SELECT student_id, max(grade) grade
FROM enrollments
GROUP BY student_id) g USING (student_id, grade)
GROUP BY e.student_id, e.grade
ORDER BY e.student_id;
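A minimal check of the join-then-group approach on the sample data. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, an assumption, and groups by e.student_id (the column name in the table) plus e.grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (student_id INT, course_id INT, grade INT)")
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(2, 2, 95), (2, 3, 95), (1, 1, 90), (1, 2, 99), (3, 1, 80), (3, 2, 75), (3, 3, 82)],
)

# Join each row to its student's max grade, then keep the smallest course_id.
rows = conn.execute(
    """
    SELECT e.student_id, MIN(e.course_id) AS course_id, e.grade
    FROM enrollments e
    JOIN (SELECT student_id, MAX(grade) AS grade
          FROM enrollments
          GROUP BY student_id) g USING (student_id, grade)
    GROUP BY e.student_id, e.grade
    ORDER BY e.student_id
    """
).fetchall()
print(rows)
```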

SQL syntax error ranking scores in LeetCode

I'd like to solve this LeetCode problem, https://leetcode.com/problems/rank-scores/, using MySQL following this example in the docs (https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_rank):
SELECT score, RANK() OVER w as 'rank'
FROM scores
WINDOW w AS (ORDER BY score DESC);
I've created a test database (using the Django ORM) in which this works fine:
mysql> SELECT score, RANK() OVER w as 'rank' FROM scores WINDOW w AS (ORDER BY score DESC);
+-------+------+
| score | rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 3 |
| 3.65 | 4 |
| 3.65 | 4 |
| 3.50 | 6 |
+-------+------+
6 rows in set (0.00 sec)
However, if I enter this in LeetCode, I get a syntax error.
Any idea what the problem is here? Perhaps RANK() is a new function which the MySQL version on LeetCode doesn't have yet?
RANK() is not supported in MySQL 5.7.21; window functions such as RANK() are only available from MySQL 8.0. You can try the query below instead, which computes a dense rank (no gaps after ties):
SELECT Score,
(SELECT count(1) FROM (SELECT distinct Score s FROM Scores) tmp WHERE s >= Score) 'rank'
FROM Scores
ORDER BY Score desc
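The asker's local output (1, 1, 3, 4, 4, 6) differs from the expected LeetCode output because RANK() leaves gaps after ties, while the problem wants dense ranks. A minimal sketch contrasting the two, assuming SQLite 3.25 or later through Python's sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
conn.executemany(
    "INSERT INTO Scores (Id, Score) VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# RANK() skips ranks after ties (what the asker saw locally on MySQL 8).
with_gaps = [r for (r,) in conn.execute(
    "SELECT RANK() OVER (ORDER BY Score DESC) FROM Scores ORDER BY Score DESC")]

# The answer's pre-8.0 workaround: count the distinct scores >= this row's score.
no_gaps = [r for (r,) in conn.execute(
    "SELECT (SELECT COUNT(1) FROM (SELECT DISTINCT Score s FROM Scores) tmp "
    "WHERE s >= Score) FROM Scores ORDER BY Score DESC")]

print(with_gaps)
print(no_gaps)
```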

MySQL Select from Multiple Tables and most recent record

I'm having issues with a select query and can't quite figure out how to fix. I have two tables:
TABLE_students
|--------|------------|--------|
| STU_ID | EMAIL | NAME |
|--------|------------|--------|
| 1 | a@e.com | Bob |
| 2 | b@e.com | Joe |
| 3 | c@e.com | Tim |
--------------------------------
TABLE_scores
|--------|------------|-------------|--------|
| SRE_ID | STU_ID | DATE | SCORE |
|--------|------------|-------------|--------|
| 91 | 2 | 2018-04-03 | 78 |
| 92 | 2 | 2018-04-06 | 89 |
| 93 | 3 | 2018-04-03 | 67 |
| 94 | 3 | 2018-04-06 | 72 |
| 95 | 3 | 2018-04-07 | 81 |
----------------------------------------------
I'm trying to select data from both tables but have a few requirements. I need to select each student even if they don't have a score in the scores table. I also only want the latest score record.
The query below only returns the students that have a score, and it also returns duplicates: a total of 5 rows (since there are five scores). What I want is for the query to return three rows (one for each student) with their latest score value (or NULL if they don't have a score):
SELECT students.NAME, scores.SCORE FROM TABLE_students as students, TABLE_scores AS scores WHERE students.STU_ID = scores.STU_ID;
I'm having difficulty figuring out how to pull all students regardless of whether they have a score and how to pull only the latest score if they do have one.
Thank you!
This is a variation of the greatest-n-per-group question, which is common on Stack Overflow.
I would do this with a couple of joins:
SELECT s.NAME, c1.DATE, c1.SCORE
FROM students AS s
LEFT JOIN scores AS c1 ON c1.STU_ID = s.STU_ID
LEFT JOIN scores AS c2 ON c2.STU_ID = s.STU_ID
AND (c2.DATE > c1.DATE OR c2.DATE = c1.DATE AND c2.SRE_ID > c1.SRE_ID)
WHERE c2.STU_ID IS NULL;
If c2.STU_ID is null, it means the LEFT JOIN matched no rows that have a greater date (or greater SRE_ID in case of a tie) than the row in c1. This means the row in c1 must be the most recent, because there is no other row that is more recent.
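A minimal sketch that runs this LEFT JOIN anti-join pattern on the sample data. It assumes SQLite through Python's sqlite3 as a stand-in for MySQL, uses the renamed tables students and scores, and adds an ORDER BY only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (STU_ID INT PRIMARY KEY, EMAIL TEXT, NAME TEXT);
CREATE TABLE scores (SRE_ID INT PRIMARY KEY, STU_ID INT, DATE TEXT, SCORE INT);
INSERT INTO students VALUES (1, 'a@e.com', 'Bob'), (2, 'b@e.com', 'Joe'), (3, 'c@e.com', 'Tim');
INSERT INTO scores VALUES
  (91, 2, '2018-04-03', 78), (92, 2, '2018-04-06', 89),
  (93, 3, '2018-04-03', 67), (94, 3, '2018-04-06', 72), (95, 3, '2018-04-07', 81);
""")

# c2 looks for a newer score (or same date with a higher SRE_ID) than c1;
# keeping only rows where c2 found nothing leaves each student's latest score.
rows = conn.execute("""
SELECT s.NAME, c1.DATE, c1.SCORE
FROM students AS s
LEFT JOIN scores AS c1 ON c1.STU_ID = s.STU_ID
LEFT JOIN scores AS c2 ON c2.STU_ID = s.STU_ID
  AND (c2.DATE > c1.DATE OR c2.DATE = c1.DATE AND c2.SRE_ID > c1.SRE_ID)
WHERE c2.STU_ID IS NULL
ORDER BY s.STU_ID
""").fetchall()
print(rows)
```

Bob has no scores, so c1 and c2 are both NULL for him and he is kept with NULL date and score.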
P.S.: Please learn the JOIN syntax, and avoid "comma-style" joins. JOIN has been standard since 1992.
P.P.S.: I removed the superfluous "TABLE_" prefix from your table names. You don't need to use the table name to remind yourself that it's a table! :-)
You could use a correlated subquery:
SELECT *,
(SELECT score FROM TABLE_scores sc
WHERE sc.stu_id = s.stu_id ORDER BY DATE DESC LIMIT 1) AS score
FROM TABLE_students s
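A minimal sketch of the correlated-subquery version. It assumes SQLite through Python's sqlite3 as a stand-in for MySQL, selects only s.NAME instead of * to keep the output short, and adds an ORDER BY for a predictable order. A student with no scores gets NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_students (STU_ID INT PRIMARY KEY, EMAIL TEXT, NAME TEXT);
CREATE TABLE TABLE_scores (SRE_ID INT PRIMARY KEY, STU_ID INT, DATE TEXT, SCORE INT);
INSERT INTO TABLE_students VALUES (1, 'a@e.com', 'Bob'), (2, 'b@e.com', 'Joe'), (3, 'c@e.com', 'Tim');
INSERT INTO TABLE_scores VALUES
  (91, 2, '2018-04-03', 78), (92, 2, '2018-04-06', 89),
  (93, 3, '2018-04-03', 67), (94, 3, '2018-04-06', 72), (95, 3, '2018-04-07', 81);
""")

# For each student, pick the score with the most recent DATE (NULL if none).
rows = conn.execute("""
SELECT s.NAME,
       (SELECT sc.SCORE FROM TABLE_scores sc
        WHERE sc.STU_ID = s.STU_ID ORDER BY sc.DATE DESC LIMIT 1) AS score
FROM TABLE_students s
ORDER BY s.STU_ID
""").fetchall()
print(rows)
```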

Fetch first N rows including tie values MYSQL

+-----+-------+-----+
| id | Name |Votes|
+-----+-------+-----+
| 1 | Joe | 36 |
| 2 | John | 34 |
| 3 | Mark | 42 |
| 4 | Ryan | 29 |
| 5 | Jay | 36 |
| 6 | Shawn | 39 |
+-----+-------+-----+
For this example, what I want is to retrieve the rows with the first 3 highest votes. However, if you'll notice, there are two rows with the same vote count. So this should be the result:
+-----+-------+-----+
| id | Name |Votes|
+-----+-------+-----+
| 3 | Mark | 42 |
| 6 | Shawn | 39 |
| 1 | Joe | 36 |
| 5 | Jay | 36 |
+-----+-------+-----+
How to achieve this?
One way is to perform an INNER JOIN of the table back on itself. First, select the top 3 unique/distinct vote counts, which can be done with:
SELECT DISTINCT Votes FROM mytable ORDER BY Votes DESC LIMIT 3
Now that you have obtained the top 3 scores, you want to join it back to the original table:
SELECT t1.* FROM mytable AS t1
INNER JOIN
(SELECT DISTINCT Votes FROM mytable ORDER BY Votes DESC LIMIT 3) AS topvotes
ON
topvotes.Votes = t1.Votes
ORDER BY t1.Votes DESC
For this query to be efficient, you will want to index the Votes column so that the subquery can fish out distinct votes quickly ;)
Here is a proof-of-concept SQLfiddle: http://sqlfiddle.com/#!9/c78f0/10
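A local equivalent of the proof of concept, assuming SQLite through Python's sqlite3 in place of MySQL. The table name mytable follows the answer, and the extra t1.id in the ORDER BY only makes the order of tied rows predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT PRIMARY KEY, Name TEXT, Votes INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 'Joe', 36), (2, 'John', 34), (3, 'Mark', 42),
     (4, 'Ryan', 29), (5, 'Jay', 36), (6, 'Shawn', 39)],
)

# Join every row back to the top 3 distinct vote counts, so ties are kept.
rows = conn.execute("""
SELECT t1.* FROM mytable AS t1
INNER JOIN (SELECT DISTINCT Votes FROM mytable ORDER BY Votes DESC LIMIT 3) AS topvotes
  ON topvotes.Votes = t1.Votes
ORDER BY t1.Votes DESC, t1.id
""").fetchall()
print(rows)
```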
Probably not the most efficient, but I think this should work:
SELECT * FROM scores WHERE score IN(SELECT score FROM scores ORDER BY score DESC LIMIT 3)
However, MySQL rejects this with an error, because it does not support LIMIT in an IN subquery.
A workaround is to wrap the subquery in a derived table:
SELECT * FROM scores WHERE score IN(SELECT * FROM (SELECT score FROM scores ORDER BY score DESC LIMIT 3) AS t)
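The two answers differ when ties occur above the cutoff: without DISTINCT, LIMIT 3 counts rows, not distinct vote counts. The sketch below shows this using a hypothetical extra row (7, 'Ann', 42) that is not in the question's data. It assumes SQLite through Python's sqlite3 as a stand-in for MySQL and uses the question's column name Votes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT PRIMARY KEY, Name TEXT, Votes INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 'Joe', 36), (2, 'John', 34), (3, 'Mark', 42), (4, 'Ryan', 29),
     (5, 'Jay', 36), (6, 'Shawn', 39),
     (7, 'Ann', 42)],  # hypothetical extra row creating a tie at the top
)

def ids(inner_select):
    # Wrap the LIMIT subquery in a derived table, as in the workaround above.
    sql = ("SELECT id FROM mytable WHERE Votes IN "
           f"(SELECT * FROM ({inner_select}) AS t) ORDER BY Votes DESC, id")
    return [r for (r,) in conn.execute(sql)]

plain = ids("SELECT Votes FROM mytable ORDER BY Votes DESC LIMIT 3")
distinct = ids("SELECT DISTINCT Votes FROM mytable ORDER BY Votes DESC LIMIT 3")
print(plain)
print(distinct)
```

With the tie at 42, the plain version's top 3 values are 42, 42, 39, so the 36-vote rows drop out. Adding DISTINCT keeps them.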

SQL query performance improvement for advice

Here are the problem statement and the code I am currently using. Are there any smart ideas to improve the query's performance? I am using MySQL. Thanks.
Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
For example, given the above Scores table, your query should generate the following report (order by highest score):
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
SELECT
s.score, scores_and_ranks.rank
FROM
Scores s
JOIN
(
SELECT
score_primary.score, COUNT(DISTINCT score_higher.score) + 1 AS rank
FROM
Scores score_primary
LEFT JOIN Scores score_higher
ON score_higher.score > score_primary.score
GROUP BY score_primary.score
) scores_and_ranks
ON s.score = scores_and_ranks.score
ORDER BY rank ASC;
BTW, I tried Gordon's code and sgeddes's code, but ran into new issues with both.
Thanks in advance,
Lin
User-defined variables are probably faster than what you are doing. However, you need to be careful when using them. In particular, you cannot assign a variable in one expression and use it in another. I mean, you can, but the expressions can be evaluated in any order, so your code may not do what you intend.
So, you need to do all the work in a single expression:
select s.*,
(@rn := if(@s = score, @rn,
if(@s := score, @rn + 1, @rn + 1)
)
) as rank
from scores s cross join
(select @rn := 0, @s := 0) params
order by score desc;
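The intended logic of this single expression can be sketched in Python as a running counter over the sorted scores. The names rn and prev are hypothetical stand-ins for @rn and @s. The sketch assumes rows are visited in ORDER BY order, which, as noted above, MySQL does not guarantee for variable assignments.

```python
scores = [3.50, 3.65, 4.00, 3.85, 4.00, 3.65]

rn, prev = 0, None  # stand-ins for @rn := 0 and @s := 0
report = []
for score in sorted(scores, reverse=True):  # ORDER BY score DESC
    if score != prev:  # a new distinct score -> next consecutive rank
        rn += 1
        prev = score
    report.append((score, rn))  # a tie keeps the current rank

for score, rank in report:
    print(f"{score:.2f} {rank}")
```

This is a single pass after sorting, which is why the variable approach can beat the self-join on large tables.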
One option is to use user-defined variables:
select score,
@rnk:=if(@prevScore=score,@rnk,@rnk+1) rnk,
@prevScore:=score
from scores
join (select @rnk:=0, @prevScore:=0) t
order by score desc