Selecting best row in each group based on two columns [duplicate] - mysql

This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 4 years ago.
Suppose we have the following table, where each row represents a submission a user made during a programming contest, id is an auto-increment primary key, probid identifies the problem the submission was made to, score is the number of points the submission earned for the problem, and date is the timestamp when the submission was made. Each user can submit as many times as they want to the same problem:
+----+----------+--------+-------+------------+
| id | username | probid | score | date |
+----+----------+--------+-------+------------+
| 1 | brian | 1 | 5 | 1542766686 |
| 2 | alex | 1 | 10 | 1542766686 |
| 3 | alex | 2 | 5 | 1542766901 |
| 4 | brian | 1 | 10 | 1542766944 |
| 5 | jacob | 2 | 10 | 1542766983 |
| 6 | jacob | 1 | 10 | 1542767053 |
| 7 | brian | 2 | 8 | 1542767271 |
| 8 | jacob | 2 | 10 | 1542767456 |
| 9 | brian | 2 | 7 | 1542767522 |
+----+----------+--------+-------+------------+
In order to rank the contestants, we need to determine the best submission each user made to each problem. The "best" submission is the one with the highest score, with ties broken by submission ID (i.e., if the user got the same score on the same problem twice, we only care about the earlier of the two submissions). This would yield a table like the following:
+----------+--------+----+-------+------------+
| username | probid | id | score | date |
+----------+--------+----+-------+------------+
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
+----------+--------+----+-------+------------+
How can I write a query to accomplish this?

SELECT username , probid , id , score , `date`
FROM tableName
ORDER BY username, score DESC, ID

Using MySQL-8.0 or MariaDB-10.2 or later:
SELECT username, probid, id, score, `date`
FROM (
SELECT username, probid, id, score, `date`,
ROW_NUMBER() over (
PARTITION BY username,probid
ORDER BY score DESC) as `rank`
FROM tablename
) as tmp
WHERE tmp.`rank` = 1

This query will work on versions of MySQL prior to 8.0 as well. The LEFT JOIN removes duplicate scores, ensuring that equal scores only have the lowest date in the result set for a given score. Then the WHERE clause ensures that we have the maximum score for a given user/problem combination:
SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
LEFT JOIN tablename t2
ON t2.username = t1.username AND
t2.probid = t1.probid AND
t2.score = t1.score AND
t2.date < t1.date
WHERE t2.id IS NULL AND
t1.score = (SELECT MAX(score) FROM tablename t3 WHERE t3.username = t1.username AND t3.probid = t1.probid)
ORDER BY t1.username, t1.probid
Update
It's almost certainly more efficient to JOIN the table to a list of maximum scores per user per problem first rather than computing the MAX value for each row in the result table. This query does that instead:
SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
JOIN (SELECT username, probid, MAX(score) AS score
FROM tablename
GROUP BY username, probid) t2
ON t2.username = t1.username AND
t2.probid = t1.probid AND
t2.score = t1.score
LEFT JOIN tablename t3
ON t3.username = t1.username AND
t3.probid = t1.probid AND
t3.score = t1.score AND
t3.date < t1.date
WHERE t3.id IS NULL
ORDER BY t1.username, t1.probid
Output (for both queries):
username probid id score date
alex 1 2 10 1542766686
alex 2 3 5 1542766901
brian 1 4 10 1542766944
brian 2 7 8 1542767271
jacob 1 6 10 1542767053
jacob 2 5 10 1542766983
Updated Demo on SQLFiddle

In pre-MySQL 8.0.2, we can emulate Row_Number() functionality using User-defined Variables. In this technique, we firstly get the data in a particular order (depends on the problem statement at hand).
In your case, within a partition of probid and username, we need to rank scores in descending order, with the row having lower timestamp value given higher priority (to break the ties). So, we will ORDER BY probid, username, score DESC, date ASC.
Now, we can use this result-set as a Derived Table, and determine the row number. It will be like a Looping technique (which we use in application code, eg: PHP). We would store the previous row values in the User-defined variables, and use conditional CASE .. WHEN expressions to check the current row's value(s) against the previous row. And, then assign row number accordingly.
Eventually, we will consider only those rows where row number is 1, and (if required), sort it by username and probid.
Query
SELECT dt2.username,
dt2.probid,
dt2.id,
dt2.score,
dt2.date
FROM (SELECT #rn := CASE
WHEN #un = dt1.username
AND #pid = dt1.probid THEN #rn + 1
ELSE 1
end AS row_no,
#un := dt1.username AS username,
#pid := dt1.probid AS probid,
dt1.id,
dt1.score,
dt1.date
FROM (SELECT id,
username,
probid,
score,
date
FROM your_table
ORDER BY username,
probid,
score DESC,
date ASC) AS dt1
CROSS JOIN (SELECT #un := '',
#pid := 0,
#rn := 0) AS user_init_vars) AS dt2
WHERE dt2.row_no = 1
ORDER BY dt2.username, dt2.probid;
Result
| username | probid | id | score | date |
| -------- | ------ | --- | ----- | ---------- |
| alex | 1 | 2 | 10 | 1542766686 |
| alex | 2 | 3 | 5 | 1542766901 |
| brian | 1 | 4 | 10 | 1542766944 |
| brian | 2 | 7 | 8 | 1542767271 |
| jacob | 1 | 6 | 10 | 1542767053 |
| jacob | 2 | 5 | 10 | 1542766983 |
View on DB Fiddle

Related

Using LIMIT in a subquery based on another field in MySQL

Is it possible to use LIMIT based on another column inside a subquery in MySQL? Here is a working query of what I mean.
SELECT id, name,
(SELECT AVG(value) FROM t2 WHERE t1id = t1.id ORDER BY value DESC LIMIT 4) as average
FROM t1
However I'd like to replace the "4" to a field inside t1.
Something like this where table t1 has fields id, name, size:
SELECT id, name,
(SELECT AVG(value) FROM t2 WHERE t1id = t1.id ORDER BY value DESC LIMIT t1.size) as average
FROM t1
I could join t1 and t2, but I'm not sure that works for this. Does it?
Edit:
Here's some sample data to show what I mean:
Table t1
| id | name | Size |
|----|------|------|
| 1 | Bob | 4 |
| 2 | Joe | 3 |
| 3 | Sam | 4 |
Table t2
| t1id | value |
|------|-------|
| 1 | 16 |
| 1 | 14 |
| 1 | 12 |
| 1 | 10 |
| 1 | 8 |
| 2 | 10 |
| 2 | 8 |
| 2 | 6 |
| 2 | 4 |
| 3 | 20 |
| 3 | 15 |
| 3 | 10 |
| 3 | 5 |
| 3 | 2 |
Expected result:
| id | name | avg |
|----|------|------|
| 1 | Bob | 13 |
| 2 | Joe | 8 |
| 3 | Sam | 12.5 |
Notice that the average is the average of only the top t1.size values. For example the average for Bob is 13 and not 12 (based on 4 values and not 5) and the average for Joe is 8 and not 7 (based on 3 values and not 4).
In MySQL, you have little choice other than LEFT JOIN and aggregation:
SELECT t1.id, t1.name, AVG(t2.value) as average
FROM t1 LEFT JOIN
(SELECT t2.*,
ROW_NUMBER() OVER (PARTITION BY t1id ORDER BY VALUE desc) as seqnum
FROM t2
) t2
on t2.t1id = t1.id AND seqnum <= t1.size
GROUP BY t1.id, t1.name;
Here is a db<>fiddle.
No, you cannot use a column reference in a LIMIT clause.
https://dev.mysql.com/doc/refman/8.0/en/select.html has detailed documentation about MySQL's SELECT statement including all its clauses.
It says:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions:
Within prepared statements, LIMIT parameters can be specified using ? placeholder markers.
Within stored programs, LIMIT parameters can be specified using integer-valued routine parameters or local variables.
Expressions, including subqueries, are not mentioned as legal argument in the LIMIT clause.
A simple solution would be to do your task in two queries: the first to get the size and then use that value as a constant value in the second query that includes the LIMIT.
Not every task needs to be done in a single SQL statement.

MySql: update row that requires some logic processing

Given the following table:
id | company | names | group
-------------------------------------
0 | 1 | John | 1
1 | 1 | Doe | null //populate with preceding group number (i.e. 1)
2 | 1 | Yo | null //populate with preceding group number (i.e. 1)
3 | 1 | Zoe | null //populate with preceding group number (i.e. 1)
4 | 1 | Jack | 2
5 | 1 | Doe | null //populate with preceding group number (i.e. 2)
6 | 1 | Yo | null //populate with preceding group number (i.e. 2)
May I know how i can update only the null values to its preceding group number via sql statements? Thanks.
Try this:
UPDATE tablename t1
JOIN (
SELECT ID, #s:=IF(`group` IS NULL, #s, `group`) `group`
FROM (SELECT * FROM tablename ORDER BY ID) r,
(SELECT #s:=NULL) t
) t2
ON t1.ID = t2.ID
SET t1.`group`= t2.`group`
SQLFIDDLE DEMO

Select two items with maximum number of common values

I have the following table:
+----+-----------+-----------+
| id | teacherId | studentId |
+----+-----------+-----------+
| 1 | 1 | 4 |
| 2 | 1 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 3 |
| 5 | 2 | 2 |
| 6 | 2 | 1 |
| 7 | 2 | 3 |
| 8 | 3 | 9 |
| 9 | 3 | 6 |
| 10 | 1 | 6 |
+----+-----------+-----------+
I need a query to find two teacherId's with maximum number of common studentId's.
In this case teachers with teacherIds 1,2 have common students with studentIds 2, 1, 3, which is greater than 1,3 having common students 6.
Thanks in Advance!
[Edit]: After several hours I've had the following solution:
SELECT * FROM (
SELECT r1tid, r2tid, COUNT(r2tid) AS cnt
FROM (
SELECT r1.teacherId AS r1tid, r2.teacherId AS r2tid
FROM table r1
INNER JOIN table r2 ON r1.studentId=r2.studentId AND r1.teacherId!=r2.teacherId
ORDER BY r1tid
) t
GROUP BY r1tid, r2tid
ORDER BY cnt DESC
) t GROUP BY cnt ORDER BY cnt DESC LIMIT 1;
I was sure that there must exist more short and elegant solution, but I could not find it.
You would do this with a self-join. Assuming no duplicates in the table:
select t.teacherid, t2.teacherid, count(*) as NumStudentsInCommon
from table t join
table t2
on t.studentid = t2.studentid and
t.teacherid < t2.teacherid
group by t.teacherid, t2.teacherid
order by NumStudentsInCommon desc
limit 1;
If you had duplicates, you would just replace count(*) with count(distinct studentid), but count(distinct) requires a bit more work.
select t.teacherId, t2.teacherId, sum(t.studentId) as NumStudentsInCommon
from table1 t join
table1 t2
on t.studentId = t2.studentId and
t.teacherId < t2.teacherId
group by t.teacherId, t2.teacherId
order by NumStudentsInCommon desc

I need to get the average for every 3 records in one table and update column in separate table

Table Mytable1
Id | Actual
1 ! 10020
2 | 12203
3 | 12312
4 | 12453
5 | 13211
6 | 12838
7 | 10l29
Using the following syntax:
SELECT AVG(Actual), CEIL((#rank:=#rank+1)/3) AS rank FROM mytable1 Group BY rank;
Produces the following type of result:
| AVG(Actual) | rank |
+-------------+------+
| 12835.5455 | 1 |
| 12523.1818 | 2 |
| 12343.3636 | 3 |
I would like to take AVG(Actual) column and UPDATE a second existing table Mytable2
Id | Predict |
1 | 11133
2 | 12312
3 | 13221
I would like to get the following where the Actual value matches the ID as RANK
Id | Predict | Actual
1 | 11133 | 12835.5455
2 | 12312 | 12523.1818
3 | 13221 | 12343.3636
IMPORTANT REQUIREMENT
I need to set an offset much like the following syntax:
SELECT #rank := #rank + 1 AS Id , Mytable2.Actual FROM Mytable LIMIT 3 OFFSET 4);
PLEASE NOTE THE AVERAGE NUMBER ARE MADE UP IN EXAMPLES
you can join your existing query in the UPDATE statement
UPDATE Table2 T2
JOIN (
SELECT AVG(Actual) as AverageValue,
CEIL((#rank:=#rank+1)/3) AS rank
FROM Table1, (select #rank:=0) t
Group BY rank )T1
on T2.id = T1.rank
SET Actual = T1.AverageValue

query to fetch records and their rank in the DB

I have a table that holds usernames and results.
When a user insert his results to the DB, I want to execute a query that will return
the top X results ( with their rank in the db) and will also get that user result
and his rank in the DB.
the result should be like this:
1 playername 4500
2 otherplayer 4100
3 anotherone 3900
...
134 current player 140
I have tried a query with union, but then I didnt get the current player rank.
ideas anyone?
The DB is MYSQL.
10x alot and have agreat weekend :)
EDIT
This is what I have tried:
(select substr(first_name,1,10) as first_name, result
FROM top_scores ts
WHERE result_date >= NOW() - INTERVAL 1 DAY
LIMIT 10)
union
(select substr(first_name,1,10) as first_name, result
FROM top_scores ts
where first_name='XXX' and result=3030);
SET X = 0;
SELECT #X:=#X+1 AS rank, username, result
FROM myTable
ORDER BY result DESC
LIMIT 10;
Re your comment:
How about this:
SET X = 0;
SELECT ranked.*
FROM (
SELECT #X:=#X+1 AS rank, username, result
FROM myTable
ORDER BY result DESC
) AS ranked
WHERE ranked.rank <= 10 OR username = 'current';
Based on what I am reading here:
Your table structure is:
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| name | varchar(50) | YES | | NULL | |
| result | int(11) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
Table Data looks like:
+---------+--------+
| name | result |
+---------+--------+
| Player1 | 4500 |
| Player2 | 4100 |
| Player3 | 3900 |
| Player4 | 3800 |
| Player5 | 3700 |
| Player6 | 3600 |
| Player7 | 3500 |
| Player8 | 3400 |
+---------+--------+
You want a result set to look like this:
+------+---------+--------+
| rank | name | result |
+------+---------+--------+
| 1 | Player1 | 4500 |
| 2 | Player2 | 4100 |
| 3 | Player3 | 3900 |
| 4 | Player4 | 3800 |
| 5 | Player5 | 3700 |
| 6 | Player6 | 3600 |
| 7 | Player7 | 3500 |
| 8 | Player8 | 3400 |
+------+---------+--------+
SQL:
set #rank = 0;
select
top_scores.*
from
(select ranks.* from (select #rank:=#rank+1 AS rank, name, result from ranks) ranks) top_scores
where
top_scores.rank <= 5
or (top_scores.result = 3400 and name ='Player8');
That will do what you want it to do
assuming your table has the following columns:
playername
score
calculated_rank
your query should look something like:
select calculated_rank,playername, score
from tablename
order by calculated_rank limit 5
I assume you have PRIMARY KEY on this table. If you don't, just create one. My table structure (because you didn't supply your own) is like this:
id INTEGER
result INTEGER
first_name VARCHAR
SQL query should be like that:
SELECT #i := #i+1 AS position, first_name, result FROM top_scores, (SELECT #i := 0) t ORDER BY result DESC LIMIT 10 UNION
SELECT (SELECT COUNT(id) FROM top_scores t2 WHERE t2.result > t1.result AND t2.id > t1.id) AS position, first_name, result FROM top_scores t1 WHERE id = LAST_INSERT_ID();
I added additional condition into subquery ("AND t2.id > t1.id") to prevent multiple people with same result having same position.
EDIT: If you have some login system, it would be better to save userid with result and get current user result using it.