I have two mysql greatest-n-per-group, greatest-by-date problems:
Considering one students table and one grades table, I want to have all students displayed with their most recent grade.
The schema script:
CREATE TABLE student (
id int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO student VALUES(1, 'jim');
INSERT INTO student VALUES(2, 'mark ');
INSERT INTO student VALUES(3, 'john');
CREATE TABLE grades (
id int(11) NOT NULL AUTO_INCREMENT,
student_id int(11) NOT NULL,
grade int(11) NOT NULL,
`date` date DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO grades VALUES(1, 1, 6, NULL);
INSERT INTO grades VALUES(2, 1, 8, NULL);
INSERT INTO grades VALUES(3, 1, 10, NULL);
INSERT INTO grades VALUES(4, 2, 9, '2016-05-10');
INSERT INTO grades VALUES(5, 2, 8, NULL);
INSERT INTO grades VALUES(6, 3, 6, '2016-05-26');
INSERT INTO grades VALUES(7, 3, 7, '2016-05-27');
A) I want to find out if this is a valid solution for getting the most recent record by a date field (date) from a secondary table (grades) grouped for each row in a main table (student).
My query is:
SELECT s.id, s.name, g.grade, g.date
FROM student AS s
LEFT JOIN (
SELECT student_id, grade, DATE
FROM grades AS gr
WHERE DATE = (
SELECT MAX(DATE)
FROM grades
WHERE student_id = gr.student_id
)
GROUP BY student_id
) AS g ON s.id = g.student_id
Sql Fiddle: http://sqlfiddle.com/#!9/a84171/2
This query displays the desired (almost) results. But I have doubts that this is the best approach because it looks ugly, so I am very curious about the alternatives.
B) The second problem is the reason for the "(almost)" above.
For the first row, name = jim, the query finds no grade even though Jim has grades.
So the query above is valid only when the date field is NOT NULL.
The question is:
How can I get the most recent grade for all students who have grades, including Jim, even though his grades have no date specified (NULL)? In that case the most recent row would be the latest one inserted (MAX(id)), or just a random one.
Replacing date = (SELECT ... with date IN (SELECT ... doesn't work either.
Any help would be much appreciated,
Thanks!
[UPDATE #1]:
For B) I found adding this to the sub-query, OR date IS NULL, produces the desired result:
SELECT s.id, s.name, g.grade, g.date
FROM student AS s
LEFT JOIN (
SELECT id, student_id, grade, DATE
FROM grades AS gr
WHERE DATE = (
SELECT MAX(DATE)
FROM grades
WHERE student_id = gr.student_id
) OR date IS NULL
GROUP BY student_id
) AS g ON s.id = g.student_id
[UPDATE #2]
It seems the previous update works only if the first grade for a student has a date; it fails if the first grade's date is NULL. I would have linked a fiddle, but sqlfiddle doesn't seem to be working right now.
So this is what I have come up with so far, which seems to solve problem B):
SELECT s.id, s.name, g.grade, g.date
FROM student AS s
LEFT JOIN (
SELECT id, student_id, grade, DATE
FROM grades AS gr
WHERE (
`date` = (
SELECT MAX(DATE)
FROM grades
WHERE student_id = gr.student_id
)
) OR (
(
SELECT MAX(DATE)
FROM grades
WHERE student_id = gr.student_id
) IS NULL AND
date IS NULL
)
) AS g ON s.id = g.student_id
GROUP BY student_id
I still would like to know if you guys know better alternatives to this ugly thing.
Thanks!
[UPDATE #3]
@Strawberry
The desired results would be:
id  name  grade  date
1   jim   10     NULL
2   mark  9      2016-05-10
3   john  7      2016-05-27
each student with one corresponding grade
if a date exists for a grade, then get the most recent one.
The complexity of this problem stems from the logical impossibility of a grade without an associated date, so obviously the solution is to fix that.
But here's a workaround...
E.g.:
SELECT a.*
FROM grades a
JOIN
( SELECT student_id
, MAX(COALESCE(UNIX_TIMESTAMP(date),id)) date
FROM grades
GROUP
BY student_id
) b
ON b.student_id = a.student_id
AND b.date = COALESCE(UNIX_TIMESTAMP(a.date),id);
http://sqlfiddle.com/#!9/ecec43/4
SELECT s.id, s.name, g.grade, g.date
FROM student AS s
LEFT JOIN (
SELECT gr.student_id, gr.grade, gr.DATE
FROM grades AS gr
LEFT JOIN grades grm
ON grm.student_id = gr.student_id
AND grm.date>gr.date
WHERE grm.student_id IS NULL
AND gr.date IS NOT NULL
GROUP BY gr.student_id
) AS g
ON s.id = g.student_id;
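As a cross-check of the desired result, here is a minimal sketch of one more alternative: pick, per student, the single grade row that sorts first when dated rows come before undated ones, then newest date, then highest id. It runs the sample data through Python's built-in sqlite3 module, which is only a stand-in for MySQL here (an assumption of this sketch); the query itself uses nothing SQLite-specific, but it has not been run against MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE grades (
    id INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    grade INTEGER NOT NULL,
    date TEXT  -- ISO dates stored as text; NULL when unknown
);
INSERT INTO student VALUES (1, 'jim'), (2, 'mark'), (3, 'john');
INSERT INTO grades VALUES
    (1, 1, 6, NULL), (2, 1, 8, NULL), (3, 1, 10, NULL),
    (4, 2, 9, '2016-05-10'), (5, 2, 8, NULL),
    (6, 3, 6, '2016-05-26'), (7, 3, 7, '2016-05-27');
""")

# For each student, join only the one grade row chosen by the correlated
# subquery: dated rows first, newest date first, highest id as tie-breaker.
query = """
SELECT s.id, s.name, g.grade, g.date
FROM student AS s
LEFT JOIN grades AS g
  ON g.id = (
        SELECT g2.id
        FROM grades AS g2
        WHERE g2.student_id = s.id
        ORDER BY g2.date IS NULL, g2.date DESC, g2.id DESC
        LIMIT 1
     )
ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the subquery returns exactly one id per student, no GROUP BY is needed in the outer query, and students without any grades still appear with NULL columns thanks to the LEFT JOIN.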
Consider the following schema:
Student (RollNo int Not Null, Name varchar(20) Not Null, YearOfAdmission int Not Null,
PRIMARY KEY(RollNo))
Friend(OwnRoll int Not Null, FriendRoll int Not Null,
PRIMARY KEY(OwnRoll, FriendRoll),
FOREIGN KEY fk_std1(OwnRoll) REFERENCES Student(RollNo),
FOREIGN KEY fk_std2(FriendRoll) REFERENCES Student(RollNo))
Movie(MID int Not Null, Title varchar(30) Not Null, YearOfRelease int Not Null, DirectorName
varchar(20) Null,
PRIMARY KEY(MID)) [Assume all director names are unique. However, same director can direct
many movies]
Rating(RollNo int Not Null, MID int Not Null, RatingDate date Not Null, Rating int Not Null,
PRIMARY KEY(RollNo, MID, RatingDate),
FOREIGN KEY fk_std4(RollNo) REFERENCES Student(RollNo),
FOREIGN KEY fk_mov2(MID) REFERENCES Movie(MID));
The question is as follows:
List the students whose average rating over all movies (including multiple instances of rating
the movies on different dates) is less than the average rating of those movies by his/her friends
(including multiple instances of rating the movies on different dates). (Output format: RollNo1,
AverageRating1, RollNo2, AverageRating2)
One possible answer is:
Select x.OwnRoll as RollNo1, l1.average as AverageRating1,
x.FriendRoll as RollNo2, l2.average as AverageRating2
from
(
Select * from Friend
union
(
Select f.FriendRoll, f.OwnRoll from Friend as f
)
order by OwnRoll
) as x,
(
Select r.Rollno, avg(r.Rating) as average
from Rating as r
group by r.Rollno
) as l1,
(
Select r.Rollno, avg(r.Rating) as average
from Rating as r
group by r.Rollno
) as l2
where l1.Rollno = x.OwnRoll and l2.Rollno = x.FriendRoll
and l1.average > l2.average ;
But this version ignores friends who have never rated a movie; their average rating should count as 0.
Thanks in advance for any update to the question and its answers.
Make this small change to report an average rating of 0 when a RollNo does not appear in Rating:
Select distinct r.Rollno, avg(r.Rating) as average
from Rating as r
group by r.Rollno
union
(
Select RollNo, 0 as average
from Student
where Rollno not in (Select RollNo from Rating)
)
You could also use COALESCE for this.
The final answer is:
Select x.OwnRoll as RollNo1, l1.average as AverageRating1,
x.FriendRoll as RollNo2, l2.average as AverageRating2
from
(
Select * from Friend
union (Select f.FriendRoll, f.OwnRoll from Friend as f)
order by OwnRoll
) as x,
(
Select distinct r.Rollno, avg(r.Rating) as average
from Rating as r
group by r.Rollno
union
(
Select RollNo, 0 as average
from Student
where Rollno not in
(Select RollNo from Rating))
) as l1,
(
Select distinct r.Rollno, avg(r.Rating) as average
from Rating as r
group by r.Rollno
union
(
Select RollNo, 0 as average
from Student
where Rollno not in
(Select RollNo from Rating))
) as l2
where l1.Rollno = x.OwnRoll and l2.Rollno = x.FriendRoll
and l1.average > l2.average
Try understanding it step by step !! Else you will get lost :)
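The COALESCE variant mentioned above could look roughly like the following sketch: a LEFT JOIN from Student keeps students without ratings, and COALESCE turns their NULL average into 0. The three students and their ratings are made-up sample rows, and Python's sqlite3 module is used only as a stand-in database (both are assumptions of this sketch, not part of the original question).

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Rating (
    RollNo INTEGER NOT NULL,
    MID INTEGER NOT NULL,
    RatingDate TEXT NOT NULL,
    Rating INTEGER NOT NULL,
    PRIMARY KEY (RollNo, MID, RatingDate)
);
-- Hypothetical sample rows: student 3 has never rated a movie.
INSERT INTO Student VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO Rating VALUES
    (1, 10, '2016-01-01', 4),
    (1, 11, '2016-01-02', 2),
    (2, 10, '2016-01-03', 5);
""")

# LEFT JOIN keeps every student; AVG over no rows is NULL, which
# COALESCE replaces with 0.
query = """
SELECT s.RollNo, COALESCE(AVG(r.Rating), 0) AS average
FROM Student AS s
LEFT JOIN Rating AS r ON r.RollNo = s.RollNo
GROUP BY s.RollNo
ORDER BY s.RollNo
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This derived table could replace both l1 and l2 in the final answer, which avoids repeating the UNION with the NOT IN subquery.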
TABLE [tbl_hobby]
person_id (int) , hobby_id(int)
The table has many records. I want a SQL query that finds all pairs of person_id values that have exactly the same hobbies (the same set of hobby_id values).
If A has hobby_id 1, B has it too; if A doesn't have hobby_id 2, B doesn't have it either. In that case we output the person_ids of A and B.
If A, B and C all match, we output A & B, B & C, and A & C.
I got it working with a very clumsy method: joining the table to itself several times, with several subqueries. And of course my team lead laughed at it.
Is there a high-performance way to do this in SQL?
I have been thinking hard about this for the last 36 hours.
sample data in mysql dump
CREATE TABLE `tbl_hobby` (
`person_id` int(11) NOT NULL,
`hobby_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbl_hobby` (`person_id`, `hobby_id`) VALUES
(1, 1),(1, 2),(1, 3),(1, 4),(1, 5),(2, 2),
(2, 3),(2, 4),(3, 1),(3, 2),(3, 3),(3, 4),
(4, 1),(4, 3),(4, 4),(5, 1),(5, 5),(5, 9),
(6, 2),(6, 3),(6, 4),(7, 1),(7, 3),(7, 7),
(8, 2),(8, 3),(8, 4),(9, 1),(9, 2),(9, 3),
(9, 4),(10, 1),(10, 5),(10, 9),(10, 11);
COMMIT;
Expected result: (2, 6 and 8 are the same; 3 and 9 are the same)
2,6
2,8
6,8
3,9
The order of the result records and the order of the two numbers within a record are not important. Results in one column or in two columns are both acceptable, since they can easily be concatenated or separated.
Aggregate per person to get a string of each person's hobbies. Then aggregate per hobby list to find out which lists belong to more than one person.
select hobbies, group_concat(person_id order by person_id) as persons
from
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) persons
group by hobbies
having count(*) > 1
order by hobbies;
This gives a list of persons per hobby list, which is the easiest way to output a solution, as we would otherwise have to build all possible pairs.
UPDATE: If you want pairs, you'll have to query the table twice:
select p1.person_id as person1, p2.person_id as person2
from
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) p1
join
(
select person_id, group_concat(hobby_id order by hobby_id) as hobbies
from tbl_hobby
group by person_id
) p2 on p2.person_id > p1.person_id and p2.hobbies = p1.hobbies
order by person1, person2;
Alternative version, without using any proprietary string handling:
select distinct t1.person_id, t2.person_id
from tbl_hobby t1
join tbl_hobby t2
on t1.person_id < t2.person_id
where 2 = all (select count(*)
from tbl_hobby
where person_id in (t1.person_id, t2.person_id)
group by hobby_id);
Perhaps less efficient, but portable!
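Another portable formulation, sketched here as an assumption rather than taken from the answers above: two people have identical hobby sets exactly when the number of hobbies they share equals each person's own hobby count. The sketch assumes (person_id, hobby_id) pairs are unique, and it uses Python's sqlite3 module purely as a stand-in database to run the question's sample data.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_hobby (person_id INTEGER NOT NULL, hobby_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tbl_hobby (person_id, hobby_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2),
     (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4),
     (4, 1), (4, 3), (4, 4), (5, 1), (5, 5), (5, 9),
     (6, 2), (6, 3), (6, 4), (7, 1), (7, 3), (7, 7),
     (8, 2), (8, 3), (8, 4), (9, 1), (9, 2), (9, 3),
     (9, 4), (10, 1), (10, 5), (10, 9), (10, 11)],
)

# The self-join yields one row per hobby shared by a pair; the pair matches
# when that shared count equals both persons' total hobby counts.
query = """
SELECT a.person_id AS person1, b.person_id AS person2
FROM tbl_hobby AS a
JOIN tbl_hobby AS b
  ON b.hobby_id = a.hobby_id AND a.person_id < b.person_id
JOIN (SELECT person_id, COUNT(*) AS n FROM tbl_hobby GROUP BY person_id) AS ca
  ON ca.person_id = a.person_id
JOIN (SELECT person_id, COUNT(*) AS n FROM tbl_hobby GROUP BY person_id) AS cb
  ON cb.person_id = b.person_id
GROUP BY a.person_id, b.person_id, ca.n, cb.n
HAVING COUNT(*) = ca.n AND COUNT(*) = cb.n
ORDER BY person1, person2
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Since the join is on hobby_id, an index on (hobby_id, person_id) would likely help on large tables, though that has not been measured here.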
table a
no name
2001 jon
2002 jonny
2003 mik
2004 mike
2005 mikey
2006 tom
2007 tomo
2008 tommy
table b
code name credits courseCode
A2 JAVA 25 wer
A3 php 25 wer
A4 oracle 25 wer
B2 p.e 50 oth
B3 sport 50 oth
C2 r.e 25 rst
C3 science 25 rst
C4 networks 25 rst
table c
studentNumber grade coursecode
2003 68 A2
2003 72 A3
2003 53 A4
2005 48 A2
2005 52 A3
2002 20 A2
2002 30 A3
2002 50 A4
2008 90 B2
2007 73 B2
2007 63 B3
SELECT a.num, a.Fname,
b.courseName, b.cMAXscore, b.cCode, c.stuGrade
FROM a
INNER JOIN c
ON a.no = c.no
INNER JOIN b
ON c.moduleCode = b.cCode
INNER JOIN b
ON SUM(b.cMAXscore) / (c.stuGrade)
AND b.cMAXscore = c.stug=Grade
GROUP BY a.Fname, b.cMAXscore, b.cCode, b.courseName,c.stuGrade
"calculate and display every student name(a.Fname) and their ID number(a.num) along with their grade (c.grade) versus the coursse name(b.courseName) and the courses max score(b.cMAXscoure). "
I can't figure out how to divide the MAX by the grade. Can someone help?
From the specification, it doesn't look like an aggregate function or a GROUP BY would be necessary. But the specification is ambiguous. There are no table definitions (beyond the unfortunate names and some column references).
Definitions of the tables, along with example data and an example of the desired resultset, would go a long way toward removing the ambiguity.
Based on the join predicates in the OP query, I'd suggest something like this query, as a starting point:
SELECT a.Fname
, a.num
, c.grade
, b.courseName
, b.cMAXscore
FROM a
JOIN c
ON c.no = a.no
JOIN b
ON b.cCode = c.moduleCode
ORDER
BY a.Fname
, a.num
, c.grade
, b.courseName
, b.cMAXscore
It seems like that would return the specified result (based on my interpretation of the vague specification.) If that's insufficient i.e. if that doesn't return the desired resultset, then in what way does the desired result differ from the result from this query?
(For more help with your question, I suggest you setup a sqlfiddle example with tables and example data. That will make it easier for someone to help you.)
FOLLOWUP
Based on the additional information provided in the question (table definitions and example data)...
To get the maximum (highest) grade for a given course, you could use a query like this:
SELECT MAX(c.grade)
FROM c
WHERE c.coursecode = 'A2'
To get the highest grade for all courses:
SELECT c.coursecode
, MAX(c.grade) AS max_grade
FROM c
GROUP BY c.coursecode
ORDER BY c.coursecode
To match the highest grade for each course to each student grade, use that previous query as an inline view in another query. Something like this:
SELECT g.studentNumber
, g.grade
, g.coursecode
, h.coursecode
, h.highest_grade
FROM c g
JOIN ( SELECT c.coursecode
, MAX(c.grade) AS highest_grade
FROM c
GROUP BY c.coursecode
) h
ON h.coursecode = g.coursecode
To perform a calculation, you can use an expression in the SELECT list of the outer query.
For example, to divide the value of one column by another, you can use the division operator:
SELECT g.studentNumber AS student_number
, g.grade AS student_grade
, g.coursecode AS student_coursecode
, h.coursecode
, h.highest_grade
, g.grade / h.highest_grade AS `student_grade_divided_by_highest_grade`
FROM c g
JOIN ( SELECT c.coursecode
, MAX(c.grade) AS highest_grade
FROM c
GROUP BY c.coursecode
) h
ON h.coursecode = g.coursecode
If you also want to return the name of the student, you can join to (the unfortunately named) table a. Assuming that no is UNIQUE in a:
LEFT
JOIN a
ON a.no = g.studentNumber
And include a.name AS student_name in the SELECT list.
If you also need columns from table b, then join that table as well. Assuming that code is UNIQUE in b:
LEFT
JOIN b
ON b.code = g.coursecode
Then b.credits can be referenced in an expression in the SELECT list.
Beyond that, you need to be a little more explicit about what result should be returned by the query.
If you are after a "total overall grade" for a student, you'd need to specify how that result should be obtained.
Without knowing the table definitions, it is very hard to provide a solution to your problem.
Here is my version of what I think you are looking for (SQL Server syntax):
DECLARE @Student TABLE
(StudentID INT IDENTITY,
FirstName VARCHAR(255),
LastName VARCHAR(255)
);
DECLARE @Course TABLE
(CourseID INT IDENTITY,
CourseCode VARCHAR(25),
CourseName VARCHAR(255),
MaxScore INT
);
DECLARE @Grade TABLE
(ID INT IDENTITY,
CourseID INT,
StudentID INT,
Score INT
);
--Student
insert into @Student(FirstName, LastName)
values ('Test', 'B'),
('Test123', 'K')
--Course
insert into @Course(CourseCode, CourseName, MaxScore)
values ('MAT101', 'MATH', 100),
('ENG101', 'ENGLISH', 100)
--Grade (same rows as before, in the same order)
insert into @Grade(CourseID, StudentID, Score)
values (1, 1, 93), (1, 1, 65), (1, 1, 100),
(2, 1, 100), (2, 1, 69), (2, 1, 95),
(1, 2, 100), (1, 2, 65), (1, 2, 100),
(2, 2, 100), (2, 2, 88), (2, 2, 96)
SELECT a.StudentID,
a.FirstName,
a.LastName,
c.CourseCode,
SUM(b.Score) AS 'StudentScore',
SUM(c.MaxScore) AS 'MaxCourseScore',
SUM(CAST(b.Score AS DECIMAL(5, 2))) / SUM(CAST(c.MaxScore AS DECIMAL(5, 2))) AS 'Grade'
FROM @Student a
INNER JOIN @Grade b ON a.StudentID = b.StudentID
INNER JOIN @Course c ON c.CourseID = b.CourseID
GROUP BY a.StudentID,
a.FirstName,
a.LastName,
c.CourseCode;
The problem statement doesn't say anything about dividing by the max, I think you're misunderstanding it.
You need to write a subquery that gets the maximum score for each class, using MAX and GROUP BY. You can then join this with the other tables.
SELECT s.name AS student_name, c.name AS course_name, g.grade, m.max_grade
FROM student AS s
JOIN grade AS g ON s.no = g.studentNumber
JOIN course AS c ON c.code = g.courseCode
JOIN (SELECT courseCode, MAX(grade) AS max_grade
FROM grade
GROUP BY courseCode) AS m
ON m.courseCode = c.code
If you did need to divide the grade by the maximum, you can use g.grade/m.max_grade.
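The inline-view approach from the answers above can be checked against part of the question's sample data with a sketch like this one. It uses Python's sqlite3 module as a stand-in database (an assumption of this sketch); the `* 1.0` is there only because SQLite divides integers as integers, whereas MySQL's `/` operator already returns a decimal result.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a ("no" INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE c (studentNumber INTEGER, grade INTEGER, coursecode TEXT);
-- Subset of the sample rows from the question.
INSERT INTO a VALUES (2002, 'jonny'), (2003, 'mik'), (2005, 'mikey'),
                     (2007, 'tomo'), (2008, 'tommy');
INSERT INTO c VALUES (2003, 68, 'A2'), (2005, 48, 'A2'), (2002, 20, 'A2'),
                     (2008, 90, 'B2'), (2007, 73, 'B2');
""")

# h holds the highest grade per course; each student grade is joined to it
# and divided by it in the SELECT list.
query = """
SELECT a.name, g.coursecode, g.grade, h.highest_grade,
       g.grade * 1.0 / h.highest_grade AS ratio
FROM c AS g
JOIN (SELECT coursecode, MAX(grade) AS highest_grade
      FROM c
      GROUP BY coursecode) AS h
  ON h.coursecode = g.coursecode
JOIN a ON a."no" = g.studentNumber
ORDER BY g.coursecode, g.grade DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The top student in each course gets a ratio of exactly 1.0, which is a quick sanity check that the join to the per-course maximum is matching on the right column.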
I have a table with timestamped rows: say, some feed with authors:
CREATE TEMPORARY TABLE `feed` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`author` VARCHAR(255) NOT NULL,
`tm` DATETIME NOT NULL
);
I'd like to sort by tm DESC but in such a way that rows from one author do stick together.
For instance, having
INSERT INTO `feed` VALUES
( 5, 'peter', NOW()+1 ),
( 4, 'helen', NOW()-1 ),
( 3, 'helen', NOW()-2 ),
( 2, 'peter', NOW()-10 ),
( 1, 'peter', NOW()-11 );
The result set should be sorted by tm DESC, but all peter posts go first because his post is the most recent one. The next set of rows should originate from the author with the 2nd most recent post. And so on.
5 peter
2 peter
1 peter
4 helen
3 helen
First we sort authors by recent post, descending. Then, having this "rating", we sort the feed with authors sorted by recent post.
Create an inline view that calculates the max tm per author, then join to it.
SELECT f.*
FROM feed f
INNER JOIN (SELECT MAX(tm) AS maxtm,
author
FROM feed
GROUP BY author) m
ON f.author = m.author
ORDER BY m.maxtm DESC,
f.author,
f.tm DESC
You could try something like this:
select *
from feed
order by
(select max(tm) from feed f2 where f2.author = feed.author) desc,
tm desc
This sorts first by the time of the most recent post of the author, then by tm.
SELECT *
FROM `feed`
LEFT JOIN (
SELECT
@rownum:=@rownum+1 AS `rowid`,
`author`,
MAX(`tm`) AS `max_tm`
FROM (SELECT @rownum:=0) r, `feed`
GROUP BY `author`
ORDER BY `max_tm` DESC
) `feedsort` ON(`feed`.`author` = `feedsort`.`author`)
ORDER BY
`feedsort`.`rowid` ASC,
`feed`.`tm` DESC;
This solves the problem but I'm sure there's a better solution
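To check the expected ordering without user variables, here is a small sketch of the join-to-max approach. The fixed timestamps are made up to mirror the relative order in the question's INSERT, and Python's sqlite3 module stands in for MySQL (both are assumptions of this sketch).

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feed (id INTEGER PRIMARY KEY, author TEXT NOT NULL, tm TEXT NOT NULL);
-- Hypothetical fixed timestamps in the same relative order as the question.
INSERT INTO feed VALUES
    (5, 'peter', '2016-06-01 12:00:11'),
    (4, 'helen', '2016-06-01 12:00:09'),
    (3, 'helen', '2016-06-01 12:00:08'),
    (2, 'peter', '2016-06-01 12:00:00'),
    (1, 'peter', '2016-06-01 11:59:59');
""")

# Sort by each author's newest post first, keep an author's rows together,
# then sort that author's rows newest first.
query = """
SELECT f.id, f.author
FROM feed AS f
JOIN (SELECT author, MAX(tm) AS max_tm
      FROM feed
      GROUP BY author) AS m
  ON m.author = f.author
ORDER BY m.max_tm DESC, f.author, f.tm DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Including f.author as the second sort key keeps rows grouped even when two authors happen to share the same most recent timestamp.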
person_id | manager_id | name |
| | |
-------------------------------
Query to find name of manager who supervises maximum number of employees?
Added: This is the only table. Yes self-referencing. DB is mysql. Recursive queries will also do.
This query returns the manager_id of the manager with the largest number of employees.
Aggregate functions cannot be nested, so HAVING max(count(*)) is not valid SQL; instead, count the rows per manager, sort by that count, and keep the first row.
SELECT manager_id, count(*) AS employees
FROM `table`
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY employees DESC
LIMIT 1;
You can read more about filtering groups in the short but informative w3schools.com HAVING clause tutorial.
If the manager_id references a person id in the same table, Svinto's answer might be more suitable.
SELECT name
FROM `table`
WHERE person_id = (
SELECT manager_id
FROM `table`
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY count(*) DESC
LIMIT 1)
It's not entirely clear to me what you want, so if this isn't what you want please clarify your question.
This query returns just one of the managers if there is a tie:
SELECT T2.name FROM (
SELECT manager_id
FROM table1
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY count(*) DESC
LIMIT 1
) AS T1
JOIN table1 AS T2
ON T1.manager_id = T2.person_id
Result of query:
Bar
Here's a query that fetches all managers with the tied maximum count in the case that there is a tie:
SELECT P.name FROM (
SELECT manager_id, COUNT(*) AS C
FROM table1
WHERE manager_id IS NOT NULL
GROUP BY manager_id) AS Counts
JOIN (
SELECT COUNT(*) AS C
FROM table1
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY COUNT(*) DESC
LIMIT 1
) AS MaxCount
ON Counts.C = MaxCount.C
JOIN table1 AS P
ON Counts.manager_id = P.person_id
Result of the second query:
Foo
Bar
Here's my test data:
CREATE TABLE Table1 (person_id int NOT NULL, manager_id nvarchar(100) NULL, name nvarchar(100) NOT NULL);
INSERT INTO Table1 (person_id, manager_id, name) VALUES
(1, NULL, 'Foo'),
(2, '1', 'Bar'),
(3, '1', 'Baz'),
(4, '2', 'Qux'),
(5, '2', 'Quux'),
(6, '3', 'Corge');
Assuming manager_id references person_id and the table is named table_name:
SELECT name FROM (
SELECT manager_id
FROM table_name
GROUP BY manager_id
ORDER BY COUNT(*) DESC
LIMIT 1
) t
INNER JOIN table_name ON t.manager_id = table_name.person_id
edit:
Removed HAVING MAX COUNT, added ORDER BY COUNT DESC LIMIT 1 in subquery
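What HAVING max(count(*)) was reaching for can be written by taking MAX over a derived table of per-manager counts. The following is a sketch under assumptions: it uses the Foo/Bar test data from the answer above with manager_id stored as an integer, and Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (
    person_id INTEGER PRIMARY KEY,
    manager_id INTEGER,          -- NULL for the top of the hierarchy
    name TEXT NOT NULL
);
INSERT INTO table1 VALUES
    (1, NULL, 'Foo'), (2, 1, 'Bar'), (3, 1, 'Baz'),
    (4, 2, 'Qux'), (5, 2, 'Quux'), (6, 3, 'Corge');
""")

# counts: employees per manager. The scalar subquery takes the maximum of
# those counts, so every manager tied for the maximum is returned.
query = """
SELECT p.name
FROM table1 AS p
JOIN (SELECT manager_id, COUNT(*) AS c
      FROM table1
      WHERE manager_id IS NOT NULL
      GROUP BY manager_id) AS counts
  ON counts.manager_id = p.person_id
WHERE counts.c = (
    SELECT MAX(c)
    FROM (SELECT COUNT(*) AS c
          FROM table1
          WHERE manager_id IS NOT NULL
          GROUP BY manager_id) AS per_manager
)
ORDER BY p.person_id
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

With this data Foo and Bar each supervise two people, so both names come back; the ORDER BY ... LIMIT 1 variants would return only one of them.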