Fetching extra attribute in group by clause MySQL

I have the following SQL relational schema
create table students(
s_id int primary key auto_increment,
s_name char(30),
s_city char(30),
s_number char(10)
);
create table subjects(
sub_id int primary key auto_increment,
sub_name char(30)
);
create table marks(
m_id int primary key auto_increment,
s_id int,
sub_id int,
marks int CHECK (marks <= 100 AND marks >= 0)
);
The database stores information about students along with their marks in each subject.
I am able to fetch the maximum marks obtained in each subject using the following SQL query.
select sub_name, max(marks) as highest from subjects as sub, marks as m where sub.sub_id = m.sub_id group by sub_name;
and it gave me the following result:
+-----------+---------+
| sub_name | highest |
+-----------+---------+
| Chemistry | 100 |
| Maths | 98 |
| Physics | 90 |
+-----------+---------+
Now I want to add an extra column, s_name, to this result, naming the student who obtained those marks. For that I tried the following query (shown here with min(marks)):
select sub_name as subject, s_name as student, min(marks) as lowest from subjects as sub, marks as m, students as s where s.s_id = m.s_id and sub.sub_id = m.sub_id group by sub_name;
and it gave me the following result.
+-----------+-------------+--------+
| subject | student | lowest |
+-----------+-------------+--------+
| Chemistry | Mohd Shibli | 86 |
| Maths | Mohd Shibli | 96 |
| Physics | Mohd Shibli | 79 |
+-----------+-------------+--------+
This result is not correct: the 'student' column shows the name of the first student only, not the student who actually obtained the marks.
Can anyone help me write a query that returns the maximum marks obtained in each subject together with the student who obtained them?

You can try the query below, which uses a correlated subquery:
select sub_name as subject, s_name as student, m.marks
from marks m
inner join subjects as sub on m.sub_id = sub.sub_id
inner join students as s on m.s_id = s.s_id
where m.marks =
(select max(mm.marks) from marks mm
where mm.sub_id = m.sub_id)
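The correlated-subquery approach can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, the student names and marks are made-up sample rows (only the per-subject maxima 100, 98 and 90 come from the question), and the subquery filters with a plain WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (s_id INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE subjects (sub_id INTEGER PRIMARY KEY, sub_name TEXT);
CREATE TABLE marks (m_id INTEGER PRIMARY KEY, s_id INT, sub_id INT, marks INT);
-- Hypothetical sample rows, not taken from the question.
INSERT INTO students VALUES (1, 'Student A'), (2, 'Student B'), (3, 'Student C');
INSERT INTO subjects VALUES (1, 'Chemistry'), (2, 'Maths'), (3, 'Physics');
INSERT INTO marks (s_id, sub_id, marks) VALUES
  (1, 1, 86), (2, 1, 100), (3, 1, 91),
  (1, 2, 96), (2, 2, 97), (3, 2, 98),
  (1, 3, 90), (2, 3, 79), (3, 3, 85);
""")

# Keep only the marks rows that equal the maximum for their own subject,
# then join to pick up the subject and student names.
top_scorers = conn.execute("""
SELECT sub.sub_name, s.s_name, m.marks
FROM marks AS m
JOIN subjects AS sub ON sub.sub_id = m.sub_id
JOIN students AS s ON s.s_id = m.s_id
WHERE m.marks = (SELECT MAX(mm.marks) FROM marks AS mm
                 WHERE mm.sub_id = m.sub_id)
ORDER BY sub.sub_name
""").fetchall()

for row in top_scorers:
    print(row)
```

If two students tie for the top mark in a subject, this form returns one row per tied student.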

Related

Join two tables matching multiple ID's to names

Fiddle here: http://sqlfiddle.com/#!9/53d3c/2/0
I have two tables, one containing Member Names and their ID Number. Let's call that table Names:
CREATE TABLE Names (
ID int,
Title text
);
INSERT INTO Names
VALUES (11,'Chad'),
(10,'Deb'),
(34,'Steph'),
(13,'Chris'),
(98,'Peter'),
(33,'Daniel'),
(78,'Christine'),
(53,'Yolanda')
;
My second table contains meeting information, where someone is a Coach and someone is a Player. Each entry is a separate row (i.e. Meeting_ID 1 has two entries, one for the coach, one for the player). A field_id column identifies whether the row is for a player (1) or a coach (2).
CREATE TABLE Meeting_Data (
Meeting_ID int,
Player_ID int,
Coach_ID int,
field_id int
);
INSERT INTO Meeting_Data
VALUES (1,0,11,2),
(1,10,0,1),
(2,34,0,1),
(2,0,13,2),
(3,98,0,1),
(3,0,33,2),
(4,78,0,1),
(4,0,53,2)
;
What I'm trying to do is create a table that puts each Meeting on one row, and then puts the ID#s and Names of the people meeting. When I attempt this, I get one column to pull successfully and then one column of (null) values.
SELECT Meeting_ID,
Max(CASE
WHEN field_id = 1 THEN Player_ID
END) AS Player_ID,
Max(CASE
WHEN field_id = 2 THEN Coach_ID
END) AS Coach_ID,
Player_Names.Title as Player_Names,
Coach_Names.Title as Coach_Names
FROM Meeting_Data
LEFT JOIN Names Player_Names
ON Player_ID = Player_Names.ID
LEFT JOIN Names Coach_Names
ON Coach_ID = Coach_Names.ID
GROUP BY Meeting_ID
Which results in:
| Meeting_ID | Player_ID | Coach_ID | Player_Names | Coach_Names |
|------------|-----------|----------|--------------|-------------|
| 1 | 10 | 11 | Deb | (null) |
| 2 | 34 | 13 | Steph | (null) |
| 3 | 98 | 33 | Peter | (null) |
| 4 | 78 | 53 | Christine | (null) |
How about something like this (http://sqlfiddle.com/#!9/53d3c/52/0):
SELECT Meeting_ID, Player_ID, Coach_ID, Players.Title, Coaches.Title
FROM (
SELECT Meeting_ID,
MAX(Player_ID) as Player_ID,
MAX(Coach_ID) as Coach_ID
FROM Meeting_Data
GROUP BY Meeting_ID
) meeting
LEFT JOIN Names Players ON Players.ID = meeting.Player_ID
LEFT JOIN Names Coaches ON Coaches.ID = meeting.Coach_ID
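To see why aggregating first and joining afterwards fills both name columns, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, loaded with the rows from the question. The output column aliases Player_Name and Coach_Name are added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Names (ID int, Title text);
INSERT INTO Names VALUES (11,'Chad'),(10,'Deb'),(34,'Steph'),(13,'Chris'),
  (98,'Peter'),(33,'Daniel'),(78,'Christine'),(53,'Yolanda');
CREATE TABLE Meeting_Data (Meeting_ID int, Player_ID int, Coach_ID int, field_id int);
INSERT INTO Meeting_Data VALUES (1,0,11,2),(1,10,0,1),(2,34,0,1),(2,0,13,2),
  (3,98,0,1),(3,0,33,2),(4,78,0,1),(4,0,53,2);
""")

# Collapse each meeting to one row first; only then look up the two names.
meetings = conn.execute("""
SELECT meeting.Meeting_ID, meeting.Player_ID, meeting.Coach_ID,
       Players.Title AS Player_Name, Coaches.Title AS Coach_Name
FROM (SELECT Meeting_ID,
             MAX(Player_ID) AS Player_ID,
             MAX(Coach_ID) AS Coach_ID
      FROM Meeting_Data
      GROUP BY Meeting_ID) AS meeting
LEFT JOIN Names AS Players ON Players.ID = meeting.Player_ID
LEFT JOIN Names AS Coaches ON Coaches.ID = meeting.Coach_ID
ORDER BY meeting.Meeting_ID
""").fetchall()

for row in meetings:
    print(row)
```

In the original attempt the joins ran against the un-grouped rows, where each row carries either a player or a coach but never both, so one of the two lookups could not match on any given row.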

Removing duplicates based on one column, and keeping the row that has value in different column, and if there isn't any, keep lowest ID row

Using MySQL 5.7 on Google Cloud, I'm trying to deduplicate MySQL data based on an "EmailAddress" column, but some of the rows have a value in the "FullName" column and some of them don't. I want to keep the ones that have a value in the FullName column, but if none of the rows with that EmailAddress have a FullName value, then just keep the duplicate with the lowest ID number (first column - primary key).
I've finally broken it down into two separate queries, one to first remove the rows with no value in the FullName column IF there's another duplicate row that does have a value in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id!=c2.id
WHERE
(trim(c1.FullName)='' or c1.FullName is NULL)
and c2.FullName is not NULL
and length(trim(c2.FullName))!=0
) t
)
and another query to remove the rows with the bigger IDs where no value was found in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id>c2.id
) t
)
This "works", but not really. It worked one time when I left it running overnight for a smaller segment of the data, and when I woke up there was an error, but I looked at the data and it was complete.
Am I missing something in my query that's making it highly inefficient, or is it just par for the course for this type of query, and there's no optimization possible in my code that would make a tangible improvement? I've maxed out a Google Cloud SQL instance to their db-n1-highmem-32 size, with 32 GB of memory and 1000 GB of storage space, and it still chokes up and spits out a 2013 error after running for an hour. I need to do this for a total of a little over 3 million rows.
For example, this:
id | FullName    | EmailAddress             |
----------------------------------------------
1  | John Doe    | john.doe@email.com       |
2  | null        | janedoe@box.com          |
3  | null        | billybob@bobby.com       |
4  | null        | john.doe@email.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
7  | null        | billybob@bobby.com       |
8  | Jane Doe    | janedoe@box.com          |
would result in this:
id | FullName    | EmailAddress             |
----------------------------------------------
1  | John Doe    | john.doe@email.com       |
3  | null        | billybob@bobby.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
8  | Jane Doe    | janedoe@box.com          |
Using exists() might be simpler in this situation:
delete
from customer_info c
where (trim(c.FullName)='' or c.FullName is null)
and exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and trim(i.FullName)>''
);
delete
from customer_info c
where exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and i.id < c.id
);
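The two-step logic can be checked against the sample rows from the question. The sketch below runs on Python's built-in sqlite3 module, not MySQL, so it only demonstrates the row-selection logic and says nothing about performance on 3 million rows. It refers to the outer table by name instead of using an alias on the DELETE target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_info ("
    "id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT)"
)
conn.executemany("INSERT INTO customer_info VALUES (?, ?, ?)", [
    (1, "John Doe", "john.doe@email.com"),
    (2, None, "janedoe@box.com"),
    (3, None, "billybob@bobby.com"),
    (4, None, "john.doe@email.com"),
    (5, "John Lennon", "jlennon@yoohoo.com"),
    (6, None, "james.smith@coolmail.com"),
    (7, None, "billybob@bobby.com"),
    (8, "Jane Doe", "janedoe@box.com"),
])

# Step 1: drop nameless rows when a named row exists for the same address.
conn.execute("""
DELETE FROM customer_info
WHERE (FullName IS NULL OR trim(FullName) = '')
  AND EXISTS (SELECT 1 FROM customer_info AS i
              WHERE i.EmailAddress = customer_info.EmailAddress
                AND trim(i.FullName) > '')
""")

# Step 2: among the remaining duplicates, keep only the lowest id.
conn.execute("""
DELETE FROM customer_info
WHERE EXISTS (SELECT 1 FROM customer_info AS i
              WHERE i.EmailAddress = customer_info.EmailAddress
                AND i.id < customer_info.id)
""")

remaining_ids = [row[0] for row in
                 conn.execute("SELECT id FROM customer_info ORDER BY id")]
print(remaining_ids)
```

The MySQL manual documents a restriction on selecting from the delete target inside a subquery, so on MySQL these statements may need the derived-table wrapper the question already uses, or a multi-table DELETE with a self-join.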

How to do nested sum in group by in optimized/one query in MYSQL?

I've 3 tables, Manager, Employee & Salary. Following is the structure of all the tables.
Manager
id | Name
---------
111 | AAA
222 | BBB
Employee
id | Name | Manager_id | new_policy_deductions
----------------------------------
1 | A B | 111 | 100
2 | A C | 111 | 200
3 | C D | 222 | 200
Salary
id | employee_id | Month | Emp_Salary | Manager_id
---------------------------------------------------
1 | 1 | Jan | 500 | 111
2 | 1 | Feb | 500 | 111
3 | 1 | Mar | 600 | 111
4 | 2 | Apr | 500 | 111
5 | 1 | Apr | 700 | 111
6 | 3 | Mar | 300 | 222
7 | 3 | Apr | 500 | 222
employee_id is foreign key from Employee table to Salary table & manager_id is foreign key from Manager table to other tables.
Now, I need to construct a query such that I get following result.
Manager_id | Net_Salary
-----------------------
111 | 2500
222 | 600
How did I reach those numbers?
Take sum of salaries of all the employees under one manager (500 + 500 + 600 + 500 + 700 = 2800) & then subtract all new_policy_deductions in that manager (100 + 200 = 300). It implies 111 will have 2500 (2800 - 300).
Similarly, for 222 we have 600.
I was able to achieve this using 2 queries, which are as follows,
x = select manager_id, sum(Emp_Salary) from Salary group by manager_id
y = select manager_id, sum(new_policy_deductions) from Employee group by manager_id
result = x - y
Can this be achieved in single SQL query? If yes, how?
Note:
The actual table names are different than the ones I used here.
I can't modify the table structure. It was designed a long time ago.
Nested SQL query is not allowed, as that is equivalent to 2 queries, and it will be inefficient.
Edit:
Following are the queries, which will help in creating dummy data.
create table manager (id int, name text);
create table employee (id int, name text, manager_id int, new_policy_deductions int);
create table salary(id int, employee_id int, month text, emp_salary int, manager_id int);
select * from manager;
INSERT INTO manager
(`id`,
`name`)
VALUES
(111,'AAA'), (222,'BBB');
select * from employee;
INSERT INTO employee
(`id`,
`name`,
`manager_id`,
`new_policy_deductions`)
VALUES
(1,'A B',111,100), (2,'A C B',111,200), (3,'C A B',222,200);
select * from salary;
INSERT INTO salary
(`id`,
`employee_id`,
`month`,
`emp_salary`,
`manager_id`)
VALUES
(1,1,'Jan',500,111), (2,1,'Feb',500,111), (3,1,'Mar',600,111), (4,2,'Apr',500,111), (5,1,'Apr',700,111), (6,3,'Mar',300,222), (7,3,'Apr',500,222);
I've ignored foreign key constraints in query, as it is dummy data. Actual tables do have foreign key constraints.
Try this:
SELECT t1.Manager_id, sumOfSalaries - sumOfDeductions
FROM (
SELECT Manager_id, SUM(Emp_Salary) AS sumOfSalaries
FROM Salary
GROUP BY Manager_id) AS t1
INNER JOIN (
SELECT Manager_id, SUM(new_policy_deductions) AS sumOfDeductions
FROM Employee
GROUP BY Manager_id
) AS t2 ON t1.Manager_id = t2.Manager_id
Edit:
SELECT t1.Id, t1.Name,
COALESCE(sumOfSalaries, 0) - COALESCE(sumOfDeductions, 0) AS Net_Salary
FROM Manager AS t1
LEFT JOIN (
SELECT Manager_id, SUM(Emp_Salary) AS sumOfSalaries
FROM Salary
GROUP BY Manager_id
) AS t2 ON t1.Id = t2.Manager_id
LEFT JOIN (
SELECT Manager_id, SUM(new_policy_deductions) AS sumOfDeductions
FROM Employee
GROUP BY Manager_id
) AS t3 ON t1.Id = t3.Manager_id
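The aggregate-then-join pattern can be checked against the dummy data from the question. This is a sketch on Python's built-in sqlite3 module rather than MySQL; the aliases s, d, total_salary and total_deductions are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manager (id int, name text);
CREATE TABLE employee (id int, name text, manager_id int,
                       new_policy_deductions int);
CREATE TABLE salary (id int, employee_id int, month text,
                     emp_salary int, manager_id int);
INSERT INTO manager VALUES (111,'AAA'), (222,'BBB');
INSERT INTO employee VALUES (1,'A B',111,100), (2,'A C B',111,200),
                            (3,'C A B',222,200);
INSERT INTO salary VALUES (1,1,'Jan',500,111), (2,1,'Feb',500,111),
  (3,1,'Mar',600,111), (4,2,'Apr',500,111), (5,1,'Apr',700,111),
  (6,3,'Mar',300,222), (7,3,'Apr',500,222);
""")

# Each derived table is reduced to one row per manager before the join,
# so neither sum is multiplied by rows from the other table.
net_salary = conn.execute("""
SELECT m.id,
       COALESCE(s.total_salary, 0) - COALESCE(d.total_deductions, 0)
FROM manager AS m
LEFT JOIN (SELECT manager_id, SUM(emp_salary) AS total_salary
           FROM salary GROUP BY manager_id) AS s
       ON s.manager_id = m.id
LEFT JOIN (SELECT manager_id, SUM(new_policy_deductions) AS total_deductions
           FROM employee GROUP BY manager_id) AS d
       ON d.manager_id = m.id
ORDER BY m.id
""").fetchall()

print(net_salary)
```

With both joins hanging off the manager table, a manager with no salary rows or no employees still appears, with the missing side counted as 0.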
select e.manager_id, (sum(s.Emp_Salary)-sum(e.new_policy_deductions))
FROM Salary as s
LEFT JOIN Employee as e
ON s.manager_id=e.manager_id
group by e.manager_id
Would something like this be what you are looking for? It might need some editing (typos are possible; I don't have your database to check against).
In response to the comment, this might also be of interest:
select e.manager_id, (sum(s.Emp_Salary)-sum(e.new_policy_deductions))
FROM Salary as s
LEFT JOIN Employee as e
ON e.id=s.employee_id
group by e.manager_id
This is the best I can think of at present with the tables shown. Note that a plain join repeats each employee's deduction once per matching salary row, so the sums should be checked against the expected totals.
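It is worth running the single-join form against the dummy data before relying on it. The sketch below uses Python's built-in sqlite3 module instead of MySQL and takes Emp_Salary from Salary and new_policy_deductions from Employee, as in the question's table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id int, name text, manager_id int,
                       new_policy_deductions int);
CREATE TABLE salary (id int, employee_id int, month text,
                     emp_salary int, manager_id int);
INSERT INTO employee VALUES (1,'A B',111,100), (2,'A C B',111,200),
                            (3,'C A B',222,200);
INSERT INTO salary VALUES (1,1,'Jan',500,111), (2,1,'Feb',500,111),
  (3,1,'Mar',600,111), (4,2,'Apr',500,111), (5,1,'Apr',700,111),
  (6,3,'Mar',300,222), (7,3,'Apr',500,222);
""")

# Joining row-to-row repeats each employee's deduction once per salary row,
# so the deduction sum is inflated before the subtraction happens.
joined_net = conn.execute("""
SELECT e.manager_id,
       SUM(s.emp_salary) - SUM(e.new_policy_deductions)
FROM salary AS s
JOIN employee AS e ON e.id = s.employee_id
GROUP BY e.manager_id
ORDER BY e.manager_id
""").fetchall()

print(joined_net)
```

On this data the join yields 2200 and 400 rather than the expected 2500 and 600. Employee 1 has four salary rows, so that employee's deduction of 100 is counted four times.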
I have a query that can help you. Try this:
SELECT e.`Manager_id` as manager_id, (SELECT sum(s.`Emp_Salary`)
FROM salary as s
WHERE s.`Manager_id` = e.`Manager_id`) - sum(e.new_policy_deductions) as net_salary
FROM employee as e GROUP BY e.`Manager_id`
I tested this locally and the output matches what you want. If your table or column names differ, adjust them accordingly; in my case the table names are lowercase.

MySQL Select from column use ^ as delimiter

My question similar to MySQL Split String and Select with results. Currently I have 2 tables:
student
uid | subject_id | name
1 | 1^2^3^4 | a
2 | 2^3^ | b
3 | 1 | c
subject
uid | subject_name
1 | math
2 | science
3 | language
4 | sport
The result I expected is:
uid | name | subjects_passed
1 | a | math, science, language, sport
2 | b | science, language
3 | c | math
I have tried this query:
SELECT
student.uid,
student.name,
group_concat(subject.subject_name) as subjects_passed
from student
join subject on find_in_set(subject.uid,student.subject_id ) > 0
group by student.uid
Which returns the error:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use
near 'join subject on find_in_set(subject.uid,student.subject_id ) > 0
group' at line 7
I believe this is because of FIND_IN_SET. According to the documentation, this function expects a comma as the delimiter. Is there an alternative query I could use?
Why not REPLACE the separator:
SELECT
student.uid,
student.name,
GROUP_CONCAT(subject.subject_name) AS subjects_passed
FROM student
JOIN subject ON FIND_IN_SET(subject.uid, REPLACE(student.subject_id, '^', ',')) > 0
GROUP BY student.uid
If you decide to normalize your tables, it is fairly straightforward to create the junction table and generate the data:
-- Sample table structure
CREATE TABLE student_subject (
student_id int NOT NULL,
subject_id int NOT NULL,
PRIMARY KEY (student_id, subject_id)
);
-- Sample query to normalize the student <-> subject relationship
SELECT
student.uid AS student_id,
subject.uid AS subject_id
FROM student
JOIN subject ON FIND_IN_SET(subject.uid, REPLACE(student.subject_id, '^', ',')) > 0
+------------+------------+
| student_id | subject_id |
+------------+------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
+------------+------------+
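The junction rows can also be generated in application code. Here is a minimal sketch in Python with the built-in sqlite3 module (SQLite has no FIND_IN_SET, so the split happens in Python). The helper name split_ids is hypothetical, and the subject name is spelled "language" here.

```python
import sqlite3

# Rows as shown in the question: (uid, delimited subject ids, name).
students = [(1, "1^2^3^4", "a"), (2, "2^3^", "b"), (3, "1", "c")]
subjects = [(1, "math"), (2, "science"), (3, "language"), (4, "sport")]


def split_ids(delimited, sep="^"):
    """Return the integer ids in a delimited string, skipping empty parts."""
    return [int(part) for part in delimited.split(sep) if part.strip()]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject (uid INTEGER PRIMARY KEY, subject_name TEXT);
CREATE TABLE student_subject (
  student_id INT NOT NULL,
  subject_id INT NOT NULL,
  PRIMARY KEY (student_id, subject_id)
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(uid, name) for uid, _, name in students])
conn.executemany("INSERT INTO subject VALUES (?, ?)", subjects)

# One junction row per (student, subject) pair found in the delimited column.
pairs = [(uid, sid) for uid, ids, _ in students for sid in split_ids(ids)]
conn.executemany("INSERT INTO student_subject VALUES (?, ?)", pairs)

# With the junction table in place, plain equality joins replace FIND_IN_SET.
passed = {}
for name, subject_name in conn.execute("""
    SELECT st.name, sb.subject_name
    FROM student AS st
    JOIN student_subject AS ss ON ss.student_id = st.uid
    JOIN subject AS sb ON sb.uid = ss.subject_id
    ORDER BY st.uid, sb.uid
"""):
    passed.setdefault(name, []).append(subject_name)

print(passed)
```

The trailing delimiter in "2^3^" produces an empty part, which split_ids drops instead of passing to int().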
You should avoid storing delimiter-separated values in a single column; instead, normalize the schema and create a third table to store the student-to-subject relation.
However, with the current structure you can do it as follows:
select
st.uid,
st.name,
group_concat(sb.subject_name) as subject_name
from student st
left join subject sb on find_in_set(sb.uid,replace(st.subject_id,'^',',')) > 0
group by st.uid
Here is the option to create third table and store the relation
create table student_to_subject (id int primary key auto_increment, stid int, subid int);
insert into student_to_subject(stid,subid) values
(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(3,1);
Now you can remove the column subject_id from the student table
So the query becomes
select
st.uid,
st.name,
group_concat(sb.subject_name) as passed_subject
from student st
join student_to_subject sts on sts.stid = st.uid
join subject sb on sb.uid = sts.subid
group by st.uid;
http://www.sqlfiddle.com/#!9/f02df
I think you can replace ^ with , before calling find_in_set:
SELECT
student.uid,
student.name,
group_concat(subject.subject_name) as subjects_passed
from student
join subject on find_in_set(subject.uid, replace(student.subject_id,'^',',') ) > 0
group by student.uid
But of course, storing values in such a format is very bad database design.

What's the most efficient way to structure a 2-dimensional MySQL query?

I have a MySQL database with the following tables and fields:
Student (id)
Class (id)
Grade (id, student_id, class_id, grade)
The student and class tables are indexed on id (primary keys). The grade table is indexed on id (primary key) and student_id, class_id and grade.
I need to construct a query which, given a class ID, gives a list of all other classes and the number of students who scored more in that other class.
Essentially, given the following data in the grades table:
student_id | class_id | grade
--------------------------------------
1 | 1 | 87
1 | 2 | 91
1 | 3 | 75
2 | 1 | 68
2 | 2 | 95
2 | 3 | 84
3 | 1 | 76
3 | 2 | 88
3 | 3 | 71
Querying with class ID 1 should yield:
class_id | total
-------------------
2 | 3
3 | 1
Ideally I'd like this to execute in a few seconds, as I'd like it to be part of a web interface.
The issue I have is that in my database, I have over 1300 classes and 160,000 students. My grade table has almost 15 million rows and as such, the query takes a long time to execute.
Here's what I've tried so far along with the times each query took:
-- I manually stopped execution after 2 hours
SELECT c.id, COUNT(*) AS total
FROM classes c
INNER JOIN grades a ON a.class_id = c.id
INNER JOIN grades b ON b.grade < a.grade AND
a.student_id = b.student_id AND
b.class_id = 1
WHERE c.id != 1
GROUP BY c.id
-- I manually stopped execution after 20 minutes
SELECT c.id,
(
SELECT COUNT(*)
FROM grades g
WHERE g.class_id = c.id AND g.grade > (
SELECT grade
FROM grades
WHERE student_id = g.student_id AND
class_id = 1
)
) AS total
FROM classes c
WHERE c.id != 1;
-- 1 min 12 sec
CREATE TEMPORARY TABLE temp_blah (student_id INT(11) PRIMARY KEY, grade INT);
INSERT INTO temp_blah SELECT student_id, grade FROM grades WHERE class_id = 1;
SELECT c.id,
(
SELECT COUNT(*)
FROM grades g
INNER JOIN temp_blah t ON g.student_id = t.student_id
WHERE g.class_id = c.id AND t.grade < g.grade
) AS total
FROM classes c
WHERE c.id != 1;
-- Same thing but with joins instead of a subquery - 1 min 54 sec
SELECT c.id,
COUNT(*) AS total
FROM classes c
INNER JOIN grades g ON c.id = g.class_id
INNER JOIN temp_blah t ON g.student_id = t.student_id
WHERE c.id != 1 AND t.grade < g.grade
GROUP BY c.id;
I also considered creating a 2D table, with students as rows and classes as columns, however I can see two issues with this:
MySQL implements a maximum column count (4096) and maximum row size (in bytes) which may be exceeded by this approach
I can't think of a good way to query that structure to get the results I need
I also considered performing these calculations as background jobs and storing the results somewhere, but for the information to remain current (it must), they would need to be recalculated every time a student, class or grade record was created or updated.
Does anyone know a more efficient way to construct this query?
EDIT: Create table statements:
CREATE TABLE `classes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1331 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
CREATE TABLE `students` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=160803 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
CREATE TABLE `grades` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`student_id` int(11) DEFAULT NULL,
`class_id` int(11) DEFAULT NULL,
`grade` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_grades_on_student_id` (`student_id`),
KEY `index_grades_on_class_id` (`class_id`),
KEY `index_grades_on_grade` (`grade`)
) ENGINE=InnoDB AUTO_INCREMENT=15507698 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
Output of explain on the most efficient query (the 1 min 12 sec one):
id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | PRIMARY | c | range | PRIMARY | PRIMARY | 4 | | 683 | Using where; Using index
2 | DEPENDENT SUBQUERY | g | ref | index_grades_on_student_id,index_grades_on_class_id,index_grades_on_grade | index_grades_on_class_id | 5 | mydb.c.id | 830393 | Using where
2 | DEPENDENT SUBQUERY | t | eq_ref | PRIMARY | PRIMARY | 4 | mydb.g.student_id | 1 | Using where
Another edit: EXPLAIN output for sgeddes' suggestion:
+----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 14953992 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | system | NULL | NULL | NULL | NULL | 1 | Using filesort |
| 2 | DERIVED | G | ALL | NULL | NULL | NULL | NULL | 15115388 | |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
I think this should work for you using SUM and CASE:
SELECT C.Id,
SUM(
CASE
WHEN G.Grade > C2.Grade THEN 1 ELSE 0
END
)
FROM Class C
INNER JOIN Grade G ON C.Id = G.Class_Id
LEFT JOIN (
SELECT Grade, Student_Id, Class_Id
FROM Class
JOIN Grade ON Class.Id = Grade.Class_Id
WHERE Class.Id = 1
) C2 ON G.Student_Id = C2.Student_Id
WHERE C.Id <> 1
GROUP BY C.Id
--EDIT--
In response to your comment, here is another attempt that should be much faster:
SELECT
Class_Id,
SUM(CASE WHEN Grade > minGrade THEN 1 ELSE 0 END)
FROM
(
SELECT
Student_Id,
@classToCheck:=
IF(G.Class_Id = 1, Grade, @classToCheck) minGrade ,
Class_Id,
Grade
FROM Grade G
JOIN (SELECT @classToCheck:= 0) t
ORDER BY Student_Id, IF(Class_Id = 1, 0, 1)
) t
WHERE Class_Id <> 1
GROUP BY Class_ID
Can you give this a try on the original data as well? It uses only one join :)
select
final.class_id, count(*) as total
from
(
select * from
(select student_id as p_student_id, grade as p_grade from table1 where class_id = 1) as partial
inner join table1 on table1.student_id = partial.p_student_id
where table1.class_id <> 1 and table1.grade > partial.p_grade
) as final
group by
final.class_id;
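The single self-join can be checked against the nine sample rows from the question. This is a sketch on Python's built-in sqlite3 module rather than MySQL, so it confirms the counts only, not the timing on 15 million rows. The function name classes_beating and the aliases base and other are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student_id INT, class_id INT, grade INT)")
conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
])


def classes_beating(connection, class_id):
    """Count, per other class, the students who scored higher there."""
    return connection.execute("""
        SELECT other.class_id, COUNT(*) AS total
        FROM grades AS base
        JOIN grades AS other ON other.student_id = base.student_id
        WHERE base.class_id = ?
          AND other.class_id <> ?
          AND other.grade > base.grade
        GROUP BY other.class_id
        ORDER BY other.class_id
    """, (class_id, class_id)).fetchall()


result = classes_beating(conn, 1)
print(result)
```

Because the comparison happens before grouping, a class in which no student scored higher does not appear in the output at all, whereas the SUM and CASE form would report it with a count of 0.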