My question is similar to MySQL Split String and Select with results. Currently I have 2 tables:
student
uid | subject_id | name
1 | 1^2^3^4 | a
2 | 2^3^ | b
3 | 1 | c
subject
uid | subject_name
1 | math
2 | science
3 | languange
4 | sport
The result I expected is:
uid | name | subject_passed
1 | a | math, science, languange, sport
2 | b | science, languange
3 | c | math
I have tried this query:
SELECT
student.uid,
student.name,
group_concat(subject.subject_name) as subjects_passed
from student
join subject on find_in_set(subject.uid,student.subject_id ) > 0
group by student.uid
Which returns the error:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use
near 'join subject on find_in_set(subject.uid,student.subject_id ) > 0
group' at line 7
I believe this is because of FIND_IN_SET. According to the documentation, this function expects a comma (,) as the delimiter. Is there an alternative query I could use?
Why not REPLACE the separator:
SELECT
student.uid,
student.name,
GROUP_CONCAT(subject.subject_name) AS subjects_passed
FROM student
JOIN subject ON FIND_IN_SET(subject.uid, REPLACE(student.subject_id, '^', ',')) > 0
GROUP BY student.uid
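If you don't have a MySQL server at hand, here is a minimal sketch that runs the query above through Python's built-in sqlite3 module. SQLite has no FIND_IN_SET, so the sketch registers a small Python function under that name. It only approximates MySQL's version: it returns the 1-based position, 0 when the value is absent, and NULL for NULL input. That helper and the in-memory tables are assumptions for illustration and are not part of the answer.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Approximation of MySQL FIND_IN_SET (assumption, not MySQL itself):
    1-based position of needle in a comma-separated list, 0 if absent,
    NULL (None) if either argument is NULL."""
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",")
    return items.index(str(needle)) + 1 if str(needle) in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
CREATE TABLE student (uid INTEGER, subject_id TEXT, name TEXT);
INSERT INTO student VALUES (1, '1^2^3^4', 'a'), (2, '2^3^', 'b'), (3, '1', 'c');
CREATE TABLE subject (uid INTEGER, subject_name TEXT);
INSERT INTO subject VALUES (1, 'math'), (2, 'science'), (3, 'languange'), (4, 'sport');
""")

# Same SQL as the answer: turn '^' into ',' so FIND_IN_SET can split the list.
rows = conn.execute("""
SELECT student.uid, student.name,
       GROUP_CONCAT(subject.subject_name) AS subjects_passed
FROM student
JOIN subject ON FIND_IN_SET(subject.uid, REPLACE(student.subject_id, '^', ',')) > 0
GROUP BY student.uid
ORDER BY student.uid
""").fetchall()

for row in rows:
    print(row)
```

GROUP_CONCAT does not guarantee an order in SQLite, so the sketch prints each list as it comes back. In MySQL you can write GROUP_CONCAT(... ORDER BY ...) if you need a fixed order.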
If you decide to normalize your tables, it is fairly straightforward to create the junction table and generate the data:
-- Sample table structure
CREATE TABLE student_subject (
student_id int NOT NULL,
subject_id int NOT NULL,
PRIMARY KEY (student_id, subject_id)
);
-- Sample query to normalize the student <-> subject relationship
SELECT
student.uid AS student_id,
subject.uid AS subject_id
FROM student
JOIN subject ON FIND_IN_SET(subject.uid, REPLACE(student.subject_id, '^', ',')) > 0
+------------+------------+
| student_id | subject_id |
+------------+------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
+------------+------------+
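The answer stops at the SELECT. One way to actually fill the junction table is to feed that SELECT into an INSERT ... SELECT, which MySQL also accepts. The sketch below runs this on SQLite through Python's sqlite3 module. It uses a hypothetical Python stand-in for FIND_IN_SET, because SQLite has none built in.

```python
import sqlite3

def find_in_set(needle, haystack):
    # Hypothetical stand-in for MySQL's FIND_IN_SET (SQLite lacks it).
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",")
    return items.index(str(needle)) + 1 if str(needle) in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
CREATE TABLE student (uid INTEGER, subject_id TEXT, name TEXT);
INSERT INTO student VALUES (1, '1^2^3^4', 'a'), (2, '2^3^', 'b'), (3, '1', 'c');
CREATE TABLE subject (uid INTEGER, subject_name TEXT);
INSERT INTO subject VALUES (1, 'math'), (2, 'science'), (3, 'languange'), (4, 'sport');
CREATE TABLE student_subject (
    student_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    PRIMARY KEY (student_id, subject_id)
);
-- Populate the junction table directly from the normalizing SELECT.
INSERT INTO student_subject (student_id, subject_id)
SELECT student.uid, subject.uid
FROM student
JOIN subject ON FIND_IN_SET(subject.uid, REPLACE(student.subject_id, '^', ',')) > 0;
""")

pairs = conn.execute(
    "SELECT student_id, subject_id FROM student_subject ORDER BY student_id, subject_id"
).fetchall()
print(pairs)  # [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (3, 1)]
```

After this one-off migration, the old subject_id column can be dropped and every later query can use plain equality joins.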
You should never store delimiter-separated values in a column; instead, normalize the table and create a third table to store the student-to-subject relation.
However, in the current case you can do it like this:
select
st.uid,
st.name,
group_concat(sb.subject_name) as subject_name
from student st
left join subject sb on find_in_set(sb.uid,replace(st.subject_id,'^',',')) > 0
group by st.uid
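This answer uses LEFT JOIN where the others use JOIN. The difference only shows up for a student whose list matches no subject. The sketch below adds a hypothetical student 4 ('d') with an empty list and runs both variants on SQLite via Python's sqlite3. It again relies on an assumed Python stand-in for FIND_IN_SET.

```python
import sqlite3

def find_in_set(needle, haystack):
    # Hypothetical stand-in for MySQL's FIND_IN_SET (SQLite lacks it).
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",")
    return items.index(str(needle)) + 1 if str(needle) in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
CREATE TABLE student (uid INTEGER, subject_id TEXT, name TEXT);
-- Student 4 ('d') is hypothetical: an empty list, i.e. nothing passed yet.
INSERT INTO student VALUES (1, '1^2^3^4', 'a'), (2, '2^3^', 'b'), (3, '1', 'c'), (4, '', 'd');
CREATE TABLE subject (uid INTEGER, subject_name TEXT);
INSERT INTO subject VALUES (1, 'math'), (2, 'science'), (3, 'languange'), (4, 'sport');
""")

def run(join_kind):
    # join_kind is either "LEFT JOIN" or "JOIN"; the rest of the query is shared.
    sql = f"""
    SELECT st.uid, st.name, GROUP_CONCAT(sb.subject_name) AS subject_name
    FROM student st
    {join_kind} subject sb ON FIND_IN_SET(sb.uid, REPLACE(st.subject_id, '^', ',')) > 0
    GROUP BY st.uid
    ORDER BY st.uid
    """
    return conn.execute(sql).fetchall()

left_rows = run("LEFT JOIN")
inner_rows = run("JOIN")
print(left_rows[-1])                      # (4, 'd', None)
print([uid for uid, _, _ in inner_rows])  # [1, 2, 3]
```

With LEFT JOIN, student 'd' is kept with a NULL subject list. With an inner JOIN, the student disappears from the result.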
Here is how you could create the third table and store the relation:
create table student_to_subject (id int primary key auto_increment, stid int, subid int);
insert into student_to_subject(stid,subid) values
(1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(3,1);
Now you can remove the subject_id column from the student table,
so the query becomes:
select
st.uid,
st.name,
group_concat(sb.subject_name) as passed_subject
from student st
join student_to_subject sts on sts.stid = st.uid
join subject sb on sb.uid = sts.subid
group by st.uid;
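To show that the normalized design gives the same answer without any string parsing, here is a minimal sketch on SQLite via Python's sqlite3. The student table is created without subject_id, as the answer suggests. SQLite writes MySQL's auto_increment as AUTOINCREMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (uid INTEGER PRIMARY KEY, name TEXT);
INSERT INTO student VALUES (1, 'a'), (2, 'b'), (3, 'c');
CREATE TABLE subject (uid INTEGER PRIMARY KEY, subject_name TEXT);
INSERT INTO subject VALUES (1, 'math'), (2, 'science'), (3, 'languange'), (4, 'sport');
-- One row per (student, subject) pair replaces the '^'-separated column.
CREATE TABLE student_to_subject (id INTEGER PRIMARY KEY AUTOINCREMENT, stid INTEGER, subid INTEGER);
INSERT INTO student_to_subject (stid, subid) VALUES
    (1,1),(1,2),(1,3),(1,4),(2,2),(2,3),(3,1);
""")

# Plain equality joins: no REPLACE/FIND_IN_SET, and indexes on stid/subid can be used.
rows = conn.execute("""
SELECT st.uid, st.name, GROUP_CONCAT(sb.subject_name) AS passed_subject
FROM student st
JOIN student_to_subject sts ON sts.stid = st.uid
JOIN subject sb ON sb.uid = sts.subid
GROUP BY st.uid
ORDER BY st.uid
""").fetchall()

for row in rows:
    print(row)
```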
http://www.sqlfiddle.com/#!9/f02df
I think you can replace ^ with , before calling find_in_set:
SELECT
student.uid,
student.name,
group_concat(subject.subject_name) as subjects_passed
from student
join subject on find_in_set(subject.uid, replace(student.subject_id,'^',',') ) > 0
group by student.uid
But of course, storing values in such a format is very bad DB design.
This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 4 years ago.
I have the following SQL relational schema:
create table students(
s_id int primary key auto_increment,
s_name char(30),
s_city char(30),
s_number char(10)
);
create table subjects(
sub_id int primary key auto_increment,
sub_name char(30)
);
create table marks(
m_id int primary key auto_increment,
s_id int,
sub_id int,
marks int CHECK (marks <= 100 AND marks >= 0)
);
The database stores information about students along with their marks in each subject.
I am able to fetch the maximum marks obtained in each subject using the following SQL query.
select sub_name, max(marks) as highest from subjects as sub, marks as m where sub.sub_id = m.sub_id group by sub_name;
and it gave me the following result:
+-----------+---------+
| sub_name | highest |
+-----------+---------+
| Chemistry | 100 |
| Maths | 98 |
| Physics | 90 |
+-----------+---------+
Now I want to add an extra attribute to the above result: s_name, the name of the student who obtained those marks. For that I used the following query:
select sub_name as subject, s_name as student, min(marks) as lowest from subjects as sub, marks as m, students as s where s.s_id = m.s_id and sub.sub_id = m.sub_id group by sub_name;
and it gave me the following result.
+-----------+-------------+--------+
| subject | student | lowest |
+-----------+-------------+--------+
| Chemistry | Mohd Shibli | 86 |
| Maths | Mohd Shibli | 96 |
| Physics | Mohd Shibli | 79 |
+-----------+-------------+--------+
The result is not correct: the 'student' column shows the name of the first student only, instead of the student who actually obtained those marks.
Can anyone help me write a query that returns the maximum marks obtained in each subject and which student obtained them?
You can try the query below, which uses a correlated subquery:
select sub_name as subject, s_name as student, m.marks
from marks m
inner join subjects as sub on m.sub_id = sub.sub_id
inner join students as s on m.s_id = s.s_id
-- keep a row only if its marks equal the maximum for that row's subject
WHERE m.marks =
(SELECT MAX(mm.marks) FROM marks mm
WHERE mm.sub_id = m.sub_id)
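The question gives no mark data, so this sketch uses hypothetical students and marks. It runs the correlated-subquery approach on SQLite via Python's sqlite3. The highest marks are chosen to match the question's output (100, 98, 90). Chemistry deliberately has a tie, to show that every student who reached the maximum is returned rather than an arbitrary one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (s_id INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE subjects (sub_id INTEGER PRIMARY KEY, sub_name TEXT);
CREATE TABLE marks (
    m_id INTEGER PRIMARY KEY,
    s_id INTEGER,
    sub_id INTEGER,
    marks INTEGER CHECK (marks <= 100 AND marks >= 0)
);
-- Hypothetical data; Chemistry has a tie at 100.
INSERT INTO students VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO subjects VALUES (1, 'Chemistry'), (2, 'Maths'), (3, 'Physics');
INSERT INTO marks (s_id, sub_id, marks) VALUES
    (1, 1, 100), (2, 1, 86), (3, 1, 100),
    (1, 2, 96),  (2, 2, 98), (3, 2, 90),
    (1, 3, 79),  (2, 3, 85), (3, 3, 90);
""")

top = conn.execute("""
SELECT sub.sub_name AS subject, s.s_name AS student, m.marks
FROM marks m
INNER JOIN subjects AS sub ON m.sub_id = sub.sub_id
INNER JOIN students AS s ON m.s_id = s.s_id
-- Correlated subquery: evaluated against each outer row's sub_id.
WHERE m.marks = (SELECT MAX(mm.marks) FROM marks mm WHERE mm.sub_id = m.sub_id)
ORDER BY subject, student
""").fetchall()

for row in top:
    print(row)
# ('Chemistry', 'Asha', 100)
# ('Chemistry', 'Chen', 100)
# ('Maths', 'Ben', 98)
# ('Physics', 'Chen', 90)
```

The s_name in this result comes from the same row as the mark, which is what the asker's GROUP BY query could not guarantee.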
class_table
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1,2,3,4 |
+----+-------+--------------+
student_mark
+----+----------+--------+
| id |student_id| marks |
+----+----------+--------+
| 1 | 1 | 12 |
+----+----------+--------+
| 2 | 2 | 80 |
+----+----------+--------+
| 3 | 3 | 20 |
+----+----------+--------+
I have these two tables and I want to calculate the total marks of the students. My SQL is:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN
(SELECT `student_id` FROM `class_table` WHERE `teac_id` = '1')
But this returns NULL. Please help!
Firstly, you should never store comma separated data in your column. You should really normalize your data. So basically, you could have a many-to-many table mapping teacher_to_student, which will have teac_id and student_id columns.
In this particular case, you can use the FIND_IN_SET() function.
From your current query, it seems that you are trying to get the total marks for a teacher (summing up the marks of all his/her students).
Try:
SELECT SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
In case you want to get the total marks per student, you would need to add a GROUP BY. The query would look like:
SELECT sm.`student_id`,
SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
GROUP BY sm.`student_id`
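Here is a quick check of both queries against the question's sample rows, as a minimal sketch on SQLite via Python's sqlite3. SQLite has no FIND_IN_SET, so a hypothetical Python stand-in is registered under that name. The backtick quoting is dropped because these identifiers do not need it.

```python
import sqlite3

def find_in_set(needle, haystack):
    # Hypothetical stand-in for MySQL's FIND_IN_SET (SQLite lacks it).
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",")
    return items.index(str(needle)) + 1 if str(needle) in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
CREATE TABLE class_table (id INTEGER, teac_id INTEGER, student_id TEXT);
INSERT INTO class_table VALUES (1, 1, '1,2,3,4');
CREATE TABLE student_mark (id INTEGER, student_id INTEGER, marks INTEGER);
INSERT INTO student_mark VALUES (1, 1, 12), (2, 2, 80), (3, 3, 20);
""")

# Total for teacher 1: every mark whose student_id appears in the teacher's list.
total = conn.execute("""
SELECT SUM(sm.marks)
FROM student_mark AS sm
JOIN class_table AS ct ON FIND_IN_SET(sm.student_id, ct.student_id) > 0
WHERE ct.teac_id = '1'
""").fetchone()[0]

# Same join, broken down per student.
per_student = conn.execute("""
SELECT sm.student_id, SUM(sm.marks)
FROM student_mark AS sm
JOIN class_table AS ct ON FIND_IN_SET(sm.student_id, ct.student_id) > 0
WHERE ct.teac_id = '1'
GROUP BY sm.student_id
ORDER BY sm.student_id
""").fetchall()

print(total)        # 112
print(per_student)  # [(1, 12), (2, 80), (3, 20)]
```

Student 4 is in the list but has no row in student_mark, so it adds nothing to the total.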
In case you want to know why: it returned NULL because the subquery returned '1,2,3,4' as a single value, whereas you need it to return 1, 2, 3 and 4 as separate values.
What your query effectively ran:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN ('1,2,3,4')
What you expect is
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN (1,2,3,4)
The best way is to normalize, as @madhur said. In your case you need to link the teacher and the students as a one-to-many relation:
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1 |
+----+-------+--------------+
| 2 | 1 | 2 |
+----+-------+--------------+
| 3 | 1 | 3 |
+----+-------+--------------+
| 4 | 1 | 4 |
+----+-------+--------------+
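With one student ID per row, the asker's original IN-subquery works unchanged. Here is a minimal sketch of that on SQLite via Python's sqlite3, using the normalized rows above and the question's marks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per teacher/student pair instead of a comma-separated list.
CREATE TABLE class_table (id INTEGER PRIMARY KEY, teac_id INTEGER, student_id INTEGER);
INSERT INTO class_table VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
CREATE TABLE student_mark (id INTEGER PRIMARY KEY, student_id INTEGER, marks INTEGER);
INSERT INTO student_mark VALUES (1, 1, 12), (2, 2, 80), (3, 3, 20);
""")

# The question's query, verbatim apart from quoting: the subquery now yields 1, 2, 3, 4.
total = conn.execute("""
SELECT SUM(marks)
FROM student_mark
WHERE student_id IN (SELECT student_id FROM class_table WHERE teac_id = '1')
""").fetchone()[0]
print(total)  # 112
```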
If you want to filter your table based on a comma-separated list of IDs, my approach is to append extra commas at the beginning and at the end of the list, as well as at the beginning and at the end of the ID. For example, 1 becomes ,1, and the list becomes ,1,2,3,4,. This avoids ambiguous matches, such as 1 matching 21 or 12 in the list.
Also, EXISTS is well-suited to this situation, and together with the INSTR function it should work:
SELECT SUM(`marks`)
FROM `student_mark` sm
WHERE EXISTS(SELECT 1 FROM `class_table`
WHERE `teac_id` = '1' AND
INSTR(CONCAT(',', student_id, ','), CONCAT(',', sm.student_id, ',')) > 0)
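To see why the extra commas matter, this sketch adds a hypothetical teacher 2 whose list is '12,21'. It then compares a naive INSTR with the comma-wrapped version, on SQLite via Python's sqlite3. SQLite concatenates with || where MySQL uses CONCAT(), which is the only change to the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class_table (id INTEGER, teac_id INTEGER, student_id TEXT);
-- Teacher 2 and the list '12,21' are hypothetical, added to show the ambiguity.
INSERT INTO class_table VALUES (1, 1, '1,2,3,4'), (2, 2, '12,21');
CREATE TABLE student_mark (id INTEGER, student_id INTEGER, marks INTEGER);
INSERT INTO student_mark VALUES (1, 1, 12), (2, 2, 80), (3, 3, 20);
""")

# Comma-wrapped: ',1,' can only match a whole list element.
WRAPPED = """
SELECT SUM(marks) FROM student_mark sm
WHERE EXISTS (SELECT 1 FROM class_table
              WHERE teac_id = ? AND
              INSTR(',' || student_id || ',', ',' || sm.student_id || ',') > 0)
"""
# Naive: '1' is found anywhere, including inside '12'.
NAIVE = """
SELECT SUM(marks) FROM student_mark sm
WHERE EXISTS (SELECT 1 FROM class_table
              WHERE teac_id = ? AND INSTR(student_id, sm.student_id) > 0)
"""

def total(sql, teacher):
    return conn.execute(sql, (teacher,)).fetchone()[0]

print(total(WRAPPED, 1))  # 112
print(total(NAIVE, 2))    # 92: students 1 and 2 wrongly match inside '12,21'
print(total(WRAPPED, 2))  # None: no exact match, and SUM over no rows is NULL
```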
BUT you shouldn't store related IDs in one cell as a comma-separated list; each ID should go in its own row, in a foreign key column, to form a proper relation. Joins would then become trivial.
Fiddle here: http://sqlfiddle.com/#!9/53d3c/2/0
I have two tables, one containing Member Names and their ID Number. Let's call that table Names:
CREATE TABLE Names (
ID int,
Title text
);
INSERT INTO Names
VALUES (11,'Chad'),
(10,'Deb'),
(34,'Steph'),
(13,'Chris'),
(98,'Peter'),
(33,'Daniel'),
(78,'Christine'),
(53,'Yolanda')
;
My second table contains meeting information, where someone is a Coach and someone is a Player. Each entry is a separate row (i.e. Meeting_ID 1 has two entries, one for the coach and one for the player). Further, the field_id column identifies whether a row is for a player (1) or a coach (2).
CREATE TABLE Meeting_Data (
Meeting_ID int,
Player_ID int,
Coach_ID int,
field_id int
);
INSERT INTO Meeting_Data
VALUES (1,0,11,2),
(1,10,0,1),
(2,34,0,1),
(2,0,13,2),
(3,98,0,1),
(3,0,33,2),
(4,78,0,1),
(4,0,53,2)
;
What I'm trying to do is create a table that puts each Meeting on one row, and then puts the ID#s and Names of the people meeting. When I attempt this, I get one column to pull successfully and then one column of (null) values.
SELECT Meeting_ID,
Max(CASE
WHEN field_id = 1 THEN Player_ID
END) AS Player_ID,
Max(CASE
WHEN field_id = 2 THEN Coach_ID
END) AS Coach_ID,
Player_Names.Title as Player_Names,
Coach_Names.Title as Coach_Names
FROM Meeting_Data
LEFT JOIN Names Player_Names
ON Player_ID = Player_Names.ID
LEFT JOIN Names Coach_Names
ON Coach_ID = Coach_Names.ID
GROUP BY Meeting_ID
Which results in:
| Meeting_ID | Player_ID | Coach_ID | Player_Names | Coach_Names |
|------------|-----------|----------|--------------|-------------|
| 1 | 10 | 11 | Deb | (null) |
| 2 | 34 | 13 | Steph | (null) |
| 3 | 98 | 33 | Peter | (null) |
| 4 | 78 | 53 | Christine | (null) |
How about something like this (http://sqlfiddle.com/#!9/53d3c/52/0):
SELECT Meeting_ID, Player_ID, Coach_ID, Players.Title, Coaches.Title
FROM (
SELECT Meeting_ID,
MAX(Player_ID) as Player_ID,
MAX(Coach_ID) as Coach_ID
FROM Meeting_Data
GROUP BY Meeting_ID
) meeting
LEFT JOIN Names Players ON Players.ID = meeting.Player_ID
LEFT JOIN Names Coaches ON Coaches.ID = meeting.Coach_ID
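Why this works: in the question's query, the Names joins run against the raw rows before grouping. Each raw row carries only one real ID, with the other set to 0, so on any single row one of the two name lookups is NULL. The non-aggregated Title columns are then taken from some row in each group (MySQL with ONLY_FULL_GROUP_BY disabled may pick any row), which in the question's output is the player row. Aggregating first and joining afterwards avoids this. Here is a minimal sketch on SQLite via Python's sqlite3, using the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Names (ID INTEGER, Title TEXT);
INSERT INTO Names VALUES (11,'Chad'), (10,'Deb'), (34,'Steph'), (13,'Chris'),
    (98,'Peter'), (33,'Daniel'), (78,'Christine'), (53,'Yolanda');
CREATE TABLE Meeting_Data (Meeting_ID INTEGER, Player_ID INTEGER, Coach_ID INTEGER, field_id INTEGER);
INSERT INTO Meeting_Data VALUES (1,0,11,2), (1,10,0,1), (2,34,0,1), (2,0,13,2),
    (3,98,0,1), (3,0,33,2), (4,78,0,1), (4,0,53,2);
""")

rows = conn.execute("""
SELECT Meeting_ID, Player_ID, Coach_ID, Players.Title, Coaches.Title
FROM (
    -- Step 1: collapse the two rows per meeting into one (0 loses to any real ID).
    SELECT Meeting_ID, MAX(Player_ID) AS Player_ID, MAX(Coach_ID) AS Coach_ID
    FROM Meeting_Data
    GROUP BY Meeting_ID
) meeting
-- Step 2: only now look up the names, once per collapsed meeting row.
LEFT JOIN Names Players ON Players.ID = meeting.Player_ID
LEFT JOIN Names Coaches ON Coaches.ID = meeting.Coach_ID
ORDER BY Meeting_ID
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 11, 'Deb', 'Chad')
# (2, 34, 13, 'Steph', 'Chris')
# (3, 98, 33, 'Peter', 'Daniel')
# (4, 78, 53, 'Christine', 'Yolanda')
```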
Hi, these two SQL queries return the same result:
SELECT DISTINCT ItemID
FROM Sale INNER JOIN Department
ON Department.DepartmentID = Sale.DepartmentID
WHERE DepartmentFloor = 2
ORDER BY ItemID
SELECT DISTINCT ItemID
FROM Sale
WHERE EXISTS
(SELECT *
FROM Department
WHERE Sale.DepartmentID = Department.DepartmentID
AND DepartmentFloor = 2)
ORDER BY ItemID;
The subquery inside the EXISTS returns true, so why doesn't the second query return the equivalent of
SELECT DISTINCT ItemID
FROM Sale
which gives a different result from the two above?
You are getting confused by EXISTS(). It is evaluated row by row, based on the table correlation, not once as a single true/false. This line of your subquery is your correlation clause:
Sale.DepartmentID = Department.DepartmentID
It is saying "Only show the Sale.ItemIDs where that ItemID's Sale.DepartmentID is in Department."
It achieves the same function as a join predicate, like in your first query:
FROM Sale S
JOIN Department D on S.DepartmentID = D.DepartmentID --here
Conversely, this query:
SELECT DISTINCT ItemID
FROM Sale
has no limiting factor.
As an aside, you also further limit the results of each query with:
WHERE DepartmentFloor = 2
But I don't think that is the part that is throwing you off, I think it is the concept that a correlated subquery occurs for each record. If you were to remove your correlating clause, then the subquery would actually return true always, and you would get all results back.
The subquery isn't always returning true. It will evaluate for each row, joining on DepartmentID where the DepartmentFloor is 2.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Sale ( ItemID int, DepartmentID int ) ;
INSERT INTO Sale ( ItemID, DepartmentID )
VALUES (1,1), (2,2), (3,3), (4,1), (5,4), (6,2), (7,3), (8,4) ;
CREATE TABLE Department ( DepartmentID int, DepartmentFloor int ) ;
INSERT INTO Department ( DepartmentID, DepartmentFloor )
VALUES (1,1), (2,1), (3,2), (4,2) ;
Query 1:
SELECT *
FROM Department
WHERE DepartmentFloor = 2
Results: This lists only the Departments on DepartmentFloor 2.
| DepartmentID | DepartmentFloor |
|--------------|-----------------|
| 3 | 2 |
| 4 | 2 |
Query 2:
SELECT *
FROM Sale
Results: This lists ALL of your Sales.
| ItemID | DepartmentID |
|--------|--------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
| 5 | 4 |
| 6 | 2 |
| 7 | 3 |
| 8 | 4 |
Query 3:
SELECT *
FROM Sale
WHERE DepartmentID IN (3,4)
Results: And this one shows the equivalent of your EXISTS statement. Only 4 rows match in my data, so you'd only get back ItemIDs 3, 5, 7 and 8.
| ItemID | DepartmentID |
|--------|--------------|
| 3 | 3 |
| 5 | 4 |
| 7 | 3 |
| 8 | 4 |
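To make the "evaluated per row" point concrete, this sketch runs the join, the correlated EXISTS, and an EXISTS with the correlation clause removed. It uses the fiddle's data on SQLite via Python's sqlite3 as a stand-in for MySQL 5.6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sale (ItemID INTEGER, DepartmentID INTEGER);
INSERT INTO Sale VALUES (1,1), (2,2), (3,3), (4,1), (5,4), (6,2), (7,3), (8,4);
CREATE TABLE Department (DepartmentID INTEGER, DepartmentFloor INTEGER);
INSERT INTO Department VALUES (1,1), (2,1), (3,2), (4,2);
""")

def items(sql):
    return [r[0] for r in conn.execute(sql)]

join_q = items("""
SELECT DISTINCT ItemID FROM Sale INNER JOIN Department
ON Department.DepartmentID = Sale.DepartmentID
WHERE DepartmentFloor = 2 ORDER BY ItemID""")

# Correlated: the subquery refers to the outer Sale row, so it is re-checked per row.
correlated = items("""
SELECT DISTINCT ItemID FROM Sale
WHERE EXISTS (SELECT * FROM Department
              WHERE Sale.DepartmentID = Department.DepartmentID
              AND DepartmentFloor = 2)
ORDER BY ItemID""")

# Uncorrelated: same answer for every row (true here), so nothing is filtered out.
uncorrelated = items("""
SELECT DISTINCT ItemID FROM Sale
WHERE EXISTS (SELECT * FROM Department WHERE DepartmentFloor = 2)
ORDER BY ItemID""")

print(join_q)        # [3, 5, 7, 8]
print(correlated)    # [3, 5, 7, 8]
print(uncorrelated)  # [1, 2, 3, 4, 5, 6, 7, 8]
```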
Because the upper part of the query is equivalent to
SELECT DISTINCT ItemID FROM Sale where EXISTS (true)
the upper query is the only one that really checks the condition.
Using MySQL 5.7 on Google Cloud, I'm trying to deduplicate MySQL data based on an "EmailAddress" column, but some of the rows have a value in the "FullName" column and some of them don't. I want to keep the ones that have a value in the FullName column, but if none of the rows with that EmailAddress value have a FullName value, then just keep the duplicate with the lowest ID number (the first column, which is the primary key).
I've finally broken it down into two separate queries, one to first remove the rows with no value in the FullName column IF there's another duplicate row that does have a value in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id!=c2.id
WHERE
(trim(c1.FullName)='' or c1.FullName is NULL)
and c2.FullName is not NULL
and length(trim(c2.FullName))!=0
) t
)
and another query to remove the rows with the bigger IDs where no value was found in the FullName column:
DELETE
FROM customer_info
WHERE id IN
(
SELECT *
FROM
(
SELECT c1.id
FROM customer_info c1
INNER JOIN customer_info c2 on c1.EmailAddress=c2.EmailAddress and c1.id>c2.id
) t
)
This "works", but not really. It worked one time when I left it running overnight for a smaller segment of the data, and when I woke up there was an error, but I looked at the data and it was complete.
Am I missing something in my query that's making it highly inefficient, or is it just par for the course for this type of query, and there's no optimization possible in my code that would make a tangible improvement? I've maxed out a Google Cloud SQL instance to their db-n1-highmem-32 size, with 32 GB of memory and 1000 GB of storage space, and it still chokes up and spits out a 2013 error after running for an hour. I need to do this for a total of a little over 3 million rows.
For example, this:
id | FullName    | EmailAddress             |
----------------------------------------------
1  | John Doe    | john.doe@email.com       |
2  | null        | janedoe@box.com          |
3  | null        | billybob@bobby.com       |
4  | null        | john.doe@email.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
7  | null        | billybob@bobby.com       |
8  | Jane Doe    | janedoe@box.com          |
would result in this:
id | FullName    | EmailAddress             |
----------------------------------------------
1  | John Doe    | john.doe@email.com       |
3  | null        | billybob@bobby.com       |
5  | John Lennon | jlennon@yoohoo.com       |
6  | null        | james.smith@coolmail.com |
8  | Jane Doe    | janedoe@box.com          |
Using EXISTS() might be simpler in this situation:
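-- 1) remove rows without a name when another row with the same address has one
delete
from customer_info c
where (trim(c.FullName)='' or c.FullName is null)
and exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and trim(i.FullName)>''
);
-- 2) of the remaining duplicates, keep only the lowest id per address
delete
from customer_info c
where exists (
select 1
from customer_info i
where i.EmailAddress = c.EmailAddress
and i.id < c.id
);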
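Here is a minimal sketch that checks the two-step EXISTS logic against the question's sample rows, using SQLite via Python's sqlite3. The inner table is aliased i and the outer table is referenced by its full name, which avoids relying on DELETE aliases. Two caveats for MySQL itself, as far as I know. First, MySQL rejects a DELETE whose subquery reads from the same table (error 1093, "You can't specify target table ... for update in FROM clause"). Second, single-table DELETE accepts a table alias only from 8.0.16 onward. On 5.7 the statements would therefore need rewriting, for example as a multi-table DELETE ... JOIN. The sketch only validates the logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (id INTEGER PRIMARY KEY, FullName TEXT, EmailAddress TEXT);
INSERT INTO customer_info VALUES
    (1, 'John Doe', 'john.doe@email.com'),
    (2, NULL, 'janedoe@box.com'),
    (3, NULL, 'billybob@bobby.com'),
    (4, NULL, 'john.doe@email.com'),
    (5, 'John Lennon', 'jlennon@yoohoo.com'),
    (6, NULL, 'james.smith@coolmail.com'),
    (7, NULL, 'billybob@bobby.com'),
    (8, 'Jane Doe', 'janedoe@box.com');

-- Step 1: drop unnamed rows whose address also appears on a named row.
-- TRIM(NULL) > '' is NULL (not true), so NULL names never count as "named".
DELETE FROM customer_info
WHERE (TRIM(FullName) = '' OR FullName IS NULL)
  AND EXISTS (SELECT 1 FROM customer_info i
              WHERE i.EmailAddress = customer_info.EmailAddress
                AND TRIM(i.FullName) > '');

-- Step 2: of what is left, keep only the lowest id per address.
DELETE FROM customer_info
WHERE EXISTS (SELECT 1 FROM customer_info i
              WHERE i.EmailAddress = customer_info.EmailAddress
                AND i.id < customer_info.id);
""")

kept = conn.execute(
    "SELECT id, FullName, EmailAddress FROM customer_info ORDER BY id"
).fetchall()
for row in kept:
    print(row)
```

The surviving ids are 1, 3, 5, 6 and 8, which matches the expected result in the question.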