I am unsure: Is this an anti-join? - mysql

I am working on the first problem of the famous SQLzoos and am working on the using Null section: http://sqlzoo.net/wiki/Using_Null
The question is:
List the teachers who have NULL for their department.
The corresponding SQL query would be:
SELECT t.name
FROM teacher t
WHERE t.dept IS NULL
Is this a type of anti-join? Specifically, is this a left-anti-join?

This isn't a join at all.
The statement is filtering only records for teachers who don't have an assigned department.
Set Difference
The set difference of teachers and departments, teacher \ department would be a kind of "anti-join"
SELECT
t.name
FROM teacher t
LEFT JOIN department d ON d.id = t.dept_id
WHERE d.id IS NULL
At first glance, this statement does what your statement does, if the foreign key reference was enforced, it would guarantee to do exactly that. However, one use for this statement would be to retrieve teachers who are assigned to departments that have since been deleted (e.g. if the English Lit Dept. & English as 2nd Lang Dept. were reorganized as the English Dept.)
Symmetric Difference
Another "anti-join" would be the symmetric difference, which selects elements from both sets ONLY if they cannot be joined, i.e
(teacher \ department) U (department \ teacher)
I can't think of a motivating example using teachers and departments, but one way to write the symmetric difference on databases that support the FULL OUTER JOIN would be:
SELECT
t.name
FROM teacher t
FULL OUTER JOIN department d ON d.id = t.dept_id
WHERE d.id IS NULL OR t.id IS NULL
For MySQL, this statement would have to be written as the union of two statements.
SELECT
t.name teacher_name, d.name department_name
FROM teacher t
LEFT JOIN department d ON d.id = t.dept_id
WHERE d.id IS NULL
UNION ALL
SELECT
t.name teacher_name, d.name department_name
FROM teacher t
LEFT JOIN department d ON d.id = t.dept_id
WHERE t.id IS NULL
Looking through one of my projects, I found this one use of symmetric difference:
Context:
I have three tables: users, users_gameplay_summary, users_transactions_summary. I needed to email those users who created their accounts in the past 7 days AND one of the following
have transacted but have not played or played but have not transacted.
To get the list, I have this query (note, this was written for Postgresql, and won't work on MySQL, but it illustrates the symmetric difference use case):
SELECT
COALESCE(g.user_id, t.user_id) user_id
FROM users_gameplay_summary g
FULL OUTER JOIN users_transactions_summary t ON t.user_id = g.user_id
WHERE COALESCE(g.user_id, t.user_id) IN (
SELECT user_id
FROM users
WHERE created_at > CURRENT_DATE - '7 day'::interval)
AND (g.user_id IS NULL OR t.user_id IS NULL)

Not exactly, your not actually joining anything now,
in the case of a left anti join you would have access to the department name as well. (although it would be NULL)
Your sql code would be a correct answer for the question you gave though.
A left anti join would be:
SELECT t.name
FROM teacher t
LEFT JOIN dept d ON d.id = t.dept
WHERE d.id IS NULL

To solve this problem of listing teachers without assigned departments, you don't need a JOIN between teacher and dept tables.
dept table is basically a dictionary table that you join to, to translate ids to corresponding names.
teacher table has a dept column which normally could have a FOREIGN KEY constraint to id column in dept table.
Your query is not an ANTI-JOIN. This is a simple projection and selection query using one table.
SELECT t.name
FROM teacher t
WHERE t.dept IS NULL
For an ANTI-JOIN you would at least need a JOIN operation between more than one table at first.
Normally an ANTI-JOIN could look like:
Using LEFT JOIN
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
ON t1.join_column = t2.join_column
WHERE t2.join_column IS NULL
Using NOT EXISTS
SELECT *
FROM table1 t1
WHERE NOT EXISTS (
SELECT 1
FROM table2 t2
WHERE t1.join_column = t2.join_column
)

Related

Having trouble a query and specifically with joins

The code below is completely wrong and does not work at all. Im basically trying to look through my tables and compile a list of DeptName and the total student number for a department where a department has more than 40 students.
Im confused about joins in general and if someone could explain and show where im going wrong. im sure there is also other problems so any help with them would help
So basically one department is connected to one module, and a student is enrolled in a module. A student cannot take a module outside of their department. So each student should have one module that connects to one department
All of the ID fields in other tables are foreign keys as you can guess and changing the tables is not what I want to do here I just want to do this query as this stands
Relevant tables columns
Table Department DeptID, DeptName, Faculty, Address
Table Modules ModuleID, ModuleName, DeptID, Programme
Table Students StudentID,StudentName,DoB,Address,StudyType,`
Table Enrolments EID,StudentID,ModuleID,Semester,Year
SELECT Department.DeptName, COUNT(Student.StudentID) AS 'No of Students' FROM Department LEFT JOIN Module ON Department.DeptID= Module.DeptID LEFT JOIN Enrolment ON Module.ModuleID= Enrolment.StudentID LEFT JOIN Student.StudentID
GROUP BY(Department.DeptID)
HAVING COUNT(Student.StudentID)>=40
I have not included every table here as there are quite a lot.
But unless i've got this completely wrong you don't need to access a ModuleID in a staff table for the module they teach or something not relevant to this at all. As no student or Dept details are in there.
If that is the case i will fix it very quickly.
SELECT Department.DeptName, COUNT(Student.StudentID) AS 'No of Students'
FROM Department
LEFT JOIN Module
ON Department.DeptID= Module.DeptID
LEFT JOIN Enrolment
-- problem #1:
ON Module.ModuleID= Enrolment.StudentID
-- problem #2:
LEFT JOIN Student.StudentID
-- problem #3:
GROUP BY(Department.DeptID)
HAVING COUNT(Student.StudentID)>=40
You're joining these two tables using the wrong field. Generally when the modeling is done correctly, you should use USING instead of ON for joins
The right side of any JOIN operator has to be a table, not a column.
You have to group by every column in the select clause that is not part of an aggregate function like COUNT. I recommend that you select the DeptID instead of the name, then use the result of this query to look up the name in a subsequent select.
Note : Following code is untested.
WITH bigDepts AS (
SELECT DeptId, COUNT(StudentID) AS StudentCount
FROM Department
JOIN Module
USING ( DeptID )
JOIN Enrolment
USING ( ModuleID )
JOIN Student
USING ( StudentID )
GROUP BY DeptID
HAVING COUNT(StudentID)>=40
)
SELECT DeptID, DeptName, StudentCount
FROM Department
JOIN bigDepts
USING ( DeptID )
Instead of left join you need to use inner join since you need to select related rows only from those three tables.
Groupy by and having clause seems fine. Since you need departments with more than 40 students instead of >= please use COUNT(e.StudentID)>40
SELECT d.DeptName, COUNT(e.StudentID) AS 'No of Students' FROM Department d INNER JOIN Module m ON d.DeptID= m.DeptID inner JOIN Enrolment e ON m.ModuleID= e.StudentID LEFT JOIN Student.StudentID
GROUP BY(d.DeptName)
HAVING COUNT(e.StudentID)>40
So your join clause was a bit iffy to students as you wrote it, and presumably these should all be inner joins.
I've reformatted your query using aliases to make it easier to read.
Since you're counting the number of rows per DeptName you can simply do count(*), likewise in your having you are after counts greater than 40 only. Without seeing your schemas and data it's not possible to know if you might have duplicate Students, if that's the case and you want distinct students count can amend to count(distinct s.studentId)
select d.DeptName, Count(*) as 'No of Students'
from Department d
join Module m on m.DeptId=d.DeptId
join Enrolment e on e.StudentId=m.ModuleId
join Students s on s.StudentId=e.studentId
group by(d.DeptName)
having Count(*)>40
Also, looking at your join conditions, is the Enrolement table relevant?
select d.DeptName, Count(*) as 'No of Students'
from Department d
join Module m on m.DeptId=d.DeptId
join Students s on s.StudentId=m.moduleId
group by(d.DeptName)
having Count(*)>40

How to check if a particular mapping exists in a one-to-many mapping table

I am having a table that maintains the mapping of an EMPLOYEE_ID to the one or more ROLE_IDs that the employee can be assigned with. The ROLE_ID is a primary key of the ROLE table.
Now, I am trying to find if a particular employee is a Team Leader (ROLE_ID = 2) or not. That is, in essence, trying to find if the particular mapping combination of (EMPLOYEE_ID, 2) exists in the mapping table.
Currently, I am using the below query to achieve this:
SELECT E.NAME AS `EMPLOYEE_NAME`,
EXISTS( SELECT 1 FROM `EMPLOYEE_ROLE` WHERE
(`EMPLOYEE_ROLE`.`EMPLOYEE_ID` = `E`.`EMPLOYEE_ID`)
AND (`EMPLOYEE_ROLE`.`ROLE_ID` = 2)) AS `IS_TEAM_LEADER`
-- Assume some other column shall be selected from ER table,
-- hence necessitating the JOIN on ER
FROM EMPLOYEE E
JOIN EMPLOYEE_ROLE ER ON (ER.EMPLOYEE_ID = E.EMPLOYEE_ID)
GROUP BY E.EMPLOYEE_ID;
Although this seems to get the job done, I am looking for a more efficient approach, as the subquery in its current form seems redundant. Not sure if it's relevant, but can FIND_IN_SET or some such function be used?
Can anyone suggest a solution, as I am interested in the best-performing approach?
EDIT 1: I have intentionally used the JOIN EMPLOYEE_ROLE with the intention that some other column also may be picked from the ER table. So, I am looking for optimising the subquery, while keeping that join intact. Hence, the statement "current subquery in its current form seems redundant".
SQLFiddle: http://sqlfiddle.com/#!9/2aad3/5
Either use the exists subquery or use join, but you should not use both in one query.
I would use the join approach, since it's easy to get role related data if necessary:
SELECT E.NAME AS `EMPLOYEE_NAME`,
FROM EMPLOYEE E
INNER JOIN EMPLOYEE_ROLE ER ON (ER.EMPLOYEE_ID = E.EMPLOYEE_ID)
WHERE ER.ROLE_ID=2;
If you need a list of all employees with a field indicating if that employee is IS leader or not, then use left join instead of inner:
SELECT DISTINCT E.NAME AS `EMPLOYEE_NAME`, IF(ER.ROLE_ID IS NULL, 'NOT IS Leader','IS Leader') AS IsISLeader
FROM EMPLOYEE E
LEFT JOIN EMPLOYEE_ROLE ER ON ER.EMPLOYEE_ID = E.EMPLOYEE_ID AND ER.ROLE_ID=2;
"best performing" --
CREATE TABLE `EMPLOYEE_ROLE` (
`EMPLOYEE_ID` INT NOT NULL,
`ROLE_ID` INT NOT NULL,
PRIMARY KEY(`EMPLOYEE_ID`, ROLE_ID),
INDEX(`ROLE_ID` EMPLOYEE_ID)
) ENGINE=InnoDB;
Why.
Beyond that, see #Shadow's answer.
I include EMPLOYEE_ID in the SQL in case same NAME found for different EMPLOYEE_ID, also use LEFT JOIN in case some employee does not have roles:
SELECT E.EMPLOYEE_ID, E.NAME, SUM(IF(R.ROLE_ID=2,1,0)) IS_TEAM_LEADER
FROM EMPLOYEE E
LEFT JOIN EMPLOYEE_ROLE R ON E.EMPLOYEE_ID = R.EMPLOYEE_ID
GROUP BY 1,2
Simply use The FK relationship
create a Role Table as
table role{
RoleID
RoleName
Status
...}
and edit the employee table as
table employee{
ID
name
...
}
create table employeerole
table employeerole{
ID
roleID
employeeID
}
now you can manage multiple roles to each employee with normalization rules too
now you can use the query
SELECT employee.ID, employee.Name, employeerole.roleid FROM role
INNER JOIN employeerole ON role.RoleID = employeerole.roleid INNER
JOIN dbo.employee ON dbo.employeerole.employeeid = dbo.employee.ID
WHERE employeerole.roleid = 2
the table role has ID 2 and Name for example is team leader or manager
now this query is fully optimized and will give you best results
If I understand you correctly, you're trying to get all employees, with all of their roles, plus an additional column indicating whether they're a Team Leader. If so, you just need to join on EMPLOYEE_ROLE twice, the second being a LEFT JOIN just to check for a specific role:
SELECT E.NAME, ER.ROLE_ID, ER2.EMPLOYEE_ID IS NOT NULL AS `IS_TEAM_LEADER`
FROM EMPLOYEE E
JOIN EMPLOYEE_ROLE ER ON ER.EMPLOYEE_ID = E.EMPLOYEE_ID
LEFT JOIN EMPLOYEE_ROLE ER2 ON ER2.EMPLOYEE_ID = E.EMPLOYEE_ID AND ER2.ROLE_ID = 2;
SQL Fiddle: http://sqlfiddle.com/#!9/2aad3/9/0
Here is the query to get all the employee who belongs to a specific role
SELECT e.NAME FROM EMPLOYEE e right join (SELECT EMPLOYEE_ID FROM
EMPLOYEE_ROLE WHERE ROLE_ID=2) er using(EMPLOYEE_ID);

How to fetch records using NOT syntax?

I have the next tables: faculty, faculty_subjects and subjects. Every faculty has a list of subjects
How can I SELECT subjects that are not needed for this faculty ?
Note: if user adds some new subject, then this subject doesn't already have a record in faculty_subjects table, but it should be displayed in SELECT.
So to sum up - from the list of ALL subjects that exists I want to know the ones that doesn't needed for this faculty. How this can be achieved ?
I'm using MySQL database.
PS. if there is a more better title - please edit.
You can LEFT JOIN to the Faculty_Subject, with the particulary Faculty ID you are interested in used in the join predicate, then filter out the rows with a match using the WHERE predicate, so you are left with the subjects you want:
SELECT s.id, s.Name
FROM Subject AS s
LEFT JOIN Faculty_Subject AS fs
ON fs.Subject_idSubject = s.ID
AND fs.Faculty_idFaculty = 1
WHERE fs.id IS NULL;
Note, I would usually advise to use NOT EXISTS syntax:
SELECT s.id, s.Name
FROM Subject AS s
WHERE NOT EXISTS
( SELECT 1
FROM Faculty_Subject AS fs
WHERE fs.Subject_idSubject = s.ID
AND fs.Faculty_idFaculty = 1
);
Which I beleive is more logical to the reader, but in MySQL LEFT JOIN/IS NULL performs better than NOT EXISTS
SELECT name FROM Subject WHERE id NOT IN
(SELECT FS.Subject_idSubject
FROM Faculty_Subjects AS FS
INNER JOIN Faculty AS F ON F.id=FS.Faculty_idFaculty
WHERE F.name='Physics' AND FS.Subject_idSubject IS NOT NULL)

Semantics of multiple joins

What happens actually when we use cascaded join statements
select student.name, count(teacher.id)
from student
left join course on student.course_id = course.id
left join teacher on student.teacher_id = teacher.id
group by student.name;
It seems when I used only the first left join alone it returned 30 rows while using the second left join alone returned 20 rows. But using together returns 600 rows. What is actually happening ? Does the result from the first left join is used in the second ? I don't understand the semantics. Help me understand it.
Since you don't have any join conditions between teacher and course, you're getting a full cross-product between each of the other two joins. Since one join returns 20 rows and the other returns 30 rows, the 3-way join returns 20x30 = 600 rows. Its equivalent to:
SELECT t1.name, count(t2.id)
FROM (SELECT student.name
FROM student
LEFT JOIN course ON student.id = course.id) AS t1
CROSS JOIN
(SELECT teacher.id
FROM student
LEFT JOIN teacher ON student.id = teacher.id) AS t2
GROUP BY t1.name
Notice that the CROSS JOIN of the two subqueries has no ON condition.
The correct way to structure this database is as follows:
student table: id (PK), name
course table: id (PK), name, fee, credits
student_course table: id (PK), student_id (FK), course_id (FK), unique key on (student_id, course_id)
Then to get the name of each student and the average course fee, you would do:
SELECT s.name, AVG(c.fee) AS avg_fee
FROM student AS s
LEFT JOIN student_course AS sc ON s.id = sc.student_id
LEFT JOIN course AS c ON sc.course_id = c.id
All Mysql joins are graphically explained here. Take a look and choose correct joins for both joined tables.

Joining three tables in SQL, Inner join does not work properly

We have three tables, document,department and contact.
All the tables are linked by an 'id' column. I want the result as follows
firstname lastname address upload_date department_name
The below query fetches the first four columns
SELECT contact.firstname, contact.lastname,contact.address ,
document.upload_date
FROM contact
JOIN document
ON document.id= contact.id
AND contact.status = 1
AND document.defaultdoc=1
So it's an inner join.
But to fetch the last column, the department_name I added a similar join with contact.deptId=department.id, but the query returns zero results. Anything wrong ?
if you add
JOIN department
ON contact.deptId=department.id
it should work.
If all deptId's exist in its master table department then you can use below query:
SELECT c.firstname, c.lastname,c.address ,
doc.upload_date, d.department_name
FROM contact as c
JOIN document as doc
ON doc.id= c.id
AND c.status = 1
AND doc.defaultdoc=1
JOIN department as d
on c.deptId=d.id;
But if deptid updated in contact table does not exist in department table (which should not be if you want referential integrity in your db), then you can use below query:
SELECT c.firstname, c.lastname,c.address ,
doc.upload_date, d.department_name
FROM contact as c
JOIN document as doc
ON doc.id= c.id
AND c.status = 1
AND doc.defaultdoc=1
LEFT JOIN department as d
on c.deptId=d.id;
If both don't work then show your table data and structure.