Very Basic Beginner SQL script - mysql

I need help writing some very basic SQL code. I do not need elaborate code, just the very basics of SQL. I will describe my database first, then put the questions underneath.
Okay so I have the four following tables:
Division(did, dname, managerID)
Employee(empid, name, salary, did)
Project(pid, pname, budget, did)
Workon(pid, empid, hours)
The text in bold is the primary key and the text in italic is the foreign key. The Workon table connects the Employee and Project tables.
Here are my questions:
List the name of divisions that have/sponsor project(s) employee 'chen' works on.
This is what I have...
select pname, d.dname
from project p, division d
where pid in
(select pid
from employee e, workon w
where e.empid=w.empid and lower (name) 'Chen')
Now I know this is not completely right because I think I have to take it from the Workon table, but I am unsure how to.
List the name of the employee that has the lowest salary in his division and list the total number of projects this employee is working on (use correlated subquery).
This is what I have
select name, did, min(salary) as "lowest salary"
from employee
group by name, did
order by did
and I also have this code..
select did, min(salary)
from employee
group by did
order by did
I am confused because the first code is not only giving me the lowest salary of the division. (If you look at the second code I wrote it shows the average.)
List the name of employee who do more projects than his/her divisional colleagues (correlated subquery).
I have...
select pname
from project p, workon w
where p.pid= w.pid
group by pname
having count (empid) >2
I have a feeling this shouldn't be "more than 2". It should be more than the colleagues, but I cannot figure out how to write that.
List the name of project that some employee(s) who is/are working on it make less than divisional average salary.
select dname
from division d, employee e
where d.did= e.did
group by dname
having avg(salary)>(select avg(salary) from employee)
I do not think this is right either, but I know I am almost there

Search for TDQD on SO to see how I go about Test-Driven Query Design. You can test each step of the query as it is built up, and if something goes wrong with a step, take appropriate corrective action to make the query correct.
I'll tackle the first of your queries:
List the names of divisions that sponsor projects that employee 'chen' works on.
Given your attempted query, I am going to assume that your DBMS supports a function LOWER(x) that case-converts all the letters in its argument to lower-case, leaving any non-letters alone.
Taking this one step at a time, we might come up with:
1. Employee ID for employee 'chen'
SELECT EmpID
FROM Employee
WHERE LOWER(Name) = 'chen'
2. Project IDs of projects that employee 'chen' works on
SELECT W.PID
FROM WorkOn AS W
JOIN Employee AS E
ON E.EmpID = W.EmpID
WHERE LOWER(E.Name) = 'chen'
3. Division IDs of divisions sponsoring projects that employee 'chen' works on
SELECT P.DID
FROM Project AS P
JOIN WorkOn AS W ON W.PID = P.PID
JOIN Employee AS E ON E.EmpID = W.EmpID
WHERE LOWER(E.Name) = 'chen'
4. Names of divisions sponsoring projects that employee 'chen' works on
SELECT DISTINCT D.Dname
FROM Division AS D
JOIN Project AS P ON P.DID = D.DID
JOIN WorkOn AS W ON W.PID = P.PID
JOIN Employee AS E ON E.EmpID = W.EmpID
WHERE LOWER(E.Name) = 'chen'
The DISTINCT removes any repeats if 'chen' works on several projects sponsored by the same division.
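To see the effect of DISTINCT concretely, here is a minimal sketch that runs the step 4 query against an in-memory SQLite database through Python's sqlite3 module. The sample rows (division and project names, IDs) are made up for illustration, and SQLite stands in for whatever DBMS you actually use.

```python
import sqlite3

# Hypothetical sample data: 'Chen' works on two projects of the same division.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Division (did INTEGER PRIMARY KEY, dname TEXT, managerID INTEGER);
CREATE TABLE Employee (empid INTEGER PRIMARY KEY, name TEXT, salary INTEGER, did INTEGER);
CREATE TABLE Project  (pid INTEGER PRIMARY KEY, pname TEXT, budget INTEGER, did INTEGER);
CREATE TABLE Workon   (pid INTEGER, empid INTEGER, hours INTEGER);

INSERT INTO Division VALUES (1, 'Engineering', 10), (2, 'Marketing', 11);
INSERT INTO Employee VALUES (10, 'Chen', 5000, 1), (11, 'Kim', 4000, 2);
INSERT INTO Project  VALUES (100, 'Alpha', 1000, 1), (101, 'Beta', 2000, 1),
                            (102, 'Gamma', 3000, 2);
INSERT INTO Workon   VALUES (100, 10, 5), (101, 10, 7), (102, 11, 3);
""")

# Same query as step 4, with the DISTINCT keyword made optional.
query = """
SELECT {distinct} D.dname
  FROM Division AS D
  JOIN Project  AS P ON P.did = D.did
  JOIN Workon   AS W ON W.pid = P.pid
  JOIN Employee AS E ON E.empid = W.empid
 WHERE LOWER(E.name) = 'chen'
"""
without_distinct = [row[0] for row in conn.execute(query.format(distinct=""))]
with_distinct = [row[0] for row in conn.execute(query.format(distinct="DISTINCT"))]

print(without_distinct)  # one row per project Chen works on
print(with_distinct)     # one row per division
```

Without DISTINCT the division appears once per matching project; with it, once in total.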
You need to apply a similar technique to the other queries in your question. I'm not sure that you need any correlated subqueries. You will probably need 'subqueries in the FROM clause', but you shouldn't need correlated subqueries (IMO, of course). If I spent long enough on it, I could probably come up with a correlated subquery, but such queries are unlikely to perform as well as straightforward queries and will probably be a lot harder to read (and write). You'll need to assess whether your tutor will accept answers that produce the correct result without using correlated subqueries.
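As an illustration of a 'subquery in the FROM clause', here is one possible sketch for the second question (lowest-paid employee per division, plus the number of projects that employee works on). It is not the only way to write it, and it deliberately does not use a correlated subquery. The sample rows and the names M, min_salary and projects are assumptions made for this example; it runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empid INTEGER PRIMARY KEY, name TEXT, salary INTEGER, did INTEGER);
CREATE TABLE Workon   (pid INTEGER, empid INTEGER, hours INTEGER);

-- Hypothetical data: two divisions, two employees each.
INSERT INTO Employee VALUES (1, 'Chen', 5000, 1), (2, 'Lee', 3000, 1),
                            (3, 'Kim', 4000, 2), (4, 'Park', 6000, 2);
INSERT INTO Workon   VALUES (100, 2, 5), (101, 2, 7), (100, 1, 4);
""")

rows = conn.execute("""
SELECT E.name, E.did, E.salary, COUNT(W.pid) AS projects
  FROM Employee AS E
  JOIN (SELECT did, MIN(salary) AS min_salary   -- step 1: lowest salary per division
          FROM Employee
         GROUP BY did) AS M
    ON M.did = E.did AND M.min_salary = E.salary -- step 2: who earns that salary
  LEFT JOIN Workon AS W ON W.empid = E.empid     -- step 3: their projects, if any
 GROUP BY E.empid, E.name, E.did, E.salary
 ORDER BY E.did
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps an employee who works on no projects, and COUNT(W.pid) then reports 0 for that employee because it ignores NULLs.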

Related

Having trouble with a query and specifically with joins

The code below is completely wrong and does not work at all. I'm basically trying to look through my tables and compile a list of DeptName and the total number of students for each department that has more than 40 students.
I'm confused about joins in general and would appreciate it if someone could explain and show where I'm going wrong. I'm sure there are other problems too, so any help with them would be welcome.
So basically one department is connected to one module, and a student is enrolled in a module. A student cannot take a module outside of their department. So each student should have one module that connects to one department.
All of the ID fields in other tables are foreign keys, as you can guess. Changing the tables is not what I want to do here; I just want to write this query against the schema as it stands.
Relevant tables columns
Table Department DeptID, DeptName, Faculty, Address
Table Modules ModuleID, ModuleName, DeptID, Programme
Table Students StudentID, StudentName, DoB, Address, StudyType
Table Enrolments EID,StudentID,ModuleID,Semester,Year
SELECT Department.DeptName, COUNT(Student.StudentID) AS 'No of Students' FROM Department LEFT JOIN Module ON Department.DeptID= Module.DeptID LEFT JOIN Enrolment ON Module.ModuleID= Enrolment.StudentID LEFT JOIN Student.StudentID
GROUP BY(Department.DeptID)
HAVING COUNT(Student.StudentID)>=40
I have not included every table here as there are quite a lot.
But unless I've got this completely wrong, you don't need to access, say, the ModuleID in a staff table for the module they teach, or anything else not relevant to this, as there are no student or Dept details in there.
If that is the case I will fix it very quickly.
SELECT Department.DeptName, COUNT(Student.StudentID) AS 'No of Students'
FROM Department
LEFT JOIN Module
ON Department.DeptID= Module.DeptID
LEFT JOIN Enrolment
-- problem #1:
ON Module.ModuleID= Enrolment.StudentID
-- problem #2:
LEFT JOIN Student.StudentID
-- problem #3:
GROUP BY(Department.DeptID)
HAVING COUNT(Student.StudentID)>=40
1. You're joining these two tables using the wrong field: Module.ModuleID should be matched with Enrolment.ModuleID, not Enrolment.StudentID. When the join columns have the same name in both tables, USING is a convenient alternative to ON.
2. The right side of any JOIN operator has to be a table, not a column.
3. You have to group by every column in the select clause that is not part of an aggregate function like COUNT. I recommend that you select the DeptID instead of the name, then use the result of this query to look up the name in a subsequent select.
Note : Following code is untested.
WITH bigDepts AS (
SELECT DeptId, COUNT(StudentID) AS StudentCount
FROM Department
JOIN Module
USING ( DeptID )
JOIN Enrolment
USING ( ModuleID )
JOIN Student
USING ( StudentID )
GROUP BY DeptID
HAVING COUNT(StudentID)>=40
)
SELECT DeptID, DeptName, StudentCount
FROM Department
JOIN bigDepts
USING ( DeptID )
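Since the query above is marked as untested, here is a small sketch that exercises it on an in-memory SQLite database through Python's sqlite3 module. The table definitions are trimmed to the columns the query needs, the singular table names follow the query rather than the schema listing in the question, and the generated rows (41 students in one department, 3 in another) are invented for the test. Behaviour on your own DBMS may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Module     (ModuleID INTEGER PRIMARY KEY, ModuleName TEXT, DeptID INTEGER);
CREATE TABLE Student    (StudentID INTEGER PRIMARY KEY, StudentName TEXT);
CREATE TABLE Enrolment  (EID INTEGER PRIMARY KEY, StudentID INTEGER, ModuleID INTEGER);

INSERT INTO Department VALUES (1, 'Physics'), (2, 'History');
INSERT INTO Module     VALUES (1, 'Mechanics', 1), (2, 'Medieval Europe', 2);
""")

# Hypothetical students 1..44: the first 41 take module 1, the rest module 2.
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(i, f"Student {i}") for i in range(1, 45)])
conn.executemany("INSERT INTO Enrolment VALUES (?, ?, ?)",
                 [(i, i, 1 if i <= 41 else 2) for i in range(1, 45)])

rows = conn.execute("""
WITH bigDepts AS (
    SELECT DeptID, COUNT(StudentID) AS StudentCount
      FROM Department
      JOIN Module    USING (DeptID)
      JOIN Enrolment USING (ModuleID)
      JOIN Student   USING (StudentID)
     GROUP BY DeptID
    HAVING COUNT(StudentID) >= 40
)
SELECT DeptID, DeptName, StudentCount
  FROM Department
  JOIN bigDepts USING (DeptID)
""").fetchall()

print(rows)  # only the department with at least 40 enrolled students
```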
Use INNER JOIN instead of LEFT JOIN, since you only want rows that are related across the tables.
The GROUP BY and HAVING clauses look fine. Since you need departments with more than 40 students, use COUNT(e.StudentID) > 40 rather than >=. Enrolment also has to be joined to Module on ModuleID, and because Enrolment already carries StudentID, the join to Student is not needed for the count.
SELECT d.DeptName, COUNT(e.StudentID) AS 'No of Students'
FROM Department d
INNER JOIN Module m ON d.DeptID = m.DeptID
INNER JOIN Enrolment e ON m.ModuleID = e.ModuleID
GROUP BY d.DeptName
HAVING COUNT(e.StudentID) > 40
Your join to the student table was malformed as you wrote it, and presumably these should all be inner joins.
I've reformatted your query using aliases to make it easier to read.
Since you're counting the number of rows per DeptName, you can simply use count(*); likewise, in your HAVING clause you are after counts greater than 40 only. Without seeing your schemas and data it's not possible to know whether the same student might appear more than once. If that's the case and you want to count distinct students, you can change the count to count(distinct s.studentId).
select d.DeptName, Count(*) as 'No of Students'
from Department d
join Module m on m.DeptId=d.DeptId
join Enrolment e on e.ModuleId=m.ModuleId
join Students s on s.StudentId=e.studentId
group by(d.DeptName)
having Count(*)>40
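The difference between count(*) and count(distinct ...) only shows up when a student can be enrolled in more than one module of the same department. The following sketch illustrates that case on SQLite via Python's sqlite3 module. It assumes Enrolment links to Module through ModuleID, and the rows and the column aliases enrolments and students are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Module     (ModuleID INTEGER PRIMARY KEY, ModuleName TEXT, DeptID INTEGER);
CREATE TABLE Enrolment  (EID INTEGER PRIMARY KEY, StudentID INTEGER, ModuleID INTEGER);

INSERT INTO Department VALUES (1, 'Physics');
INSERT INTO Module     VALUES (1, 'Mechanics', 1), (2, 'Optics', 1);
-- Hypothetical: student 1 takes both Physics modules, student 2 takes one.
INSERT INTO Enrolment  VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

rows = conn.execute("""
SELECT d.DeptName,
       COUNT(*)                    AS enrolments,  -- one per enrolment row
       COUNT(DISTINCT e.StudentID) AS students     -- one per student
  FROM Department d
  JOIN Module    m ON m.DeptID   = d.DeptID
  JOIN Enrolment e ON e.ModuleID = m.ModuleID
 GROUP BY d.DeptName
""").fetchall()

print(rows)
```

If each student really has exactly one module, the two counts agree and count(*) is enough.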
Also, looking at your join conditions, is the Enrolment table relevant?
select d.DeptName, Count(*) as 'No of Students'
from Department d
join Module m on m.DeptId=d.DeptId
join Students s on s.StudentId=m.moduleId
group by(d.DeptName)
having Count(*)>40

Understanding Order of ON Clause in Self-Joins (SQL)

I am trying to understand the SQL self-join - especially how the order of the ON clause matters in the query. This is probably a basic question but please bear with me as I'm a beginner in query language.
This is actually a LeetCode Question - #181 where I'm trying to get the employee whose salary is higher than their manager. You can check out the schema through the LeetCode link or the SQL Fiddle example I've provided below.
Question:
Basically I'm trying to understand the difference in output when I run the below two queries:
I changed the order of the ON clause From (ON e.ManagerId = m.Id) to (ON m.ManagerId = e.Id) and I'm getting the inverse answer from the desired output. I thought because it's a self-join, the order wouldn't matter since I'm extracting information from the identical table.
Please let me know what I'm missing and also point to any directions if possible!
Thanks in advance!
1) Correct Query to get Desired Output
Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary
SQL Fiddle Example
2) Incorrect Query
Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary
SQL Fiddle Example
Functionally, the order of the operands within the ON clause doesn't matter (so 'ON e.ManagerId = m.Id' is the same as 'ON m.Id = e.ManagerId').
What you are doing here is joining on different columns, which represent different things.
In the incorrect query, you are saying "the manager's ManagerId is the same as the employee's Id", which isn't true. Managers (as you've got it in your table) don't have managers themselves.
What you've essentially done is invert the join. If you were to swap the sign in your WHERE clause, from WHERE e.Salary > m.Salary to WHERE e.Salary < m.Salary, you'd get the same employee/manager pairs as your correct query, just with the e and m columns swapped.
In both queries you are joining one employee with another. In the first query, however, you call the subordinate e and the manager m, while in the second you call the manager e and the subordinate m. Let's look at this more closely:
Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary
You are combining an employee (that you call e for short) with their manager (an employee called m here, the ID of which is referenced as the manager ID in the employee record). Then you only keep employee / manager pairs where the employee's salary is greater than the manager's.
Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary
You are combining an employee (that you call e for short) with their subordinate (an employee called m here, the manager ID of which is referencing the employee record). So, the employee that you call e is the other employee's manager. Then you only keep employee (manager) / subordinate pairs where the manager's salary is greater than the subordinate's.
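The role swap can be checked on a tiny data set. The sketch below uses Python's sqlite3 module with four rows modelled on the LeetCode example (the names and salaries are assumptions for illustration) and selects only the two Name columns so the pairs are easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, ManagerId INTEGER);
-- Joe reports to Sam, Henry reports to Max; Sam and Max have no manager.
INSERT INTO Employee VALUES (1, 'Joe',   70000, 3),
                            (2, 'Henry', 80000, 4),
                            (3, 'Sam',   60000, NULL),
                            (4, 'Max',   90000, NULL);
""")

def pairs(on_clause, where_clause):
    """Return (e.Name, m.Name) pairs for the given ON and WHERE clauses."""
    sql = f"""
        SELECT e.Name, m.Name
          FROM Employee e
         INNER JOIN Employee m ON {on_clause}
         WHERE {where_clause}
    """
    return conn.execute(sql).fetchall()

# Query 1: e is the subordinate, m is the manager.
correct = pairs("e.ManagerId = m.Id", "e.Salary > m.Salary")
# Query 2: e is now the manager, m the subordinate.
inverted = pairs("m.ManagerId = e.Id", "e.Salary > m.Salary")
# Query 2 with the comparison flipped: same pair as query 1, columns swapped.
inverted_flipped = pairs("m.ManagerId = e.Id", "e.Salary < m.Salary")

print(correct, inverted, inverted_flipped)
```

Query 1 finds the subordinate who out-earns their manager, query 2 finds the manager who out-earns a subordinate, and flipping the comparison in query 2 recovers the pair from query 1 in reverse order.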
I think you are not realizing that the table aliases refer to the people.
The m copy of the table is the manager, so the variable m.managerId would refer to the manager of the manager. Which is not what you want. So the correct link:
e.ManagerId = m.Id
is linking an employee row's manager to the manager row's ID.
You might want to think of it as only the ids in the Manager_id columns are Managers.
So to get their names you could do:
select name from Employee where id in (select distinct ManagerId from Employee)
distinct is optional. I would tend to include it if I were debugging the nested select, as it makes sense to see each ManagerId in there only once. Two or more employees can potentially have the same manager.
Try running the queries without the where clause, you will see the same results but column order has switched. This is because of the ON clause:
ON e.ManagerId = m.Id
(Employee e to Manager m)
Or longhand join Employee ManagerId, to Manager Id
Joe as the Employee, with Sam as the Manager (ascending hierarchy as you read across columns)
ON m.ManagerId = e.Id
(Manager m to Employee e)
Or longhand join Manager ManagerId, to Employee Id
Sam as the Manager, with Joe as the Employee (descending hierarchy as you read across columns)
Column order notwithstanding, if you were to flip the WHERE comparison from > to < when you flipped the ON prefix order, you would get the same results.

Trying to get a row count in a subquery

I have two tables, one is departments and the other is employees. The department id is a foreign key in the employees table. The employee table has a name and a flag saying if the person is part-time. I can have zero or more employees in a department. I'm trying to figure out how to get a list of all departments that have at least one employee and in which all the employees are part-time. I think this has to be some kind of subquery. Here's what I have so far:
SELECT dept.name
,dept.id
,employee.deptid
,count(employee.is_parttime)
FROM employee
,dept
WHERE dept.id = employee.deptid
AND employee.is_parttime = 1
GROUP BY employee.is_parttime
I would really appreciate any help at this point.
You must join (properly) the tables and group by department with a condition in the HAVING clause:
select d.name, d.id, count(e.id) total
from dept d inner join employee e
on d.id = e.deptid
group by d.name, d.id
having total = sum(e.is_parttime)
The inner join returns only departments with at least 1 employee.
The column is_parttime (I guess) is a flag with values 0 or 1 so by summing it the result is the number of employees that are part time in the department and this number is compared to the total number of employees of the department.
As a preliminary aside, I recommend expressing joins with the JOIN keyword, and segregating join conditions from filter conditions. Doing so would make the original query look like so:
select dept.name, dept.id, employee.deptid, count(employee.is_parttime)
from employee
join dept on dept.id = employee.deptid
where employee.is_parttime = 1
group by employee.is_parttime
It doesn't make much practical difference for inner joins, but it does make the structure of the data and the logic of the query a bit clearer. On the other hand, it does make a difference for outer joins, and there is value in consistency.
As for the actual question, yes, one can rewrite the original query using a subquery or an inline view to produce the requested result. (An "inline view" is technically what one should call an embedded query used as a table in the FROM clause, but some people lump these in with subqueries.)
Example using a subquery
select dept.name, dept.id
from dept
where dept.id in (
select deptid
from employee
group by deptid
having count(*) = sum(is_parttime)
)
Example using an inline view
select dept.name, dept.id
from dept
join (
select deptid
from employee
group by deptid
having count(*) = sum(is_parttime)
) pt_dept
on dept.id = pt_dept.deptid
In each case, the subquery / inline view does most of the work. It aggregates employees by department, then filters the groups (HAVING clause) to select only those in which the part-time employee count is the same as the total count. Naturally, departments without any employees will not be represented. If a list of department IDs would suffice for a list of departments, then that's actually all you need. To get the department names too, however, you need to combine that with data from the dept table, as demonstrated in the two example queries.
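Both forms can be tried side by side. This is a minimal sketch on SQLite through Python's sqlite3 module; the table layout (dept.id, dept.name, employee.deptid, employee.is_parttime as a 0/1 flag) follows the question, while the rows themselves are invented: one all-part-time department, one mixed department, and one department with no employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, deptid INTEGER, is_parttime INTEGER);

INSERT INTO dept VALUES (1, 'Support'), (2, 'Sales'), (3, 'Empty');
INSERT INTO employee VALUES (1, 'A', 1, 1), (2, 'B', 1, 1),   -- Support: all part-time
                            (3, 'C', 2, 1), (4, 'D', 2, 0);   -- Sales: mixed
""")

# Subquery form: filter dept by the set of qualifying department IDs.
subquery_rows = conn.execute("""
SELECT dept.name, dept.id
  FROM dept
 WHERE dept.id IN (SELECT deptid
                     FROM employee
                    GROUP BY deptid
                   HAVING COUNT(*) = SUM(is_parttime))
""").fetchall()

# Inline-view form: join dept to the same aggregate.
inline_view_rows = conn.execute("""
SELECT dept.name, dept.id
  FROM dept
  JOIN (SELECT deptid
          FROM employee
         GROUP BY deptid
        HAVING COUNT(*) = SUM(is_parttime)) pt_dept
    ON dept.id = pt_dept.deptid
""").fetchall()

print(subquery_rows, inline_view_rows)
```

The department without employees never appears because it has no group in the aggregate, and the mixed department is dropped by the HAVING filter.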

SQL iterative loop to see if employee works on all projects

The query I am supposed to form has to accomplish the following task:
Retrieve the names of all employees who work on every project.
I currently have three tables. The Employee, works_on, and project tables. The goal to accomplish this query is to get each project id from the project table, compare it to the project id in the works_on table. When there is a match it will get the SSN and get the names from the employee table. The query I have formed is this:
SELECT e.Fname, e.Minit, e.Lname, p.Pname
FROM EMPLOYEE e, PROJECT p, WORKS_ON w
WHERE p.Pnumber=w.Pno AND w.Essn=e.Ssn
But this outputs all the employees that work on each project, not the employees that work on EVERY project. Is there some way to iterate through a list of results from the query SELECT Pnumber FROM PROJECT?
I really hope I worded this question clearly for your understanding.
Also, you don't need PROJECT in the join; WORKS_ON is sufficient.
HAVING filters the results after a GROUP BY.
The GROUP BY e.Ssn means that the COUNT(*) in HAVING is per employee. The JOIN on WORKS_ON produces one row per project the employee works on, which is what gets counted.
Use the JOIN tbl ON ... = tbl.id syntax; it is easier to read.
SELECT e.Fname, e.Minit, e.Lname
FROM EMPLOYEE e
JOIN WORKS_ON w
ON w.Essn=e.Ssn
GROUP BY e.Ssn
HAVING COUNT(*) = (SELECT COUNT(*) FROM PROJECT)
SELECT e.Fname, e.Minit, e.Lname
FROM EMPLOYEE e
WHERE NOT EXISTS(SELECT Pnumber
FROM PROJECT
WHERE NOT EXISTS(SELECT *
FROM WORKS_ON
WHERE Pnumber=Pno AND Essn=e.Ssn));
You can select the employee on the condition that:
There doesn't exist a project that the employee doesn't work on.
The innermost query looks for a WORKS_ON tuple in which the employee with this Ssn works on the project with this Pnumber; the NOT EXISTS around it therefore selects the projects the employee does not work on.
The outer NOT EXISTS then keeps only the employees for whom no such project exists, i.e. there is a WORKS_ON tuple for that employee for ALL projects.
I hope that makes sense and good luck!
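The counting approach and the double NOT EXISTS approach can be compared on a small example. The sketch below runs both on SQLite through Python's sqlite3 module, using the column names from the question (Pnumber, Pno, Essn, Ssn); the rows are invented. Note the assumption baked into the counting version: WORKS_ON holds at most one row per (Essn, Pno) pair and every Pno refers to an existing project, otherwise the counts would not line up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Minit TEXT, Lname TEXT);
CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, PRIMARY KEY (Essn, Pno));

-- Hypothetical data: Ann works on both projects, Bob on only one.
INSERT INTO EMPLOYEE VALUES ('111', 'Ann', 'B', 'Lee'), ('222', 'Bob', 'C', 'Ray');
INSERT INTO PROJECT  VALUES (1, 'P1'), (2, 'P2');
INSERT INTO WORKS_ON VALUES ('111', 1), ('111', 2), ('222', 1);
""")

# Counting: an employee's project count must equal the total project count.
by_count = conn.execute("""
SELECT e.Fname, e.Minit, e.Lname
  FROM EMPLOYEE e
  JOIN WORKS_ON w ON w.Essn = e.Ssn
 GROUP BY e.Ssn
HAVING COUNT(*) = (SELECT COUNT(*) FROM PROJECT)
""").fetchall()

# Double negation: no project exists that the employee does not work on.
by_not_exists = conn.execute("""
SELECT e.Fname, e.Minit, e.Lname
  FROM EMPLOYEE e
 WHERE NOT EXISTS (SELECT Pnumber
                     FROM PROJECT
                    WHERE NOT EXISTS (SELECT *
                                        FROM WORKS_ON
                                       WHERE Pno = Pnumber AND Essn = e.Ssn))
""").fetchall()

print(by_count, by_not_exists)
```

One difference worth knowing: if PROJECT is empty, the NOT EXISTS version returns every employee, while the counting version returns none.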

Can you explain these 2 SQL queries to me?

I'm studying for my DB exam which covers a lot of SQL statements I need to write by hand. Below is the schema diagram and solutions for 2 scenarios that were outlined in my book that don't seem to make sense to me.
Q13: Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
SELECT FNAME, LNAME
FROM EMPLOYEE,PROJECT, WORKS_ON
WHERE DNO = 5 AND PNAME = 'PRODUCT X' AND HOURS>10 AND ESSN=SSN;
Shouldn't the WHERE clause include PNO = NUMBER ? How would the WORKS_ON table know to reference the PROJECT table without including this? Is it because we reference the ESSN = SSN?
Q1: Retrieve the name of each employee who has a dependent with the same first name and is the same sex as the employee.
SELECT E.FNAME, E.LNAME
FROM EMPLOYEE AS E
WHERE E.SSN IN (SELECT D.ESSN FROM DEPENDENT AS D WHERE E.FNAME = D.DEPENDENT_NAME AND D.SEX = E.SEX);
I understand this query all the way up until the WHERE clause. I don't understand what E.SSN IN is trying to do with the sub query ahead of it. If someone can explain this, that would be great.
For the first question, yes you guessed it right. There should be another clause as PNO = NUMBER.
For the second question, think of it this way: for each employee row, the subquery returns the Essn values of the dependents whose name and sex match that employee, and the employee is selected if their own Ssn is in that list. This should work fine. But, because Essn and Dependent_name together form the key of the Dependent table, you can also use a simple join and get it done. Read about it here: http://www.w3schools.com/sql/sql_join.asp
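To make the "IN with a correlated subquery" reading concrete, here is a small sketch on SQLite via Python's sqlite3 module that runs the book's Q1 query next to an equivalent join. The tables are cut down to the columns involved and the rows are invented; the DISTINCT in the join version is my addition, guarding against an employee matching more than one dependent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE  (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT, SEX TEXT);
CREATE TABLE DEPENDENT (ESSN TEXT, DEPENDENT_NAME TEXT, SEX TEXT,
                        PRIMARY KEY (ESSN, DEPENDENT_NAME));

-- Hypothetical data: John has a son named John; Joy has a son named Joy.
INSERT INTO EMPLOYEE  VALUES ('1', 'John', 'Smith', 'M'), ('2', 'Joy', 'Wong', 'F');
INSERT INTO DEPENDENT VALUES ('1', 'John', 'M'), ('1', 'Alice', 'F'), ('2', 'Joy', 'M');
""")

# The subquery is re-evaluated for each employee row E (it refers to E.FNAME, E.SEX).
with_in = conn.execute("""
SELECT E.FNAME, E.LNAME
  FROM EMPLOYEE AS E
 WHERE E.SSN IN (SELECT D.ESSN
                   FROM DEPENDENT AS D
                  WHERE E.FNAME = D.DEPENDENT_NAME AND D.SEX = E.SEX)
""").fetchall()

# Same result expressed as a join on all three matching conditions.
with_join = conn.execute("""
SELECT DISTINCT E.FNAME, E.LNAME
  FROM EMPLOYEE AS E
  JOIN DEPENDENT AS D
    ON D.ESSN = E.SSN AND D.DEPENDENT_NAME = E.FNAME AND D.SEX = E.SEX
""").fetchall()

print(with_in, with_join)
```

Joy is excluded because her dependent shares her first name but not her sex.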
For Q13: You need to include one more condition in WHERE clause that tells the relation between Works_on and Project, which is
SELECT FNAME, LNAME
FROM EMPLOYEE,PROJECT, WORKS_ON
WHERE Pno = Pnumber AND DNO = 5 AND PNAME = 'ProductX' AND HOURS>10 AND ESSN=SSN;
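What goes wrong without the Pno = Pnumber condition can be shown with two employees. In this sketch (SQLite via Python's sqlite3 module, tables trimmed to the relevant columns, rows invented), John works more than 10 hours only on ProductY, yet the query without the extra condition still returns him, because every WORKS_ON row gets paired with the ProductX row of PROJECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, FNAME TEXT, LNAME TEXT, DNO INTEGER);
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT);
CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, HOURS INTEGER);

INSERT INTO EMPLOYEE VALUES ('1', 'John', 'Smith', 5), ('2', 'Mary', 'Jones', 5);
INSERT INTO PROJECT  VALUES (1, 'ProductX'), (2, 'ProductY');
-- Hypothetical: John works 20 hours on ProductY, Mary 15 hours on ProductX.
INSERT INTO WORKS_ON VALUES ('1', 2, 20), ('2', 1, 15);
""")

# WORKS_ON and PROJECT are not linked, so they form a cross product.
missing_condition = conn.execute("""
SELECT FNAME, LNAME
  FROM EMPLOYEE, PROJECT, WORKS_ON
 WHERE DNO = 5 AND PNAME = 'ProductX' AND HOURS > 10 AND ESSN = SSN
 ORDER BY FNAME
""").fetchall()

# With PNO = PNUMBER each WORKS_ON row is matched to its own project only.
with_condition = conn.execute("""
SELECT FNAME, LNAME
  FROM EMPLOYEE, PROJECT, WORKS_ON
 WHERE PNO = PNUMBER AND DNO = 5 AND PNAME = 'ProductX' AND HOURS > 10 AND ESSN = SSN
 ORDER BY FNAME
""").fetchall()

print(missing_condition, with_condition)
```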
Q1: uses correlated sub-query.
SELECT E.FNAME, E.LNAME
FROM EMPLOYEE E
INNER JOIN WORKS_ON WO
ON WO.Essn = E.Ssn
INNER JOIN PROJECT P
ON P.Pnumber = WO.Pno
where E.DNO = 5
and P.Pname = 'ProductX'
and WO.Hours > 10