I have just been looking at some example database queries, and came across this :
Find the identifier, name & address of employees of the Research department.
Method I: Join Method
SELECT Ssn, FName, LName, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dno = Dnumber
AND Dname = ‘Research’
Method II: Subquery Method
SELECT Ssn, FName, LName, Address
FROM EMPLOYEE
WHERE Dno IN
( SELECT Dnumber
FROM DEPARTMENT
WHERE Dname = ‘Research’ );
In these examples, why can you not leave out the Dno = Dnumber line? How do you know when to include this?
John, you are using implicit join syntax cf ansi SQL '89.
WHERE JOIN
SELECT Ssn, FName, LName, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dno = Dnumber
AND Dname = 'Research'
You should never ever use that because it is confusing as hell.
And it causes a lot of errors because it does a cross join if you're not careful.
The following syntax using explicit joins cf ANSI SQL '92 which is much clearer.
SELECT Ssn, FName, LName, Address
FROM EMPLOYEE
inner join DEPARTMENT on (employee.dnumber = department.dno)
WHERE Dname = 'Research'
This also answers why you cannot leave out the dnumber = dno, because that's the join condition
SUBQUERY
A subquery is really a join by other means.
In general you should avoid a subquery because a join is faster (90% of the time)
Some people find subqueries easier to understand. I would say that if you don't grok joins stay away from SQL!
Still sometimes you're doing something to complex or bizarre for a join and then the subquery is an option.
Back to your question: because the subquery is really a join by other means you need that join condition to make the join :-).
I would actually write something like this.
SELECT
Ssn, FName, LName, Address
FROM
EMPLOYEE
LEFT JOIN
DEPARTMENT ON EMPLOYEE.Dno = DEPARTMENT.Dnumber
WHERE
Dname = "Research"
The Dno = Dnumber is used to bridge the JOIN between the Employee and Department tables. You need it to identify the relationship between tables you are joining.
There are many ways to write JOIN statements.
This is a good tutorial about joins - http://mysqljoin.com/
Each employee belongs to a department. Dno = Dnumber is what defines the relationship between the two. So you have to keep that relationship in your join. Dname = 'Research' further filters to only include Research department employees.
Should you not join Dno to Dnumber, you will wind up with a Cartesian product.
The "Dno = Dnumber" line is the join clause - it tells the query only to include records in the employee table whose Dno matches the department number in the department table.
You use the Dno = Dnumber line to basically create JOIN criteria.
If you didn't use them you'd have a full JOIN
However, I would contend that the correct way to do the JOIN is to actually do
SELECT
EMPLOYEE.[Ssn]
, EMPLOYEE.[FName]
, EMPLOYEE.[LName]
, EMPLOYEE.[Address]
FROM
EMPLOYEE
JOIN
DEPARTMENT
ON DEPARTMENT.[Dnumber] = EMPLOYEE.[Dno]
WHERE
DEPARTMENT.[Dname] = 'Research'
The Dno = Dnumber clause is needed for you to join the two tables. Without this, you'll get what's called a Cartesian join, where you'll get n x m number of rows returned, where n = # rows in the EMPLOYEE table, and m = # rows in the DEPARTMENT table.
You can find a good tutorial on SQL join at http://www.1keydata.com/sql/sqljoins.html.
Related
I have two tables, one is departments and the other is employees. The department id is a foreign key in the employees table. The employee table has a name and a flag saying if the person is part-time. I can have zero or more employees in a department. I'm trying to figure out out to get a list of all departments where a department has at least one employee and if it does have at least one employee, that all the employees are part time. I think this has to be some kind of subquery to get this. Here's what I have so far:
SELECT dept.name
,dept.id
,employee.deptid
,count(employee.is_parttime)
FROM employee
,dept
WHERE dept.id = employee.deptid
AND employee.is_parttime = 1
GROUP BY employee.is_parttime
I would really appreciate any help at this point.
You must join (properly) the tables and group by department with a condition in the HAVING clause:
select d.name, d.id, count(e.id) total
from dept d inner join employee e
on d.id = e.deptid
group by d.name, d.id
having total = sum(e.is_parttime)
The inner join returns only departments with at least 1 employee.
The column is_parttime (I guess) is a flag with values 0 or 1 so by summing it the result is the number of employees that are part time in the department and this number is compared to the total number of employees of the department.
As a preliminary aside, I recommend expressing joins with the JOIN keyword, and segregating join conditions from filter conditions. Doing so would make the original query look like so:
select dept.name, dept.id, employee.deptid, count(employee.is_parttime)
from employee
join dept on dept.id = employee.deptid
where employee.is_parttime = 1
group by employee.is_parttime
It doesn't make much practical difference for inner joins, but it does make the structure of the data and the logic of the query a bit clearer. On the other hand, it does make a difference for outer joins, and there is value in consistency.
As for the actual question, yes, one can rewrite the original query using a subquery or an inline view to produce the requested result. (An "inline view" is technically what one should call an embedded query used as a table in the FROM clause, but some people lump these in with subqueries.)
Example using a subquery
select dept.name, dept.id
from dept
where dept.id in (
select deptid
from employee
group by deptid
having count(*) == sum(is_parttime)
)
Example using an inline view
select dept.name, dept.id
from dept
join (
select deptid
from employee
group by deptid
having count(*) == sum(is_parttime)
) pt_dept
on dept.id = pt_dept.deptid
In each case, the subquery / inline view does most of the work. It aggregates employees by department, then filters the groups (HAVING clause) to select only those in which the part-time employee count is the same as the total count. Naturally, departments without any employees will not be represented. If a list of department IDs would suffice for a list of departments, then that's actually all you need. To get the department names too, however, you need to combine that with data from the dept table, as demonstrated in the two example queries.
I've been searching for answers awhile on here, but I'm just ending up going in circles.
anyways, assuming I have two tables:
Employee with the attributes first_name, last_name, middle_initial, and ESSN
and
dependents with the attributes essn, dependent_name and relationship
I want to simply list the names of all department managers who have no dependents.
The code i'm trying to use works if the != is set to =, and gives me the matching results.
What do I to pull up the results that do not match?
select employee.first_name, employee.middle_initial, employee.last_name
from employee join dependent on employee.essn = dependent.essn
where employee.essn != dependent.essn
group by employee.first_name;
i also attempted to use != with the ESSN of the department managers, but I ran into the same problem.
It can be achieved much easier:
select first_name from employee where essn not in (select essn from dependent);
I would use left join where dependent.essn is null
select employee.first_name, employee.middle_initial, employee.last_name
from employee
left join dependent
on employee.essn = dependent.essn
where dependent.essn is null
My bad..just noticed this is mysql.. I assume it would be the same as mssql
I'm studying for my DB exam which covers a lot of SQL statements I need to write by hand. Below is the schema diagram and solutions for 2 scenarios that were outlined in my book that don't seem to make sense to me.
Q13: Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
SELECT FNAME, LNAME
FROM EMPLOYEE,PROJECT, WORKS_ON
WHERE DNO = 5 AND PNAME = ‘PRODUCT X’ AND HOURS>10 AND ESSN=SSN;
Shouldn't the WHERE clause include PNO = NUMBER ? How would the WORKS_ON table know to reference the PROJECT table without including this? Is it because we reference the ESSN = SSN?
Q1: Retrieve the name of each employee who has a dependent with the same first name and is the same sex as the employee.
SELECT E.FNAME, E.LNAME
FROM EMPLOYEE AS E
WHERE E.SSN IN (SELECT D.ESSN FROM DEPENDENT AS D WHERE E.FNAME = D.DEPENDENT_NAME AND D.SEX = E.SEX);
I understand this query all the way up until the WHERE clause. I don't understand what E.SSN IN is trying to do with the sub query ahead of it. If someone can explain this, that would be great.
For the first question, yes you guessed it right. There should be another clause as PNO = NUMBER.
For second question, think of it this way: Select an employee where employee number[Ssn] is in the list of employeeIDs [Essn] returned by sub-query for each given employee number [Ssn]. This should work fine. But, because Essn and Dependant Name are both keys for the Dependent table, you can also use simple join statements and get it done. Read about it here: http://www.w3schools.com/sql/sql_join.asp
For Q13: You need to include one more condition in WHERE clause that tells the relation between Works_on and Project, which is
SELECT FNAME, LNAME
FROM EMPLOYEE,PROJECT, WORKS_ON
WHERE Pno = Pnumber AND DNO = 5 AND PNAME = ‘PRODUCT X’ AND HOURS>10 AND ESSN=SSN;
Q1: uses correlated sub-query.
SELECT E.FNAME, E.LNAME
FROM EMPLOYEE E
INNER JOIN WORKS_ON WO
ON WO.Essn = E.Ssn
INNER JOIN PROJECT P
ON P.Pnumber = WO.Pno
where E.DNO = 5
and P.name = 'ProductX'
and WO.Hours > 10
So I have 3 tables that I want to perform some queries on but during that process I end up returning back to a table I already performed some functions on and it.
I first get the essn from one dependent table and search the employee table for a ssn matching it then i got the superssn and compare to the mgrssn in another table. The last step is to go back into the employee table and find the name of the person who has the same ssn of mgrssn.
The issue here is that once i get the matching superssn I can't access the other rows.
select lname, fname from
(select mgrssn from department) as d,
(select superssn, lname,fname,ssn from
(select essn from dependent where dependent_name ='joy') as de,
(select ssn,lname,fname,superssn from employee) as e
where essn =ssn) as s
where s.ssn = mgrssn
Should I look into doing joins instead?
You don't need the subqueries. Your query is a bit hard to follow (you don't have table aliases for all the columns), but I think this is what you are trying to do:
select lname, fname
from department d join
employee e
on e.ssn = d.mgrssn join
dependent dep
on dep.essn = e.ssn
where dep.dependent_name ='joy';
Simple rule: Never use commas in the from clause. Always use explicit join syntax.
Lets say I have the following database model:
And the question is as follows:
List ALL department names and the total number of employees in the department. The total number of employees column should be renamed as "total_emps". Order the list from the department with the least number of employees to the most number of employees. Note: You need to include a department in the list even when the department does not currently have any employee assigned to it.
This was my attempt:
SELECT Department.deptname
(SELECT COUNT(*)
FROM Department
WHERE Department.empno = Employee.empno ) AS total_emps
FROM Department
I'm pretty sure my solution is not correct as it won't include departments with no employees. How do you use a left inner join to solve this problem?
The query as you were trying to write it is:
(table creates modified from shree.pat18's sqlfiddle to this sqlfiddle)
create table department (deptno int, deptname varchar(20));
insert into department values (1, 'a'),(2, 'b'),(3, 'c');
create table employee (empno int, deptno int);
insert into employee values (1,1),(2,1),(3,3);
SELECT d.deptname,
(SELECT COUNT(*)
FROM EMPLOYEE e
WHERE d.deptno = e.deptno ) AS total_emps
FROM DEPARTMENT d
ORDER BY total_emps ASC;
(You were counting from DEPARTMENT instead of EMPLOYEE and comparing empno instead of deptno. And you left out a comma.)
(You were asked for every department's name and employee count so this returns that. In practice we would include a presumably unique deptno if deptname was not unique.)
I'm pretty sure my solution is not correct as it won't include
departments with no employees.
Even your answer's version of the query (with the missing comma added) has an outer select that returns a count for every department no matter what the subselect returns. So I don't know why/how you thought it wouldn't.
If you want to use LEFT (OUTER) JOIN then the DEPARTMENT rows with no employees get extended by NULL. But COUNT of a column only counts non-NULL rows.
SELECT d.deptname, COUNT(e.empno) AS total_emps
FROM DEPARTMENT d
LEFT JOIN EMPLOYEE e
ON d.deptno = e.deptno
GROUP BY d.deptno
ORDER BY total_emps ASC;
(Nb the LEFT JOIN version uses more concepts: LEFT JOIN extending by NULL, GROUP BY, and COUNT's NULL behaviour for non-*.)
First off, it's a left outer join. Now, for your query, you want to join the 2 tables based on deptno, then also group by deptno (or deptname, since that is as likely to be unique) to ensure that any aggregation we do is done for each unique department in the table. Finally, the counting is done with the count function, leading to this query:
select d.deptname, count(e.empno) as total_emps
from department d
left join employee e on d.deptno = e.deptno
group by d.deptname
SQLFiddle
Note that since we want all records from department regardless of whether there are matching records in employee or not, department must appear at the left side of the join. We could have done the same thing using a right outer join by swapping the positions of the 2 tables in the join.