I am trying to understand the SQL self-join - especially how the order of the ON clause matters in the query. This is probably a basic question but please bear with me as I'm a beginner in query language.
This is actually a LeetCode question (#181), where I'm trying to get the employees whose salary is higher than their manager's. You can check out the schema through the LeetCode link or the SQL Fiddle example I've provided below.
Question:
Basically I'm trying to understand the difference in output when I run the below two queries:
I changed the ON clause from (ON e.ManagerId = m.Id) to (ON m.ManagerId = e.Id) and I'm getting the inverse of the desired output. I thought that because it's a self-join, the order wouldn't matter since I'm extracting information from the identical table.
Please let me know what I'm missing and also point to any directions if possible!
Thanks in advance!
1) Correct Query to get Desired Output
Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary
SQL Fiddle Example
2) Incorrect Query
Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary
SQL Fiddle Example
Functionally, the order of the operands doesn't matter (so 'ON e.ManagerId = m.Id' is the same as 'ON m.Id = e.ManagerId').
What you are doing here is joining on different columns, which represent different things.
In the incorrect query, you are saying "the manager's ManagerId is the same as the employee's Id", which isn't true. Managers (as you've got it in your table) don't have managers themselves.
What you've essentially done is invert the join. If you were to swap the sign in your WHERE clause, changing WHERE e.Salary > m.Salary to WHERE e.Salary < m.Salary, you'd get the same pairs as your correct query, with the e and m columns swapped.
In both queries you are joining one employee with another. In the first query, however, you call the subordinate e and the manager m, while in the second you call the manager e and the subordinate m. Let's look at this more closely:
Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary
You are combining an employee (that you call e for short) with their manager (an employee called m here, the ID of which is referenced as the manager ID in the employee record). Then you only keep employee / manager pairs where the employee's salary is greater than the manager's.
Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary
You are combining an employee (that you call e for short) with their subordinate (an employee called m here, the manager ID of which is referencing the employee record). So, the employee that you call e is the other employee's manager. Then you only keep employee (manager) / subordinate pairs where the manager's salary is greater than the subordinate's.
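To see the two readings side by side, here is a minimal sketch using Python's built-in sqlite3 module. The four sample rows are an assumption made for illustration (two subordinates, two managers), not data taken from the question.

```python
import sqlite3

# In-memory database with assumed sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, ManagerId INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Joe',   70000, 3),
        (2, 'Henry', 80000, 4),
        (3, 'Sam',   60000, NULL),
        (4, 'Max',   90000, NULL);
""")

# Query 1: e is the subordinate, m is e's manager.
subordinate_earns_more = conn.execute("""
    SELECT e.Name, m.Name
    FROM Employee e
    INNER JOIN Employee m ON e.ManagerId = m.Id
    WHERE e.Salary > m.Salary
""").fetchall()

# Query 2: e is the manager, m is one of e's subordinates.
manager_earns_more = conn.execute("""
    SELECT e.Name, m.Name
    FROM Employee e
    INNER JOIN Employee m ON m.ManagerId = e.Id
    WHERE e.Salary > m.Salary
""").fetchall()

print(subordinate_earns_more)  # (subordinate, manager) pairs
print(manager_earns_more)      # (manager, subordinate) pairs
```

With these rows the first query keeps the Joe/Sam pair (Joe out-earns his manager), while the second keeps the Max/Henry pair (Max out-earns his subordinate), which is the "inverse" result the question describes.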
I think you are not realizing that the table aliases refer to the people.
The m copy of the table is the manager, so the variable m.managerId would refer to the manager of the manager. Which is not what you want. So the correct link:
e.ManagerId = m.Id
is linking an employee row's manager to the manager row's ID.
You might want to think of it this way: only the IDs that appear in the ManagerId column belong to managers.
So to get their names you could do:
select name from Employee where id in (select distinct ManagerId from Employee)
distinct is optional. I would tend to include it if I were debugging the nested select, as it makes sense to see each ManagerId in there only once. Two or more employees can have the same manager.
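A quick way to check that DISTINCT does not change the outcome is to run the query both ways. This is only a sketch with assumed sample rows, using Python's sqlite3; the helper manager_names is a hypothetical name introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, ManagerId INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Joe',   70000, 3),
        (2, 'Henry', 80000, 4),
        (3, 'Sam',   60000, NULL),
        (4, 'Max',   90000, NULL),
        (5, 'Ann',   50000, 3);   -- second report of Sam, so ManagerId 3 repeats
""")

def manager_names(subquery):
    """Names of employees whose Id appears in the given ManagerId subquery."""
    sql = f"SELECT Name FROM Employee WHERE Id IN ({subquery}) ORDER BY Id"
    return [name for (name,) in conn.execute(sql)]

with_distinct = manager_names("SELECT DISTINCT ManagerId FROM Employee")
without_distinct = manager_names("SELECT ManagerId FROM Employee")

print(with_distinct)
print(without_distinct)
```

Both calls list Sam and Max once each, even though Sam's Id appears twice in the ManagerId column: IN only asks whether a value is present, not how often.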
Try running the queries without the WHERE clause: you will see the same results, but the column order has switched. This is because of the ON clause:
ON e.ManagerId = m.Id
(Employee e to Manager m)
Or longhand join Employee ManagerId, to Manager Id
Joe as the Employee, with Sam as the Manager (ascending hierarchy as you read across columns)
ON m.ManagerId = e.Id
(Manager m to Employee e)
Or longhand join Manager ManagerId, to Employee Id
Sam as the Manager, with Joe as the Employee (descending hierarchy as you read across columns)
Column order notwithstanding, if you were to flip the WHERE clause from > to < when you flipped the ON prefix order, you would get the same results.
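Both claims can be checked mechanically. The sketch below uses Python's sqlite3 with assumed sample rows; JOIN_UP, JOIN_DOWN and rows are names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, ManagerId INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Joe',   70000, 3),
        (2, 'Henry', 80000, 4),
        (3, 'Sam',   60000, NULL),
        (4, 'Max',   90000, NULL);
""")

JOIN_UP = "FROM Employee e INNER JOIN Employee m ON e.ManagerId = m.Id"    # e = employee, m = manager
JOIN_DOWN = "FROM Employee e INNER JOIN Employee m ON m.ManagerId = e.Id"  # e = manager, m = employee

def rows(sql):
    return conn.execute(sql).fetchall()

# Without WHERE: the same pairs come back, only the column order differs.
up = rows(f"SELECT e.Name, m.Name {JOIN_UP}")
down = rows(f"SELECT e.Name, m.Name {JOIN_DOWN}")
same_pairs = sorted(up) == sorted((m, e) for e, m in down)

# With WHERE: flipping the ON prefixes and the comparison finds the same person.
original = rows(f"SELECT e.Name {JOIN_UP} WHERE e.Salary > m.Salary")
flipped = rows(f"SELECT m.Name {JOIN_DOWN} WHERE e.Salary < m.Salary")

print(same_pairs, original, flipped)
```

Note that the comparison flips to a strict <, so pairs with equal salaries stay excluded in both versions.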
Related
I'm trying to formulate an SQL query that can be used to retrieve the employees that manage specific clients. I have a three-table structure: one table for employees and one for clients, with a third table connecting the IDs. For example, when I want to search for the employees that manage clients Jeff and Bill, I would first try the following:
SELECT e.name FROM employees e JOIN manages m ON e.emp_id = m.emp_id JOIN clients c ON m.cli_id = c.cli_id
WHERE c.cli_name = 'Jeff' AND c.cli_name = 'Bill';
However, there's an obvious logical flaw in this query, as it tries to find a row where two mutually exclusive facts are true. I've tried to solve this by grouping by employee name and checking which groups have these properties, but it doesn't seem to work. There must be some way to make this work, but I've not managed to find it online. Ideas?
Use aggregation. Assuming no duplicates:
SELECT e.name
FROM employees e JOIN
manages m
ON e.emp_id = m.emp_id JOIN
clients c
ON m.cli_id = c.cli_id
WHERE c.cli_name IN ('Jeff', 'Bill') -- match names with both clients
GROUP BY e.name
HAVING COUNT(*) = 2; -- keep only names that have two matches
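Here is a small sketch of that aggregation run against made-up data with Python's sqlite3. The table and column names follow the question; the rows are assumptions for illustration, and there are no duplicate rows in manages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE clients   (cli_id INTEGER PRIMARY KEY, cli_name TEXT);
    CREATE TABLE manages   (emp_id INTEGER, cli_id INTEGER);

    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO clients   VALUES (1, 'Jeff'), (2, 'Bill'), (3, 'Zed');
    INSERT INTO manages   VALUES
        (1, 1), (1, 2),   -- Ann manages Jeff and Bill
        (2, 1), (2, 3),   -- Bob manages Jeff and Zed
        (3, 2);           -- Cy manages Bill only
""")

both_clients = [name for (name,) in conn.execute("""
    SELECT e.name
    FROM employees e
    JOIN manages m ON e.emp_id = m.emp_id
    JOIN clients c ON m.cli_id = c.cli_id
    WHERE c.cli_name IN ('Jeff', 'Bill')   -- keep rows for either client
    GROUP BY e.name
    HAVING COUNT(*) = 2                    -- keep names matched by both
""")]

print(both_clients)
```

Only Ann survives the HAVING filter here. If manages could contain duplicate rows, COUNT(DISTINCT c.cli_name) = 2 would be the safer condition.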
I have two tables, one is departments and the other is employees. The department id is a foreign key in the employees table. The employee table has a name and a flag saying if the person is part-time. I can have zero or more employees in a department. I'm trying to figure out how to get a list of all departments that have at least one employee and in which all the employees are part-time. I think this has to be some kind of subquery. Here's what I have so far:
SELECT dept.name
,dept.id
,employee.deptid
,count(employee.is_parttime)
FROM employee
,dept
WHERE dept.id = employee.deptid
AND employee.is_parttime = 1
GROUP BY employee.is_parttime
I would really appreciate any help at this point.
You must join (properly) the tables and group by department with a condition in the HAVING clause:
select d.name, d.id, count(e.id) total
from dept d inner join employee e
on d.id = e.deptid
group by d.name, d.id
having count(e.id) = sum(e.is_parttime)
The inner join returns only departments with at least 1 employee.
The column is_parttime (I guess) is a flag with values 0 or 1 so by summing it the result is the number of employees that are part time in the department and this number is compared to the total number of employees of the department.
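The following sketch runs that idea on made-up data with Python's sqlite3, assuming is_parttime holds 0 or 1 as guessed above. The HAVING clause repeats the count expression rather than using the column alias, since not every DBMS accepts an alias there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, deptid INTEGER, is_parttime INTEGER);

    INSERT INTO dept VALUES (1, 'Sales'), (2, 'IT'), (3, 'Legal');
    INSERT INTO employee VALUES
        (1, 'A', 1, 1), (2, 'B', 1, 1),   -- Sales: everyone part-time
        (3, 'C', 2, 1), (4, 'D', 2, 0);   -- IT: mixed; Legal: no employees
""")

all_parttime = conn.execute("""
    SELECT d.name, d.id, COUNT(e.id) AS total
    FROM dept d
    INNER JOIN employee e ON d.id = e.deptid
    GROUP BY d.name, d.id
    HAVING COUNT(e.id) = SUM(e.is_parttime)
""").fetchall()

print(all_parttime)
```

Sales is the only department returned: IT fails the HAVING test (2 employees, 1 part-time) and Legal never appears because the inner join drops departments without employees.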
As a preliminary aside, I recommend expressing joins with the JOIN keyword, and segregating join conditions from filter conditions. Doing so would make the original query look like so:
select dept.name, dept.id, employee.deptid, count(employee.is_parttime)
from employee
join dept on dept.id = employee.deptid
where employee.is_parttime = 1
group by employee.is_parttime
It doesn't make much practical difference for inner joins, but it does make the structure of the data and the logic of the query a bit clearer. On the other hand, it does make a difference for outer joins, and there is value in consistency.
As for the actual question, yes, one can rewrite the original query using a subquery or an inline view to produce the requested result. (An "inline view" is technically what one should call an embedded query used as a table in the FROM clause, but some people lump these in with subqueries.)
Example using a subquery
select dept.name, dept.id
from dept
where dept.id in (
select deptid
from employee
group by deptid
having count(*) = sum(is_parttime)
)
Example using an inline view
select dept.name, dept.id
from dept
join (
select deptid
from employee
group by deptid
having count(*) = sum(is_parttime)
) pt_dept
on dept.id = pt_dept.deptid
In each case, the subquery / inline view does most of the work. It aggregates employees by department, then filters the groups (HAVING clause) to select only those in which the part-time employee count is the same as the total count. Naturally, departments without any employees will not be represented. If a list of department IDs would suffice for a list of departments, then that's actually all you need. To get the department names too, however, you need to combine that with data from the dept table, as demonstrated in the two example queries.
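As a rough check that the two forms are interchangeable, the sketch below runs both against the same made-up data using Python's sqlite3. The rows are assumptions, and PT_DEPTS is just a name chosen here for the shared inner query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, deptid INTEGER, is_parttime INTEGER);

    INSERT INTO dept VALUES (1, 'Sales'), (2, 'IT'), (3, 'Legal');
    INSERT INTO employee VALUES
        (1, 'A', 1, 1), (2, 'B', 1, 1),
        (3, 'C', 2, 1), (4, 'D', 2, 0);
""")

# Inner query shared by both forms: departments whose employees are all part-time.
PT_DEPTS = """
    SELECT deptid
    FROM employee
    GROUP BY deptid
    HAVING COUNT(*) = SUM(is_parttime)
"""

via_subquery = conn.execute(f"""
    SELECT dept.name, dept.id
    FROM dept
    WHERE dept.id IN ({PT_DEPTS})
""").fetchall()

via_inline_view = conn.execute(f"""
    SELECT dept.name, dept.id
    FROM dept
    JOIN ({PT_DEPTS}) pt_dept ON dept.id = pt_dept.deptid
""").fetchall()

print(via_subquery, via_inline_view)
```

Both statements return the same single row for Sales. Which form reads better is largely a matter of taste; the inline view makes it easier to pull extra aggregate columns out of the inner query later.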
SELECT e.ManagerID, count(*) as NumberOfDepartments
From HumanResources.Employee e, Person.Contact c
where e.ContactID = c.ContactID
group by e.ManagerID;
The goal is to write a report that displays the ManagerID, the first name and last name of that manager, and the number of distinct departments they supervise, and to show only the manager who supervises the most departments.
I have to ensure that all employees are currently employed (ie enddate does not contain a date).
The code above works in showing the ManagerID and the number of departments that manager runs, but whenever I try to add the first name and last name I also have to put them in the 'group by' clause, and that makes the whole report go crazy. Please help.
Database Here
From your schema, it seems that the ManagerID column in Employee is populated with the ID of the manager for that employee. That would explain why the report goes crazy when you add firstName and lastName: you'd be grouping by the employee's name, not the manager's.
Without seeing the tables' content it's hard to tell, but it may be that managers can be recognised by not having ManagerID populated.
If this is the case, you can write your query like this
-- e = manager, e2 = the people that manager supervises
select e.EmployeeID, c.firstName, c.lastName, count(distinct edh.DepartmentID)
from Employee e
join Contact c
on e.ContactID = c.ContactID
join Employee e2
on e.EmployeeID = e2.ManagerID
join EmployeeDepartmentHistory edh
on e2.EmployeeID = edh.EmployeeID
-- keep only managers, and only current department assignments
where e.ManagerID is null and edh.EndDate is null
group by e.EmployeeID, c.firstName, c.lastName
The first instance of the Employee table is the managers (because of the condition where e.ManagerID is null), the join with Contact gets you the managers' names, the second instance of Employee gets you all the people managed by each manager, and the join with EmployeeDepartmentHistory gets you their departments (which you count) and their EndDate, which has to be null to ensure that they're currently employed.
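To make the self-join concrete, here is a sketch on a heavily simplified, made-up version of the schema, run with Python's sqlite3. The three tables keep only the columns the query touches, every row is invented, and it relies on the same assumption as above (managers have no ManagerID). The ORDER BY ... LIMIT 1 at the end is one possible way to keep only the top manager in SQLite; SQL Server would use TOP instead, and ties are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (ContactID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, ContactID INTEGER, ManagerID INTEGER);
    CREATE TABLE EmployeeDepartmentHistory (EmployeeID INTEGER, DepartmentID INTEGER, EndDate TEXT);

    INSERT INTO Contact VALUES
        (1, 'Ken', 'Sanchez'), (2, 'Ann', 'Lee'), (3, 'Bo', 'Kim'),
        (4, 'Cal', 'Roe'), (5, 'Dee', 'Poe'), (6, 'Eve', 'Fox');
    INSERT INTO Employee VALUES
        (1, 1, NULL), (2, 2, NULL),        -- Ken and Ann are managers
        (3, 3, 1), (4, 4, 1), (6, 6, 1),   -- report to Ken
        (5, 5, 2);                         -- reports to Ann
    INSERT INTO EmployeeDepartmentHistory VALUES
        (3, 10, NULL), (3, 30, '2003-06-30'),   -- the ended assignment is ignored
        (4, 20, NULL), (6, 10, NULL),
        (5, 30, NULL), (5, 40, '2004-01-01');
""")

top_manager = conn.execute("""
    SELECT e.EmployeeID, c.FirstName, c.LastName,
           COUNT(DISTINCT edh.DepartmentID) AS NumberOfDepartments
    FROM Employee e                                   -- e: the manager
    JOIN Contact c ON e.ContactID = c.ContactID       -- manager's name
    JOIN Employee e2 ON e.EmployeeID = e2.ManagerID   -- e2: people they manage
    JOIN EmployeeDepartmentHistory edh ON e2.EmployeeID = edh.EmployeeID
    WHERE e.ManagerID IS NULL AND edh.EndDate IS NULL
    GROUP BY e.EmployeeID, c.FirstName, c.LastName
    ORDER BY NumberOfDepartments DESC
    LIMIT 1
""").fetchone()

print(top_manager)
```

Because the name columns come from the manager's own Contact row, adding them to GROUP BY no longer splits the groups: each manager still produces exactly one row.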
Edit
Please note the way I wrote the joins; writing them as comma-separated table names in your FROM clause with the join condition in your WHERE clause is a bad habit that should be kicked, because it makes reading, maintaining and changing them to outer joins much harder. That's why explicit JOIN syntax was introduced into the SQL language back in 1992.
In MSSQL:
SELECT e.ManagerID, c.FirstName, c.LastName, COUNT(*) AS NumberOfDepartments
FROM HumanResources.Employee e
INNER JOIN Person.Contact c ON e.ContactID = c.ContactID
GROUP BY e.ManagerID, c.FirstName, c.LastName
The same INNER JOIN ... ON syntax works in MySQL as well; there, INNER JOIN can also be written as plain JOIN.
I have 2 tables, one called Employee and one called Salary. The Employee table consists of Emp_Name, Emp_Address and Emp_ID, and the Salary table consists of Salary_Details and Emp_ID. Can you write a query that retrieves the Salary_Details of one employee based on last name using an inner join?
I am not sure what you are looking for, but this might help you:
SELECT * FROM Employee e
INNER JOIN Salary s ON e.Emp_ID = s.Emp_ID
WHERE e.Emp_Name = 'EMPLOYEENAME'
That will return all fields from Employee and Salary for the employee whose name is 'EMPLOYEENAME' (replace it with the actual name).
You can adjust the columns returned as needed depending on your app...
SELECT e.Emp_Name, e.Emp_ID, s.Salary_Details
FROM Employee e
INNER JOIN Salary s USING (Emp_ID)
WHERE e.Emp_Name = 'Smith';
The USING keyword is kind of obscure and works only if the join column is named identically in both tables. The previous answer with ON instead of USING will work in all cases. I like USING as a personal preference.
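For comparison, this sketch runs the ON and USING variants side by side in SQLite through Python's sqlite3. The rows are invented for illustration. SQLite accepts USING, but support varies between DBMSs (SQL Server, for example, does not accept it), so treat the second query as dialect-dependent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Emp_ID INTEGER PRIMARY KEY, Emp_Name TEXT, Emp_Address TEXT);
    CREATE TABLE Salary (Emp_ID INTEGER, Salary_Details TEXT);

    INSERT INTO Employee VALUES (1, 'Smith', '1 Main St'), (2, 'Jones', '2 Side St');
    INSERT INTO Salary VALUES (1, '4200/month'), (2, '3900/month');
""")

with_on = conn.execute("""
    SELECT e.Emp_Name, e.Emp_ID, s.Salary_Details
    FROM Employee e
    INNER JOIN Salary s ON e.Emp_ID = s.Emp_ID
    WHERE e.Emp_Name = 'Smith'
""").fetchall()

with_using = conn.execute("""
    SELECT e.Emp_Name, e.Emp_ID, s.Salary_Details
    FROM Employee e
    INNER JOIN Salary s USING (Emp_ID)   -- same join, column named once
    WHERE e.Emp_Name = 'Smith'
""").fetchall()

print(with_on, with_using)
```

Both queries return the same single row for Smith; USING only saves typing when the join column has the same name on both sides.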
I need help writing some very basic SQL code. I do not need elaborate code, just the very basics of SQL. I will describe my database first, then put the questions underneath.
Okay so I have the four following tables:
Division(did, dname, managerID)
Employee(empid, name, salary, did)
Project(pid, pname, budget, did)
Workon(pid, empid, hours)
The text in bold is the primary key and the text in italics is the foreign key. The Workon table connects the employee and project tables.
Here are my questions:
List the name of divisions that have/sponsor project(s) employee 'chen' works on.
This is what I have...
select pname, d.dname
from project p, division d
where pid in
(select pid
from employee e, workon w
where e.empid=w.empid and lower (name) 'Chen')
Now I know this is not completely right because I think I have to take it from the Workon table, but I am unsure how to.
List the name of the employee that has the lowest salary in his division and list the total number of projects this employee is working on (use correlated subquery).
This is what I have
select name, did, min(salary) as "lowest salary"
from employee
group by name, did
order by did
and I also have this code..
select did, min(salary)
from employee
group by did
order by did
I am confused because the first query is not giving me only the lowest salary of each division. (The second query I wrote does show the lowest salary per division.)
List the name of employee who do more projects than his/her divisional colleagues (correlated subquery).
I have...
select pname
from project p, workon w
where p.pid= w.pid
group by pname
having count (empid) >2
I have a feeling this shouldn't be more than 2. It should be more than the colleague, but I can not figure out how to write that.
List the name of project that some employee(s) who is/are working on it make less than divisional average salary.
select dname
from division d, employee e
where d.did= e.did
group by dname
having avg(salary)>(select avg(salary) from employee)
I do not think this is right either, but I know I am almost there
Search for TDQD on SO to see how I go about Test-Driven Query Design. You can test each step of the query as it is built up, and if something goes wrong with a step, take appropriate corrective action to make the query correct.
I'll tackle the first of your queries:
List the names of divisions that sponsor projects that employee 'chen' works on.
Given your attempted query, I am going to assume that your DBMS supports a function LOWER(x) that case-converts all the letters in its argument to lower-case, leaving any non-letters alone.
Taking this one step at a time, we might come up with:
1. Employee ID for employee 'chen'
SELECT EmpID
FROM Employee
WHERE LOWER(Name) = 'chen'
2. Project IDs of projects that employee 'chen' works on
SELECT W.PID
FROM WorkOn AS W
JOIN Employee AS E
ON E.EmpID = W.EmpID
WHERE LOWER(E.Name) = 'chen'
3. Division IDs of divisions sponsoring projects that employee 'chen' works on
SELECT P.DID
FROM Project AS P
JOIN WorkOn AS W ON W.PID = P.PID
JOIN Employee AS E ON E.EmpID = W.EmpID
WHERE LOWER(E.Name) = 'chen'
4. Names of divisions sponsoring projects that employee 'chen' works on
SELECT DISTINCT D.Dname
FROM Division AS D
JOIN Project AS P ON P.DID = D.DID
JOIN WorkOn AS W ON W.PID = P.PID
JOIN Employee AS E ON E.EmpID = W.EmpID
WHERE LOWER(E.Name) = 'chen'
The DISTINCT removes any repeats if 'chen' works on several projects sponsored by the same division.
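The build-up can be tried out on a toy database. This sketch uses Python's sqlite3 with the table layout from the question; all rows are invented for illustration, and it runs the first and last steps plus the final query without DISTINCT to show the repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Division (did INTEGER PRIMARY KEY, dname TEXT, managerID INTEGER);
    CREATE TABLE Employee (empid INTEGER PRIMARY KEY, name TEXT, salary INTEGER, did INTEGER);
    CREATE TABLE Project  (pid INTEGER PRIMARY KEY, pname TEXT, budget INTEGER, did INTEGER);
    CREATE TABLE WorkOn   (pid INTEGER, empid INTEGER, hours INTEGER);

    INSERT INTO Division VALUES (1, 'Engineering', 10), (2, 'Marketing', 20), (3, 'Finance', 30);
    INSERT INTO Employee VALUES (10, 'Chen', 5000, 1), (20, 'Kim', 4000, 2), (30, 'Lopez', 4500, 3);
    INSERT INTO Project  VALUES (100, 'Alpha', 1000, 1), (101, 'Beta', 2000, 1),
                                (102, 'Gamma', 1500, 2), (103, 'Delta', 500, 3);
    INSERT INTO WorkOn   VALUES (100, 10, 5), (101, 10, 3), (102, 10, 8), (103, 20, 4);
""")

# Step 1: employee ID for 'chen'.
chen_ids = [row[0] for row in conn.execute(
    "SELECT EmpID FROM Employee WHERE LOWER(Name) = 'chen'")]

# Step 4: names of divisions sponsoring projects that 'chen' works on.
FINAL_JOINS = """
    FROM Division AS D
    JOIN Project  AS P ON P.DID = D.DID
    JOIN WorkOn   AS W ON W.PID = P.PID
    JOIN Employee AS E ON E.EmpID = W.EmpID
    WHERE LOWER(E.Name) = 'chen'
"""
division_names = [row[0] for row in conn.execute(
    f"SELECT DISTINCT D.Dname {FINAL_JOINS} ORDER BY D.Dname")]

# Same query without DISTINCT, to show the repeated division.
raw_rows = conn.execute(f"SELECT D.Dname {FINAL_JOINS}").fetchall()

print(chen_ids, division_names, len(raw_rows))
```

In this data Chen works on two Engineering projects and one Marketing project, so the undeduplicated query yields three rows while the DISTINCT version lists each division once.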
You need to apply a similar technique to the other queries in your question. I'm not sure that you need any correlated subqueries. You will probably need 'subqueries in the FROM clause', but you shouldn't need correlated subqueries (IMO, of course). If I spent long enough on it, I could probably come up with a correlated subquery, but such queries are unlikely to perform as well as straightforward queries and will probably be a lot harder to read (and write). You'll need to assess whether your tutor will accept answers that produce the correct result without using correlated subqueries.