Using in vs exists on mysql - mysql

I know there are very similar questions about this topic. The thing is that every example I've seen points to EXISTS and IN being the same and returning the same results. I have an example of using both but I'm getting different results. Maybe I'm doing it wrong. I'm new to SQl in general.
Here are my code examples using MySQL.
First query using IN
select lastname, firstname
from employees
where officecode in (select officecode from offices where country = 'USA');
The result:
Second query using EXISTS:
select lastname, firstname
from employees
where exists (select officecode from offices where country = 'USA');
Result:
Obviously the queries are not equivalent as I'm getting different results. Am I using IN and EXISTS the wrong way?

Your two queries are very different. The first query is:
select e.lastname, e.firstname
from employees e
where e.officecode in (select o.officecode from offices o where o.country = 'USA');
(Note that I qualified all the column names.)
This gets employees where the corresponding office is in the USA.
This query is quite different:
select e.lastname, e.firstname
from employees e
where exists (select o.officecode from offices o where o.country = 'USA');
It is an all-or-nothing query. It returns all employees if any office is in the USA. It returns nothing otherwise.
To be equivalent to the first query, you need a correlation clause. This connects the inner query to the outer query:
select e.lastname, e.firstname
from employees e
where exists (select 1
from offices o
where o.officecode = e.officecode and o.country = 'USA'
);
With this change, the two queries should produce identical results.

For them to return the same results, you need a where clause for your exist statement to relate it to your outer query... from employees e where exists (select o.officecode from offices o where o.country = 'USA' and o.officecode = e.officecode). The return of each is different as well. EXISTS returns a BOOLEAN.

Exists checks for the condition to be true.
So in the second query it will run the first part of the query whenever exists condition is true in the inner query. Exists will check if the inner query returns one or more than one row. In your case it will be true and hence the outer query will be like select lastname, firstname from employees;
But in the first case it will check the condition where all the officecode are found in the subquery and matches the officecode in the outer query and will return the ones which has country USA

Left join and null check is faster.
Select * from t1
Left join t2 on(t1.=t2)
Where t2.id is null

Related

JOIN subquery and different reault

I was writing an exercise about "Write a query to find the names (first_name, last_name) of the employees who are not supervisors"
I write it on my own and when i check the result or both, mine has less rows than the other.
I was using the JOIN function and the other doesn't.
I want help to know why two results are so different.
Thanks
the one i use join
SELECT
first_name, last_name
FROM
employees AS E
JOIN
departments AS D ON E.department_id = D.department_id
WHERE
NOT EXISTS( SELECT
0
FROM
departments
WHERE
E.manager_id = D.manager_id)
order by last_name;
the one doesn't use join
SELECT
b.first_name, b.last_name
FROM
employees b
WHERE
NOT EXISTS( SELECT
0
FROM
employees a
WHERE
a.manager_id = b.employee_id);
The big problem is that the NOT EXISTS subquery is referencing rows from the joined table. The manager_id column is qualified with D., and that's a reference to the joined table, not the table in the FROM clause of the subquery.
E.manager_id = D.manager_id
^^^^^^^ ^
We also suspect that an employee's supervisor is recorded in the employee row, as a reference to another row in the the employee table. But we don't have the schema definition or any example data, so we're just guessing.
It seems like there would be a supervisor_id in the employee table...
SELECT e.first_name
, e.last_name
, e.department_id
, d.department_id
FROM employees e
WHERE NOT EXISTS
( SELECT 1
FROM employees s
WHERE s.id = e.supervisor_id
)
ORDER
BY e.last_name
, e.first_name
It's also possible that some rows in employee have a value in the department_id column that don't have a matching row in the department table. If there is no matching row in department, the inner join will prevent the row from employee from being returned.
We can use an outer join when we want to return rows even when no matching row is found in the joined table. If we want to involve the departments table, because a "supervisor" is defined to be an employee that is the manager of a department, we can employ an anti-join pattern...
SELECT e.first_name
, e.last_name
FROM employees e
LEFT
JOIN departments d
ON d.manager_id = e.employee_id
WHERE d.manager_id IS NULL
ORDER
BY e.last_name
, e.first_name
Again, without a schema and some sample data, we're just guessing.
This query can be written using only Employees table, but in your query you have joined the employees table with department table. A query should be written with the minimal amount of tables that suffice your expected output, joining unnecessary tables may result in wrong out puts.
In this case you are joining Employees with department here what if there is no Department_ID in employee table for some employees, so those data will be dropped in the join and result won't be the expected.

Can't Figure Out how to join tables

SELECT e.ManagerID, count(*) as NumberOfDepartments
From HumanResources.Employee e, Person.Contact c
where e.ContactID = c.ContactID
group by e.ManagerID;
The Goal is to write a report to display the managerid, firstname and lastname of that manager and the number of unique different departments they supervise. and only show the manager that supervises the most departments.
I have to ensure that all employees are currently employed (ie enddate does not contain a date).
The code above is working in showing number the managerID and number of department he runs but whenever I try to put in the first name and last name I have to put them also in the 'group by' clause and that way it makes the whole report going crazy. Please Help.
Database Here
From your schema, seems that the managerID column in Employee is populated with the ID of the manager for that employee. That would explain why when adding firstName and lastName the report goes crazy, because you'd be grouping by the employee's name, not the manager's.
Without seeing the tables content it's hard to tell, but you may have that managers can be recognised by not having managerID populated.
If this is the case, you can write your query like this
select e.EmployeeID, c.firstName, e.lastName, count(distinct edh.DepartmentID)
from Employee e
join Contact c
on e.ContactID = c.ContactID
join Employee e2
on e1.EmployeeID = e2.ManagerID
join EmployeeDepartmentHistory edh
on e2.EmployeeID = edh.EmployeeID
where e.ManagerID is null and edh.EndDate is null
group by e.EmployeeID, c.firstName, e.lastName
The first instance of Employee table is the managers (because you set where e.ManagerID is null), the join with Contact gets you the managers' names, the second instance of Employee gets you all the people managed by each manager, and the join with EmployeeDepartmentHistory gets you their department (which you count on) and their EndDate, that has to be null to ensure you that they're currenty employed.
Edit
Please note the way I wrote the joins; writing them as comma separated tables names in your from clause with the join condition in your where is a bad habit that should be kicked, because it makes reading, maintaining and changing them to outer joins much harder. That's why join was introduced in SQL language back in 1992.
In MSSQL:
SELECT e.ManagerID, e.FirstName, e.LastName, COUNT(*) AS NumberOfDepartments FROM HumanResources.Employee e
INNER JOIN Person.Contact c ON e.ContactID=c.ContactID
GROUP BY e.ManagerID, e.FirstName, e.LastName
If you need it in MySql, change ON to WHERE pattern and INNER JOIN to JOIN

Two table joins for one query -- confusing result

I'm trying to execute two separate inner table joins in my query to return values from two tables.
SELECT pname, avg(salary)
FROM project p INNER JOIN department d on p.dnum = d.dnumber
INNER JOIN employee e ON e.dno = d.dnumber;
I'm getting one row in the result set... pname = null, avg(salary) = null.
Result set should contain 11 rows because there are 11 projects in the schema.
Can someone point me in the right direction?
Thank you
You are missing the group by:
SELECT pname, avg(salary)
FROM project p INNER JOIN
department d
on p.dnum = d.dnumber INNER JOIN
employee e
ON e.dno = d.dnumber
GROUP BY pname;
In most databases, your version would fail with an obvious syntax error. MySQL only enforces the ANSI standard if you use the ONLY_FULL_GROUP_BY mode (see here).
Use left outer join instead of inner join
Or can you show me your data tables
Do you need the department table in your query?
Does the following query return all the data you need to summarize?
SELECT pname, salary
FROM ( SELECT salary, dno AS dnum FROM employee ) e
NATURAL JOIN project;
If it does, then this might be the summarization you require:
SELECT pname, AVG( salary ) AS average_salary
FROM ( SELECT salary, dno AS dnum FROM employee ) e
NATURAL JOIN project
GROUP
BY pname;

Rewriting a sub-query into a single query so it can be put into a view in SQL

I have a homework problem where I have this schema:
Emp(eid:integer, ename:string, age:integer, salary:real)
Works(eid:integer, did:integer)
Dept(did:integer, managerid:integer)
And with this, I am supposed to construct a single view entitled Manager giving the eid, ename of an employee and his/her manager's eid and ename. Then I am supposed to say if the view is updatable, and why/why not.
So, I successfully created the query to do get this information:
SELECT
Employee.eid,
Employee.ename,
Manager.mid,
Manager.ename
FROM
(
SELECT Dept.managerid AS mid, ename, did
FROM Emp
INNER JOIN Dept
ON Emp.eid = Dept.managerid
) as Manager
INNER JOIN
(
SELECT ename, Emp.eid, did
FROM Emp
INNER JOIN Works
ON Emp.eid = Works.eid
) AS Employee
ON
Employee.did = Manager.did
AND
Employee.eid != Manager.mid;
But when I add CREATE VIEW Manager AS to the top of the statement to create a view out of it, I end up getting this error: View's SELECT contains a sub query in the FROM clause
For reference, I have uploaded my code to sqlfiddle. The only thing that needs to be done is to add CREATE VIEW Manager AS right before the SELECT statement at the bottom.
I was wondering if there is a way I could somehow refactor this statement to not use a nested SELECT so that I can only use a single view. Unfortunately I am limited to using only one view for my assignment.
Subqueries are an unnecessarily complicated way of going about this. Easier just to join emp to works, work to dept, and then dept back to emp.
select e.*, e2.*
from emp e
inner join works w
on w.eid = e.eid
inner join dept d
on w.did = d.did
inner join emp e2
on d.managerid = e2.eid
updated your fiddle
To create a view from this, you would need to alias the resulting columns, since there are duplicates. eg:
select e.eid as `employee_id`, e.ename as `employee_name`, e2.eid as `manager_id`, e2.ename as `manager_name`
instead of select e.*, e2.*

Nested SQL Query, what is actually occurring at each nesting?

I was wondering how this query works:
SELECT empname FROM Employee WHERE not exists (
SELECT projid FROM Project WHERE not exists (
SELECT empid, projid FROM Assigned WHERE empid = Employee.empid and projid = Project.projid
)
)
It is supposed to return names of all employees who are assigned to every project and it does work however I am getting confused as to how/why it works correctly.
Schema is:
Employee(empID INT,empName VARCHAR(100),job VARCHAR(100),deptID INT,salary INT);
Assigned(empID INT,projID INT,role VARCHAR(100));
Project(projID INT,title VARCHAR(100),budget INT,funds INT);
I am new to SQL so a detailed/simple explanation would be appreciated.
When I need to try to understand what's going on, I look for the inner-most query and work my way outwards. In your case, let's start with:
SELECT empid, projid
FROM Assigned
WHERE empid = Employee.empid and projid = Project.projid
This is matching all records in the Assigned table where the empid and projid are in the previous tables (hence the Employee.empid and Project.projid).
Assume there are 5 projects in the Projects table and Employee1 is assigned to each. That would return 5 records. Also assume Employee2 is assigned to 1 of those projects, thus returning 1 record.
Next look at:
SELECT projid FROM Project WHERE not exists (
...
)
Now this says for those found records in the previous query (Employee1 with 5 projects and Employee2 with 1 project), select any projid from the Project table where there aren't any matches (not exists) from the previous query. In other words, Employee1 would return no projects from this query but Employee2 would return 4 projects.
Finally, look at
SELECT empname FROM Employee WHERE not exists (
...
)
Just as with the 2nd query, for any records found in the previous query (no records to match those employees with all projects such as Employee1 and some records if the employee isn't assigned to every project such as Employee2), select any employee from the Employee table where there aren't any matches (again, not exists). In other words, Employee1 would return since no projects were returned from the previous query, and Employee2 would not return, since 1 or more projects were returned from the previous query.
Hope this helps. Here's some additional information about EXISTS:
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
And from that article:
What kind of store is present in all cities?
SELECT DISTINCT store_type FROM stores s1 WHERE NOT EXISTS (
SELECT * FROM cities WHERE NOT EXISTS (
SELECT * FROM cities_stores
WHERE cities_stores.city = cities.city AND cities_stores.store_type = stores.store_type));
The last example is a double-nested NOT EXISTS query. That is, it has
a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers
the question “does a city exist with a store that is not in Stores”?
But it is easier to say that a nested NOT EXISTS answers the question
“is x TRUE for all y?”
Good luck.
A NOT EXISTS (subquery) predicate will return TRUE when the resultset from the subquery has no rows. It will return FALSE when a matching row is found.
Essentially, the query is asking
for each row in Employee... check each row from the Project table, to see if there is a row in the Assigned table for a row that has an empid that matches the empid on the Employee row and a projid that matches a row in the Project table.
The row from Employee will be returned only if no matching row is found.
Note that the expressions in the SELECT list of the subquery are not important; all that is being checked is whether that subquery returns one (or more) rows or not. Normally, we use a literal 1 in the SELECT list; that remind us that what we are checking is whether a row is found or not.)
I would typically write that query in a style that looks like this:
SELECT e.empname
FROM Employee e
WHERE NOT EXISTS
( SELECT 1
FROM Project p
WHERE NOT EXISTS
( SELECT 1
FROM Assigned a
WHERE a.empid = e.empid
AND a.projid = p.projid
)
)
And I read the "SELECT 1" as "select one row")
The resultset from that query is essentially equivalent to the resultset from this (usually much less efficient) query:
SELECT e.empname
FROM Employee e
WHERE e.empid NOT IN
( SELECT a.empid
FROM Assigned a
JOIN Project p
ON a.projid = p.projid
WHERE a.empid IS NOT NULL
GROUP
BY a.empid
)
The NOT IN query can be a little easier to understand, because you can run that subquery and see that it returns something. (What can be kind of confusing about the NOT EXISTS subquery is that it doesn't matter what expressions are returned in the SELECT list; what matters is whether a row is returned or not.) There are some "gotchas" with the NOT IN subquery besides really bad performance; you need to be careful to ensure that the subquery does not return a NULL value, because then the NOT IN (NULL,...) will never return true.
An equivalent resultset can be returned using an anti-join pattern as well:
SELECT e.empname
FROM Employee e
LEFT
JOIN ( SELECT a.empid
FROM Assigned a
JOIN Project p
ON a.projid = p.projid
WHERE a.empid IS NOT NULL
GROUP
BY a.empid
) o
ON o.empid = e.empid
WHERE o.empid IS NULL
In that query, we are looking for "matches" on empid. The LEFT keyword tells MySQL to also return any rows from Employee (the table one the left side of the JOIN) which do not have a match. For those rows, a NULL value is returned in place of the values of the columns that would have been returned if there had been a matching row. The "trick" is then to throw out all the rows that matched. We do that by checking for a NULL in a column that would not be NULL if there had been a match.
If I were going to write this query using a NOT EXISTS predicate, I would probably actually favor writing it like this:
SELECT e.empname
FROM Employee e
WHERE NOT EXISTS
( SELECT 1
FROM Assigned a
JOIN Project p
ON a.projid = p.projid
WHERE a.empid = e.empid
)