Nested SQL Query, what is actually occurring at each nesting?

I was wondering how this query works:
SELECT empname FROM Employee WHERE not exists (
SELECT projid FROM Project WHERE not exists (
SELECT empid, projid FROM Assigned WHERE empid = Employee.empid and projid = Project.projid
)
)
It is supposed to return the names of all employees who are assigned to every project, and it does work; however, I am confused as to how/why it works correctly.
Schema is:
Employee(empID INT,empName VARCHAR(100),job VARCHAR(100),deptID INT,salary INT);
Assigned(empID INT,projID INT,role VARCHAR(100));
Project(projID INT,title VARCHAR(100),budget INT,funds INT);
I am new to SQL so a detailed/simple explanation would be appreciated.

When I need to try to understand what's going on, I look for the inner-most query and work my way outwards. In your case, let's start with:
SELECT empid, projid
FROM Assigned
WHERE empid = Employee.empid and projid = Project.projid
This matches all records in the Assigned table whose empid and projid equal those of the rows currently being examined in the two outer queries (hence the Employee.empid and Project.projid).
Assume there are 5 projects in the Project table and Employee1 is assigned to each. For Employee1, that would return one record for each of the 5 projects. Also assume Employee2 is assigned to 1 of those projects, so only that one project returns a record.
Next look at:
SELECT projid FROM Project WHERE not exists (
...
)
Now this says for those found records in the previous query (Employee1 with 5 projects and Employee2 with 1 project), select any projid from the Project table where there aren't any matches (not exists) from the previous query. In other words, Employee1 would return no projects from this query but Employee2 would return 4 projects.
Finally, look at
SELECT empname FROM Employee WHERE not exists (
...
)
Just as with the 2nd query, for any records found in the previous query (no records to match those employees with all projects such as Employee1 and some records if the employee isn't assigned to every project such as Employee2), select any employee from the Employee table where there aren't any matches (again, not exists). In other words, Employee1 would return since no projects were returned from the previous query, and Employee2 would not return, since 1 or more projects were returned from the previous query.
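The Employee1/Employee2 walkthrough above can be checked on a small data set. This is only a sketch: it uses Python's built-in sqlite3 module in place of MySQL, trims the tables to the columns the query touches, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empID INT, empName VARCHAR(100));
CREATE TABLE Project  (projID INT, title VARCHAR(100));
CREATE TABLE Assigned (empID INT, projID INT);
INSERT INTO Employee VALUES (1, 'Employee1'), (2, 'Employee2');
INSERT INTO Project  VALUES (1, 'P1'), (2, 'P2'), (3, 'P3'), (4, 'P4'), (5, 'P5');
-- Employee1 is on every project, Employee2 only on project 3 (hypothetical data)
INSERT INTO Assigned VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 3);
""")

# Middle query, run by hand for one employee at a time:
# the projects that this employee is NOT assigned to.
missing_sql = """
SELECT projid FROM Project WHERE NOT EXISTS (
    SELECT 1 FROM Assigned
    WHERE Assigned.empid = ? AND Assigned.projid = Project.projid)
ORDER BY projid
"""
missing = {
    emp_id: [row[0] for row in conn.execute(missing_sql, (emp_id,))]
    for emp_id in (1, 2)
}

# Full query from the question: keep an employee only if nothing is missing.
full_sql = """
SELECT empname FROM Employee WHERE NOT EXISTS (
    SELECT projid FROM Project WHERE NOT EXISTS (
        SELECT empid, projid FROM Assigned
        WHERE empid = Employee.empid AND projid = Project.projid))
"""
names = [row[0] for row in conn.execute(full_sql)]

print(missing)  # {1: [], 2: [1, 2, 4, 5]}
print(names)    # ['Employee1']
```

Employee1 has no missing projects, so the outer NOT EXISTS is true; Employee2 has four missing projects, so it is false.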
Hope this helps. Here's some additional information about EXISTS:
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
And from that article:
What kind of store is present in all cities?
SELECT DISTINCT store_type FROM stores s1 WHERE NOT EXISTS (
SELECT * FROM cities WHERE NOT EXISTS (
SELECT * FROM cities_stores
WHERE cities_stores.city = cities.city AND cities_stores.store_type = s1.store_type));
The last example is a double-nested NOT EXISTS query. That is, it has
a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers
the question “does a city exist with a store that is not in Stores”?
But it is easier to say that a nested NOT EXISTS answers the question
“is x TRUE for all y?”
Good luck.

A NOT EXISTS (subquery) predicate will return TRUE when the resultset from the subquery has no rows. It will return FALSE when a matching row is found.
Essentially, the query is asking
for each row in Employee... check each row from the Project table, to see if there is a row in the Assigned table that has an empid matching the empid on the Employee row and a projid matching the projid on the Project row.
The row from Employee will be returned only if no Project row is found that lacks such a matching Assigned row; that is, only if the employee is assigned to every project.
Note that the expressions in the SELECT list of the subquery are not important; all that is being checked is whether that subquery returns one (or more) rows or not. (Normally, we use a literal 1 in the SELECT list; that reminds us that what we are checking is whether a row is found or not.)
I would typically write that query in a style that looks like this:
SELECT e.empname
FROM Employee e
WHERE NOT EXISTS
( SELECT 1
FROM Project p
WHERE NOT EXISTS
( SELECT 1
FROM Assigned a
WHERE a.empid = e.empid
AND a.projid = p.projid
)
)
(And I read the "SELECT 1" as "select one row".)
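A simpler, related pattern is the NOT IN subquery. Note that the following query is not equivalent to the double-nested query: its subquery returns every employee who has at least one assignment to an existing project, so the outer query returns the employees who are not assigned to any project. It is also usually much less efficient than NOT EXISTS: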
SELECT e.empname
FROM Employee e
WHERE e.empid NOT IN
( SELECT a.empid
FROM Assigned a
JOIN Project p
ON a.projid = p.projid
WHERE a.empid IS NOT NULL
GROUP
BY a.empid
)
The NOT IN query can be a little easier to understand, because you can run that subquery and see that it returns something. (What can be kind of confusing about the NOT EXISTS subquery is that it doesn't matter what expressions are returned in the SELECT list; what matters is whether a row is returned or not.) There are some "gotchas" with the NOT IN subquery besides really bad performance; you need to be careful to ensure that the subquery does not return a NULL value, because then the NOT IN (NULL,...) will never return true.
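The NULL gotcha is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (both follow standard three-valued logic for NOT IN), with made-up rows and trimmed tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empID INT, empName VARCHAR(100));
CREATE TABLE Assigned (empID INT, projID INT);
INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
-- one assignment row has a NULL empid (hypothetical bad data)
INSERT INTO Assigned VALUES (1, 10), (NULL, 20);
""")

# Subquery can return NULL: "2 NOT IN (1, NULL)" is NULL, not TRUE,
# so Bob is silently dropped.
unguarded = conn.execute("""
SELECT empname FROM Employee
WHERE empid NOT IN (SELECT empid FROM Assigned)
""").fetchall()

# Subquery filtered so it can never return NULL.
guarded = conn.execute("""
SELECT empname FROM Employee
WHERE empid NOT IN (SELECT empid FROM Assigned WHERE empid IS NOT NULL)
""").fetchall()

print(unguarded)  # []
print(guarded)    # [('Bob',)]
```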
The same resultset as the NOT IN query can be returned using an anti-join pattern as well:
SELECT e.empname
FROM Employee e
LEFT
JOIN ( SELECT a.empid
FROM Assigned a
JOIN Project p
ON a.projid = p.projid
WHERE a.empid IS NOT NULL
GROUP
BY a.empid
) o
ON o.empid = e.empid
WHERE o.empid IS NULL
In that query, we are looking for "matches" on empid. The LEFT keyword tells MySQL to also return any rows from Employee (the table on the left side of the JOIN) which do not have a match. For those rows, a NULL value is returned in place of the values of the columns that would have been returned if there had been a matching row. The "trick" is then to throw out all the rows that matched. We do that by checking for a NULL in a column that would not be NULL if there had been a match.
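To see the two steps of the anti-join separately, the sketch below runs the outer join first without and then with the final WHERE filter. It uses Python's sqlite3 module in place of MySQL, and the three employees and their assignments are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empID INT, empName VARCHAR(100));
CREATE TABLE Project  (projID INT, title VARCHAR(100));
CREATE TABLE Assigned (empID INT, projID INT);
INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Project  VALUES (10, 'P10'), (20, 'P20');
-- Ann has two assignments, Bob has one, Cy has none (hypothetical data)
INSERT INTO Assigned VALUES (1, 10), (1, 20), (2, 10);
""")

outer_join = """
SELECT e.empname, o.empid
FROM Employee e
LEFT JOIN ( SELECT a.empid
            FROM Assigned a
            JOIN Project p ON a.projid = p.projid
            WHERE a.empid IS NOT NULL
            GROUP BY a.empid
          ) o
  ON o.empid = e.empid
"""

# Step 1: every Employee row comes back; unmatched rows carry NULL in o.empid.
before_filter = conn.execute(outer_join + " ORDER BY e.empid").fetchall()

# Step 2: keep only the rows that found no match.
after_filter = conn.execute(
    outer_join + " WHERE o.empid IS NULL ORDER BY e.empid"
).fetchall()

print(before_filter)  # [('Ann', 1), ('Bob', 2), ('Cy', None)]
print(after_filter)   # [('Cy', None)]
```

The derived table o holds the employees with at least one assignment, so the rows that survive the filter are the employees with none.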
Written with a NOT EXISTS predicate, that same single-level check (employees with no assignment to any existing project) would look like this:
SELECT e.empname
FROM Employee e
WHERE NOT EXISTS
( SELECT 1
FROM Assigned a
JOIN Project p
ON a.projid = p.projid
WHERE a.empid = e.empid
)

Related

SQL SELECT looking for a name like a substring in a join

How should I solve this question?
My SQL statement is:
select ename
from employee e
inner join certified c on e.eid = c.eid
inner join aircraft a on c.aid = a.aid
where cruisingrange> 1000 && a.aname not like'%b'
The answer I get with this query is Jacob and Emily which is wrong. Jacob should not be retrieved.
How should I modify or add to the SQL statement?
Script:
CREATE TABLE Employee (
Eid int,
Ename nvarchar(100),
Salary int
)
INSERT INTO Employee VALUES
(1,'Jacob',85000),(2,'Michael',55000),(3,'Emily',80000),
(4,'Ashley',110000),(5,'Daniel',80000),(6,'Olivia',70000)
CREATE TABLE Aircraft (
Aid int,
Aname nvarchar(100),
Cruisingrange int
)
INSERT INTO Aircraft VALUES
(1,'a1',800),(2,'a2b',700),(3,'a3',1000),
(4,'a4b',1100),(5,'a5',1200)
CREATE TABLE Flight (
Flno int,
Fly_from nvarchar(100),
Fly_to nvarchar(100),
Distance int,
Price int
)
INSERT INTO Flight VALUES
(1,'LA','SF',600,65000),(2,'LA','SF',700,70000),(3,'LA','SF',800,90000),
(4,'LA','NY',1000,85000),(5,'NY','LA',1100,95000)
CREATE TABLE Certified (
Eid int,
Aid int,
CertDate date
)
INSERT INTO Certified VALUES
(1, 1, '2005-01-01'),(1, 2, '2001-01-01'),(1, 3, '2000-01-01'),
(1, 5, '2000-01-01'),(2, 3, '2002-01-01'),(2, 2, '2003-01-01'),
(3, 3, '2003-01-01'),(3, 5, '2004-01-01')

I would read the "not certified on any" to be a check for the existence of a row.
If a matching row exists, then don't return the employee. Only return the employee if a matching row doesn't exist.
How would you find a matching row, to find out if an employee is "certified on any"?
There are several approaches. The two best approaches to use are 1) an anti-join and 2) NOT EXISTS (correlated subquery).
example NOT EXISTS (correlated subquery)
Of the two approaches this one is easier to see how it works.
SELECT e.ename
FROM employee e
WHERE NOT EXISTS ( SELECT 1
                   FROM certified c
                   JOIN aircraft a
                     ON a.aid = c.aid
                   WHERE a.aname LIKE '%b%'
                     AND c.eid = e.eid
                 )
Note the reference to the outer table (e.eid) in the predicate of the subquery. The subquery is "correlated" with the outer query.
Think of it this way: for every row returned by the outer query, the subquery is executed, passing in the value of e.eid. (The optimizer doesn't have to perform the operation this way; that's just an easy way of thinking about what we're asking for.)
If the subquery returns 1 or more rows, the condition EXISTS is satisfied, and returns TRUE. If the subquery returns zero rows, EXISTS evaluates to FALSE.
example of anti-join pattern
This approach can take a bit to get your brain wrapped around. Once you do "get it", it's an invaluable tool to keep handy in the SQL toolbelt.
If we use an OUTER JOIN, and pull back all rows from e along with any matching rows, then we can "exclude" the rows that found a match.
SELECT e.ename
FROM employee e
LEFT
JOIN ( SELECT c.eid
       FROM certified c
       JOIN aircraft a
         ON a.aid = c.aid
       WHERE a.aname LIKE '%b%'
       GROUP BY c.eid
     ) b
  ON b.eid = e.eid
WHERE b.eid IS NULL
The inline view query is materialized into a derived table named b. That query is intended to return the id of every employee that is certified to fly any aircraft meeting the specified criteria. Then the rows in the derived table are outer joined to e.
The "trick" is the outer join (to include both rows with matches and rows without matches), combined with the condition in the WHERE clause that excludes the rows that had matches.
I expect someone else will provide an example of how to use a NOT IN (subquery). With that approach, beware of what happens if the subquery returns any NULL values. (HINT: you will want to ensure that the subquery will never ever return a NULL.)
This demonstrates only two of several possible approaches to satisfying the "is not certified on any" criteria.
Obviously, additional joins/subqueries will need to be added to evaluate the other criteria in the query.
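As a sketch of how the criteria might be combined, the example below loads the sample rows from the question and adds an EXISTS check next to the NOT EXISTS check. It assumes the task is "employees certified on some aircraft with a cruising range over 1000 who are not certified on any aircraft with a b in its name" (the exact wording is not given above), and it uses Python's sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Eid int, Ename nvarchar(100), Salary int);
CREATE TABLE Aircraft (Aid int, Aname nvarchar(100), Cruisingrange int);
CREATE TABLE Certified (Eid int, Aid int, CertDate date);
INSERT INTO Employee VALUES
(1,'Jacob',85000),(2,'Michael',55000),(3,'Emily',80000),
(4,'Ashley',110000),(5,'Daniel',80000),(6,'Olivia',70000);
INSERT INTO Aircraft VALUES
(1,'a1',800),(2,'a2b',700),(3,'a3',1000),(4,'a4b',1100),(5,'a5',1200);
INSERT INTO Certified VALUES
(1, 1, '2005-01-01'),(1, 2, '2001-01-01'),(1, 3, '2000-01-01'),
(1, 5, '2000-01-01'),(2, 3, '2002-01-01'),(2, 2, '2003-01-01'),
(3, 3, '2003-01-01'),(3, 5, '2004-01-01');
""")

# The question's join (with AND instead of &&): the name filter is applied
# row by row, so Jacob survives through his a5 certification.
row_by_row = [r[0] for r in conn.execute("""
SELECT e.ename
FROM employee e
JOIN certified c ON e.eid = c.eid
JOIN aircraft a ON c.aid = a.aid
WHERE a.cruisingrange > 1000 AND a.aname NOT LIKE '%b'
ORDER BY e.ename
""")]

# EXISTS for the "has a long-range certification" criterion,
# NOT EXISTS for the "not certified on any b aircraft" criterion.
per_employee = [r[0] for r in conn.execute("""
SELECT e.ename
FROM employee e
WHERE EXISTS ( SELECT 1 FROM certified c JOIN aircraft a ON a.aid = c.aid
               WHERE c.eid = e.eid AND a.cruisingrange > 1000 )
  AND NOT EXISTS ( SELECT 1 FROM certified c JOIN aircraft a ON a.aid = c.aid
                   WHERE c.eid = e.eid AND a.aname LIKE '%b%' )
ORDER BY e.ename
""")]

print(row_by_row)    # ['Emily', 'Jacob']
print(per_employee)  # ['Emily']
```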

You can use this query in SQL Server (maybe it will work on MySQL too):
SELECT e.Ename
FROM Employee e
INNER JOIN Certified c
ON e.Eid = c.Eid
LEFT JOIN aircraft a
ON c.Aid = a.Aid and a.Cruisingrange> 1000
LEFT JOIN aircraft a1
ON c.Aid = a1.Aid and a1.aname like'%b%'
GROUP BY e.Ename
HAVING MAX(a.Aid) IS NOT NULL AND MAX(a1.Aid) IS NULL
We join all the tables we need with the conditions we need, then group by Ename. MAX(a.Aid) IS NOT NULL means that the pilot can operate at least one aircraft with Cruisingrange > 1000, and MAX(a1.Aid) IS NULL means that he is not certified on any aircraft with b in its name.
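To see what the HAVING clause is testing, the sketch below selects the two MAX values per employee before filtering on them. It runs against the sample rows from the question (only the three tables the query touches), using Python's sqlite3 module rather than SQL Server or MySQL; the column aliases are made up for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Eid int, Ename nvarchar(100), Salary int);
CREATE TABLE Aircraft (Aid int, Aname nvarchar(100), Cruisingrange int);
CREATE TABLE Certified (Eid int, Aid int, CertDate date);
INSERT INTO Employee VALUES
(1,'Jacob',85000),(2,'Michael',55000),(3,'Emily',80000),
(4,'Ashley',110000),(5,'Daniel',80000),(6,'Olivia',70000);
INSERT INTO Aircraft VALUES
(1,'a1',800),(2,'a2b',700),(3,'a3',1000),(4,'a4b',1100),(5,'a5',1200);
INSERT INTO Certified VALUES
(1, 1, '2005-01-01'),(1, 2, '2001-01-01'),(1, 3, '2000-01-01'),
(1, 5, '2000-01-01'),(2, 3, '2002-01-01'),(2, 2, '2003-01-01'),
(3, 3, '2003-01-01'),(3, 5, '2004-01-01');
""")

grouped = """
SELECT e.Ename,
       MAX(a.Aid)  AS long_range_aid,  -- NULL when no aircraft with range > 1000
       MAX(a1.Aid) AS b_named_aid      -- NULL when no aircraft with b in its name
FROM Employee e
INNER JOIN Certified c ON e.Eid = c.Eid
LEFT JOIN Aircraft a  ON c.Aid = a.Aid  AND a.Cruisingrange > 1000
LEFT JOIN Aircraft a1 ON c.Aid = a1.Aid AND a1.Aname LIKE '%b%'
GROUP BY e.Ename
"""

per_employee = conn.execute(grouped + " ORDER BY e.Ename").fetchall()
selected = [r[0] for r in conn.execute(
    grouped + " HAVING MAX(a.Aid) IS NOT NULL AND MAX(a1.Aid) IS NULL"
)]

print(per_employee)  # [('Emily', 5, None), ('Jacob', 5, 2), ('Michael', None, 2)]
print(selected)      # ['Emily']
```

Employees with no certifications at all drop out at the INNER JOIN, which is why only three names reach the grouping step.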

Converting a Join Query into a Sub Query

I'm attempting to write a sub query that would accomplish the same results as the join query shown below.
SELECT Department_to_major.DNAME
FROM Department_to_major
INNER JOIN Course
ON Department_to_major.Dcode = Course.OFFERING_DEPT
WHERE Course.COURSE_NAME LIKE '%INTRO%'
GROUP BY Department_to_major.DNAME
However each attempt has produced errors.
Is there a way to write this as a sub query?

Hi, You can use below query,
SELECT DNAME FROM Department_to_major WHERE
Dcode IN (SELECT OFFERING_DEPT FROM Course
WHERE COURSE_NAME LIKE '%INTRO%')
You have used a GROUP BY clause, but there is no aggregate function in the query. Does your query work fine?

Here is a way to use a subquery:
SELECT DISTINCT dm.DNAME
FROM Department_to_major dm
WHERE EXISTS (SELECT 1
FROM Course c
WHERE dm.Dcode = c.OFFERING_DEPT AND
c.COURSE_NAME LIKE '%INTRO%'
);
I assume the GROUP BY is to prevent duplicates in the output; SELECT DISTINCT does the same thing.
That said, storing the department code and name in Department_to_major is not a good data structure, because the department name is (presumably) repeated multiple times. I would expect you to have just a Departments table, with one row per department.
Then the query would look like:
SELECT d.DNAME
FROM Departments d
WHERE EXISTS (SELECT 1
FROM Course c
WHERE d.Dcode = c.OFFERING_DEPT AND
c.COURSE_NAME LIKE '%INTRO%'
);
And the SELECT DISTINCT/GROUP BY is unnecessary.
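A small check of why the EXISTS form needs no de-duplication when there is one row per department. This is a sketch using Python's sqlite3 module in place of MySQL; the column types and sample rows are assumptions, since the question does not show the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical definitions and data; one row per department
CREATE TABLE Department_to_major (Dcode TEXT, DNAME TEXT);
CREATE TABLE Course (COURSE_NAME TEXT, OFFERING_DEPT TEXT);
INSERT INTO Department_to_major VALUES
('CS', 'Computer Science'), ('MA', 'Mathematics'), ('HI', 'History');
INSERT INTO Course VALUES
('INTRO TO PROGRAMMING', 'CS'), ('INTRO TO DATABASES', 'CS'),
('INTRO TO ALGEBRA', 'MA'), ('MEDIEVAL EUROPE', 'HI');
""")

# Plain join without GROUP BY: one output row per matching course.
plain_join = [r[0] for r in conn.execute("""
SELECT dm.DNAME
FROM Department_to_major dm
INNER JOIN Course c ON dm.Dcode = c.OFFERING_DEPT
WHERE c.COURSE_NAME LIKE '%INTRO%'
ORDER BY dm.DNAME
""")]

# EXISTS: at most one output row per department row.
with_exists = [r[0] for r in conn.execute("""
SELECT dm.DNAME
FROM Department_to_major dm
WHERE EXISTS (SELECT 1
              FROM Course c
              WHERE dm.Dcode = c.OFFERING_DEPT AND
                    c.COURSE_NAME LIKE '%INTRO%')
ORDER BY dm.DNAME
""")]

print(plain_join)   # ['Computer Science', 'Computer Science', 'Mathematics']
print(with_exists)  # ['Computer Science', 'Mathematics']
```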

Try the below query. I am assuming that you have used "GROUP BY" clause to make DNAME field unique.
SELECT DISTINCT DNAME
FROM Department_to_major
WHERE Dcode IN (SELECT OFFERING_DEPT
FROM Course
WHERE COURSE_NAME LIKE '%INTRO%');

MYSQL using count(1) in the where clause?

I'm doing facial recognition. I have a database of people from group A and people from group B. I want to check every person in A with every person in B. I have a number of different algorithms I'm running to verify the faces. To do this I set up the following tables
CREATE TABLE comparison (
id int,
personA_id int,
personB_id int
);
CREATE TABLE facerecScore (
id int,
score int,
comparison_id int,
algo_id int
);
So let's say I had an eigenfaces program running as the first algorithm I'm testing. Eigenfaces would have an algo_id of 1.
What I want to do is make a query that selects personA and personB from comparison where there is no record in the facerecScore table with an algo_id of 1 for that comparison.
In other words, if I have already run eigenfaces on these two people, I don't want to run it again. Thus I don't want to select a comparison that already has a record in the facerecScore table with an algo_id of 1.

You could try something like the following which will find all rows in comparison which do not have a record in facerecScore for a given algo_id given by the parameter :current_algo
SELECT *
FROM comparison
WHERE id not in (
SELECT comparison_id
FROM facerecScore
WHERE algo_id = :current_algo
);
In the scenario that you want to find all comparison rows for all algo_ids that do not have a corresponding record in facerecScore then you could use something like the following.
SELECT *
FROM comparison, (SELECT algo_id FROM facerecScore GROUP BY algo_id) algo
WHERE id not in (
SELECT comparison_id
FROM facerecScore
WHERE algo_id = algo.algo_id
);
Simply put, this query first finds all combinations of comparison rows and algo_ids, then removes from the result set any combination that already has a record in facerecScore.
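Both queries can be tried on a few rows. The sketch below uses Python's sqlite3 module in place of MySQL, binds :current_algo as a named parameter, and uses invented scores; the column lists are narrowed to ids so the output stays short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comparison (id int, personA_id int, personB_id int);
CREATE TABLE facerecScore (id int, score int, comparison_id int, algo_id int);
INSERT INTO comparison VALUES (1, 100, 200), (2, 100, 201), (3, 101, 200);
-- algo 1 has scored comparisons 1 and 2; algo 2 has scored comparison 1 only
INSERT INTO facerecScore VALUES (1, 87, 1, 1), (2, 42, 2, 1), (3, 90, 1, 2);
""")

# Comparisons still to be run for a single algorithm.
pending_for_algo_1 = [r[0] for r in conn.execute("""
SELECT id
FROM comparison
WHERE id NOT IN (
    SELECT comparison_id
    FROM facerecScore
    WHERE algo_id = :current_algo
)
ORDER BY id
""", {"current_algo": 1})]

# (algo_id, comparison id) pairs still to be run, for every algorithm
# that already has at least one score.
pending_all = conn.execute("""
SELECT algo.algo_id, comparison.id
FROM comparison, (SELECT algo_id FROM facerecScore GROUP BY algo_id) algo
WHERE comparison.id NOT IN (
    SELECT comparison_id
    FROM facerecScore
    WHERE algo_id = algo.algo_id
)
ORDER BY algo.algo_id, comparison.id
""").fetchall()

print(pending_for_algo_1)  # [3]
print(pending_all)         # [(1, 3), (2, 2), (2, 3)]
```

Because the list of algorithms is derived from facerecScore itself, an algorithm that has not produced any score yet would not appear in the second result.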

For anyone who hates correlated subqueries (e.g. for performance reasons, if the original query wasn't optimised), it's possible with a left join and excluding any rows that were actually joined:
Update: Inspired by @penfold's "find all" answer, this is a join+union alternative if the list of algo_ids is known (and short):
select '1' algo_id, c.*
from comparison c
left join facerecScore f
on c.id = f.comparison_id
and f.algo_id = 1
where f.id is null
union all
select '2' algo_id, c.*
from comparison c
left join facerecScore f
on c.id = f.comparison_id
and f.algo_id = 2
where f.id is null
...
Or a more general one (not sure which one will perform better):
select a.algo_id, c.id
from comparison c
cross join (select algo_id from facerecScore group by algo_id) a
left join facerecScore f
on c.id = f.comparison_id
and f.algo_id = a.algo_id
where f.id is null

You can use this; it will return the first combination that hasn't been touched. Remove the last part, LIMIT 1, and you will get all the combinations that haven't been touched.
SELECT *
FROM comparison
WHERE id
not in (
select comparison_id
from facerecScore
where algo_id = 1)
LIMIT 1

SELECT personA_id, personB_id FROM comparison WHERE id NOT IN (SELECT comparison_id FROM facerecScore WHERE algo_id = 1);
This is probably pretty bad on efficiency with the subquery, but it should give you the right results. Possibly someone else can find a more efficient solution.

Find all rows with no matching result SQL

I have one table that contains first name and last name of employees in my company, and a field that determines whether they are still working for the company.
I have another table which contains a list of tasks for the employees. It also contains two fields with the first and last name of the employee (and yes, I know that's not a good structure).
I want to be able to find all employees that are still working for the company but have no tasks using MySQL query.
Any ideas?

SELECT *
FROM employees
WHERE still_working_for_company
AND NOT EXISTS (
SELECT TRUE
FROM tasks
WHERE tasks.firstname = employees.firstname
AND tasks.lastname = employees.lastname
)

You can try this--
select * from FirstTable where firstTable.employee='yes' and
firstTable.empid NOT IN (select secondTbl.empId from secondTbl where firstTable.empid = secondTbl.empId)
The query is not tested, and it assumes that your second table (the task table) contains a row for an employee only when a task is assigned.

Try this:
SELECT e.*
FROM emp e
LEFT JOIN emptask et ON e.firstname = et.firstname AND e.lastname = et.lastname
WHERE e.stillworks = 'y' AND et.taskid IS NULL

SELECT * WHERE NOT EXISTS

I think I'm going down the right path with this one... Please bear with me as my SQL isn't the greatest
I'm trying to query a database to select everything from one table where certain cells don't exist in another. That much doesn't make a lot of sense, but I'm hoping this piece of code will:
SELECT * from employees WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
So basically I have one table with a list of employees and their details. Then another table with some other details, including their name. Where their name is not in the eotm_dyn table, meaning there is no entry for them, I would like to see exactly who they are, or in other words, see what exactly is missing.
The above query returns nothing, but I know there are 20ish names missing so I've obviously not gotten it right.
Can anyone help?

You didn't join the table in your query.
Your original query will always return nothing unless there are no records at all in eotm_dyn, in which case it will return everything.
Assuming these tables should be joined on employeeID, use the following:
SELECT *
FROM employees e
WHERE NOT EXISTS
(
SELECT null
FROM eotm_dyn d
WHERE d.employeeID = e.id
)
You can also join these tables with a LEFT JOIN and keep only the rows where the joined column is NULL, but this will likely be less efficient than using NOT EXISTS.
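The difference between the uncorrelated and the correlated subquery shows up on a tiny data set. This sketch uses Python's sqlite3 module in place of MySQL; the id and employeeID columns follow the assumption made above, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INT, name TEXT);
CREATE TABLE eotm_dyn (employeeID INT, name TEXT);
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO eotm_dyn VALUES (1, 'Ann');
""")

uncorrelated_sql = """
SELECT name FROM employees
WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
ORDER BY id
"""
correlated_sql = """
SELECT e.name
FROM employees e
WHERE NOT EXISTS (SELECT NULL FROM eotm_dyn d WHERE d.employeeID = e.id)
ORDER BY e.id
"""

# eotm_dyn has a row, so the uncorrelated subquery is non-empty for
# every employee and nothing is returned.
uncorrelated = [r[0] for r in conn.execute(uncorrelated_sql)]

# The correlated subquery is evaluated per employee.
correlated = [r[0] for r in conn.execute(correlated_sql)]

# With eotm_dyn emptied, the uncorrelated form flips to returning everyone.
conn.execute("DELETE FROM eotm_dyn")
uncorrelated_when_empty = [r[0] for r in conn.execute(uncorrelated_sql)]

print(uncorrelated)             # []
print(correlated)               # ['Bob', 'Cy']
print(uncorrelated_when_empty)  # ['Ann', 'Bob', 'Cy']
```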

SELECT * FROM employees WHERE name NOT IN (SELECT name FROM eotm_dyn)
OR
SELECT * FROM employees WHERE NOT EXISTS (SELECT * FROM eotm_dyn WHERE eotm_dyn.name = employees.name)
OR
SELECT * FROM employees LEFT OUTER JOIN eotm_dyn ON eotm_dyn.name = employees.name WHERE eotm_dyn.name IS NULL

You can do a LEFT JOIN and assert the joined column is NULL.
Example:
SELECT * FROM employees a LEFT JOIN eotm_dyn b on (a.joinfield=b.joinfield) WHERE b.name IS NULL

SELECT * from employees
WHERE NOT EXISTS (SELECT name FROM eotm_dyn)
never returns any records unless eotm_dyn is empty. You need some kind of criteria on SELECT name FROM eotm_dyn, like
SELECT * from employees
WHERE NOT EXISTS (
SELECT name FROM eotm_dyn WHERE eotm_dyn.employeeid = employees.employeeid
)
assuming that the two tables are linked by a foreign key relationship. At this point you could use a variety of other options including a LEFT JOIN. The optimizer will typically handle them the same in most cases, however.

You can also have a look at this related question. That user reported that using a join provided better performance than using a sub query.
