Table:
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| emp_id | fname | lname | start_date | end_date | superior_emp_id | dept_id | title | assigned_branch_id |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| 1 | Michael | Smith | 2005-06-22 | NULL | NULL | 3 | President | 1 |
| 2 | Susan | Barker | 2006-09-12 | NULL | 1 | 3 | Vice President | 1 |
| 3 | Robert | Tyler | 2005-02-09 | NULL | 1 | 3 | Treasurer | 1 |
| 4 | Susan | Hawthorne | 2006-04-24 | NULL | 3 | 1 | Operations Manager | 1 |
| 5 | John | Gooding | 2007-11-14 | NULL | 4 | 2 | Loan Manager | 1 |
| 6 | Helen | Fleming | 2008-03-17 | NULL | 4 | 1 | Head Teller | 1 |
| 7 | Chris | Tucker | 2008-09-15 | NULL | 6 | 1 | Teller | 1 |
| 8 | Sarah | Parker | 2006-12-02 | NULL | 6 | 1 | Teller | 1 |
| 9 | Jane | Grossman | 2006-05-03 | NULL | 6 | 1 | Teller | 1 |
| 10 | Paula | Roberts | 2006-07-27 | NULL | 4 | 1 | Head Teller | 2 |
| 11 | Thomas | Ziegler | 2004-10-23 | NULL | 10 | 1 | Teller | 2 |
| 12 | Samantha | Jameson | 2007-01-08 | NULL | 10 | 1 | Teller | 2 |
| 13 | John | Blake | 2004-05-11 | NULL | 4 | 1 | Head Teller | 3 |
| 14 | Cindy | Mason | 2006-08-09 | NULL | 13 | 1 | Teller | 3 |
| 15 | Frank | Portman | 2007-04-01 | NULL | 13 | 1 | Teller | 3 |
| 16 | Theresa | Markham | 2005-03-15 | NULL | 4 | 1 | Head Teller | 4 |
| 17 | Beth | Fowler | 2006-06-29 | NULL | 16 | 1 | Teller | 4 |
| 18 | Rick | Tulman | 2006-12-12 | NULL | 16 | 1 | Teller | 4 |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
Query using subquery:
-- Select employees that do not manage others
SELECT
emp_id,
fname,
lname,
title
FROM
employee
WHERE
emp_id NOT IN (
SELECT
superior_emp_id
FROM
employee
WHERE
superior_emp_id IS NOT NULL
);
Result:
+--------+----------+----------+----------------+
| emp_id | fname | lname | title |
+--------+----------+----------+----------------+
| 2 | Susan | Barker | Vice President |
| 5 | John | Gooding | Loan Manager |
| 7 | Chris | Tucker | Teller |
| 8 | Sarah | Parker | Teller |
| 9 | Jane | Grossman | Teller |
| 11 | Thomas | Ziegler | Teller |
| 12 | Samantha | Jameson | Teller |
| 14 | Cindy | Mason | Teller |
| 15 | Frank | Portman | Teller |
| 17 | Beth | Fowler | Teller |
| 18 | Rick | Tulman | Teller |
+--------+----------+----------+----------------+
The above query works fine, but I am curious how to accomplish the same result using a join.
Here is what I have so far, but it does not return the same results:
SELECT
e1.emp_id,
e1.fname,
e1.lname,
e1.title
FROM
employee e1
JOIN
employee e2 ON e1.emp_id != e2.superior_emp_id
GROUP BY
e1.emp_id;
Results:
+--------+----------+-----------+--------------------+
| emp_id | fname | lname | title |
+--------+----------+-----------+--------------------+
| 1 | Michael | Smith | President |
| 2 | Susan | Barker | Vice President |
| 3 | Robert | Tyler | Treasurer |
| 4 | Susan | Hawthorne | Operations Manager |
| 5 | John | Gooding | Loan Manager |
| 6 | Helen | Fleming | Head Teller |
| 7 | Chris | Tucker | Teller |
| 8 | Sarah | Parker | Teller |
| 9 | Jane | Grossman | Teller |
| 10 | Paula | Roberts | Head Teller |
| 11 | Thomas | Ziegler | Teller |
| 12 | Samantha | Jameson | Teller |
| 13 | John | Blake | Head Teller |
| 14 | Cindy | Mason | Teller |
| 15 | Frank | Portman | Teller |
| 16 | Theresa | Markham | Head Teller |
| 17 | Beth | Fowler | Teller |
| 18 | Rick | Tulman | Teller |
+--------+----------+-----------+--------------------+
The equivalent query uses a LEFT JOIN and a WHERE clause:
SELECT e.emp_id, e.fname, e.lname, e.title
FROM employee e LEFT JOIN
employee es
ON es.superior_emp_id = e.emp_id
WHERE es.emp_id IS NULL;
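This looks for rows in es whose superior_emp_id matches e.emp_id. However, you specifically want employees with no such match, hence the LEFT JOIN, which keeps every employee and fills the es columns with NULL when nothing matches, and the WHERE clause, which then filters out every employee that did match.
This will often perform better than your NOT IN version, although that depends on the database and its optimizer. A comparable version would use NOT EXISTS.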
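To check that the formulations agree, here is a minimal sketch that loads only the emp_id and superior_emp_id columns of the employee table into an in-memory SQLite database through Python's sqlite3 module, then runs the NOT IN subquery, the LEFT JOIN anti-join and a NOT EXISTS version side by side. SQLite is a stand-in for MySQL here (an assumption); the three queries are standard SQL, but their relative performance will differ between engines.

```python
import sqlite3

# Subset of the employee table from the question: (emp_id, superior_emp_id).
EMPLOYEES = [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (6, 4), (7, 6),
             (8, 6), (9, 6), (10, 4), (11, 10), (12, 10), (13, 4),
             (14, 13), (15, 13), (16, 4), (17, 16), (18, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, "
             "superior_emp_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", EMPLOYEES)

QUERIES = {
    # The original subquery version, with NULLs removed explicitly.
    "not_in": """
        SELECT emp_id FROM employee
        WHERE emp_id NOT IN (SELECT superior_emp_id FROM employee
                             WHERE superior_emp_id IS NOT NULL)""",
    # Anti-join: keep employees for whom no subordinate row was found.
    "left_join": """
        SELECT e.emp_id FROM employee e
        LEFT JOIN employee es ON es.superior_emp_id = e.emp_id
        WHERE es.emp_id IS NULL""",
    # Correlated NOT EXISTS: same meaning, unaffected by the NULL superior.
    "not_exists": """
        SELECT e.emp_id FROM employee e
        WHERE NOT EXISTS (SELECT 1 FROM employee es
                          WHERE es.superior_emp_id = e.emp_id)""",
}

results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in QUERIES.items()}
for name, emp_ids in results.items():
    print(f"{name:10} {emp_ids}")
```

All three should list the same eleven non-managers shown in the question's result.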
I'm trying to combine some NBA data: box scores and advanced player stats. The query I've got at the moment is:
select boxes.GNO, boxes.NAME, boxes.DATE, advstat.YEAR, advstat.NAME, advstat.AGE
from boxes left join advstat on boxes.NAME=advstat.NAME
group by boxes.NAME, boxes.GNO having boxes.GNO = 1;
boxes.GNO=1 is just to limit the data returned at this point; ultimately I'll be returning data for all games in the boxes table.
With the query above I get the following output:
+------+------------------+------------+------+------------------+------+
| GNO | NAME | DATE | YEAR | NAME | AGE |
+------+------------------+------------+------+------------------+------+
| 1 | Al Horford | 2017-10-17 | 2008 | Al Horford | 21 |
| 1 | Aron Baynes | 2017-10-17 | 2013 | Aron Baynes | 26 |
| 1 | Derrick Rose | 2017-10-17 | 2009 | Derrick Rose | 20 |
| 1 | Dwyane Wade | 2017-10-17 | 2004 | Dwyane Wade | 22 |
| 1 | Gordon Hayward | 2017-10-17 | 2011 | Gordon Hayward | 20 |
| 1 | Iman Shumpert | 2017-10-17 | 2012 | Iman Shumpert | 21 |
| 1 | Jae Crowder | 2017-10-17 | 2013 | Jae Crowder | 22 |
| 1 | Jaylen Brown | 2017-10-17 | 2017 | Jaylen Brown | 20 |
| 1 | Jayson Tatum | 2017-10-17 | NULL | NULL | NULL |
| 1 | Jeff Green | 2017-10-17 | 2008 | Jeff Green | 21 |
| 1 | JR Smith | 2017-10-17 | NULL | NULL | NULL |
| 1 | Kevin Love | 2017-10-17 | 2009 | Kevin Love | 20 |
| 1 | Kyle Korver | 2017-10-17 | 2004 | Kyle Korver | 22 |
| 1 | Kyrie Irving | 2017-10-17 | 2012 | Kyrie Irving | 19 |
| 1 | LeBron James | 2017-10-17 | 2004 | LeBron James | 19 |
| 1 | Marcus Smart | 2017-10-17 | 2015 | Marcus Smart | 20 |
| 1 | Semi Ojeleye | 2017-10-17 | NULL | NULL | NULL |
| 1 | Shane Larkin | 2017-10-17 | 2014 | Shane Larkin | 21 |
| 1 | Terry Rozier | 2017-10-17 | 2016 | Terry Rozier | 21 |
| 1 | Tristan Thompson | 2017-10-17 | 2012 | Tristan Thompson | 20 |
+------+------------------+------------+------+------------------+------+
This is almost right. However, I need advstat.YEAR to match year(boxes.DATE). If I add
where year(boxes.DATE)=advstat.YEAR, the rows with NULL data are excluded, which is not what I want. I need the table to look like it does above, only with the years lining up correctly between the two tables.
Any help will be greatly appreciated! Cheers!
Is this what you want?
select b.GNO, b.NAME, b.DATE, s.YEAR, s.NAME, s.AGE
from boxes b left join
advstat s
on b.NAME = s.NAME and year(b.date) = s.year
where b.GNO = 1
group by b.NAME, b.GNO ;
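Putting year(b.date) = s.year in the ON clause, rather than in the WHERE clause, is what keeps the unmatched rows: conditions in ON only decide which advstat rows match, so a boxes row with no matching advstat row is still returned with NULLs, whereas a WHERE condition on an advstat column rejects those NULL rows. Note also that filtering before the group by is usually more efficient, so I recommend where instead of having.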
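The ON-versus-WHERE difference can be seen on a tiny sample. This is a minimal sketch using Python's sqlite3 module; the rows are a hypothetical subset built from the question's output (Al Horford 2008/21, Jaylen Brown 2017/20, and Jayson Tatum with no advstat row). SQLite has no YEAR() function, so strftime('%Y', ...) cast to INTEGER stands in for it, and the GROUP BY is left out because it does not affect the point being shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE boxes (GNO INTEGER, NAME TEXT, DATE TEXT);
CREATE TABLE advstat (YEAR INTEGER, NAME TEXT, AGE INTEGER);
INSERT INTO boxes VALUES
  (1, 'Al Horford', '2017-10-17'),
  (1, 'Jaylen Brown', '2017-10-17'),
  (1, 'Jayson Tatum', '2017-10-17');
INSERT INTO advstat VALUES
  (2008, 'Al Horford', 21),
  (2017, 'Jaylen Brown', 20);
""")

# Stand-in for MySQL's year(b.DATE): strftime returns text, so cast it.
YEAR_OF_GAME = "CAST(strftime('%Y', b.DATE) AS INTEGER)"

# Year condition in ON: boxes rows without a same-year match survive with NULLs.
in_on = conn.execute(f"""
    SELECT b.NAME, s.YEAR, s.AGE
    FROM boxes b LEFT JOIN advstat s
      ON b.NAME = s.NAME AND {YEAR_OF_GAME} = s.YEAR
    WHERE b.GNO = 1
    ORDER BY b.NAME""").fetchall()

# Year condition in WHERE: for unmatched rows s.YEAR is NULL, the comparison
# is UNKNOWN, and those rows are filtered out.
in_where = conn.execute(f"""
    SELECT b.NAME, s.YEAR, s.AGE
    FROM boxes b LEFT JOIN advstat s
      ON b.NAME = s.NAME
    WHERE b.GNO = 1 AND {YEAR_OF_GAME} = s.YEAR
    ORDER BY b.NAME""").fetchall()

print(in_on)
print(in_where)
```

Only the ON version keeps Al Horford (no 2017 row in this sample) and Jayson Tatum (no row at all).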
I have an employee table:
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| emp_id | fname | lname | start_date | end_date | superior_emp_id | dept_id | title | assigned_branch_id |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| 1 | Michael | Smith | 2005-06-22 | NULL | NULL | 3 | President | 1 |
| 2 | Susan | Barker | 2006-09-12 | NULL | 1 | 3 | Vice President | 1 |
| 3 | Robert | Tyler | 2005-02-09 | NULL | 1 | 3 | Treasurer | 1 |
| 4 | Susan | Hawthorne | 2006-04-24 | NULL | 3 | 1 | Operations Manager | 1 |
| 5 | John | Gooding | 2007-11-14 | NULL | 4 | 2 | Loan Manager | 1 |
| 6 | Helen | Fleming | 2008-03-17 | NULL | 4 | 1 | Head Teller | 1 |
| 7 | Chris | Tucker | 2008-09-15 | NULL | 6 | 1 | Teller | 1 |
| 8 | Sarah | Parker | 2006-12-02 | NULL | 6 | 1 | Teller | 1 |
| 9 | Jane | Grossman | 2006-05-03 | NULL | 6 | 1 | Teller | 1 |
| 10 | Paula | Roberts | 2006-07-27 | NULL | 4 | 1 | Head Teller | 2 |
| 11 | Thomas | Ziegler | 2004-10-23 | NULL | 10 | 1 | Teller | 2 |
| 12 | Samantha | Jameson | 2007-01-08 | NULL | 10 | 1 | Teller | 2 |
| 13 | John | Blake | 2004-05-11 | NULL | 4 | 1 | Head Teller | 3 |
| 14 | Cindy | Mason | 2006-08-09 | NULL | 13 | 1 | Teller | 3 |
| 15 | Frank | Portman | 2007-04-01 | NULL | 13 | 1 | Teller | 3 |
| 16 | Theresa | Markham | 2005-03-15 | NULL | 4 | 1 | Head Teller | 4 |
| 17 | Beth | Fowler | 2006-06-29 | NULL | 16 | 1 | Teller | 4 |
| 18 | Rick | Tulman | 2006-12-12 | NULL | 16 | 1 | Teller | 4 |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
If I do
SELECT emp_id, fname, lname, title
FROM employee
WHERE emp_id IN (
SELECT superior_emp_id
FROM employee
);
I get:
+--------+---------+-----------+--------------------+
| emp_id | fname | lname | title |
+--------+---------+-----------+--------------------+
| 1 | Michael | Smith | President |
| 3 | Robert | Tyler | Treasurer |
| 4 | Susan | Hawthorne | Operations Manager |
| 6 | Helen | Fleming | Head Teller |
| 10 | Paula | Roberts | Head Teller |
| 13 | John | Blake | Head Teller |
| 16 | Theresa | Markham | Head Teller |
+--------+---------+-----------+--------------------+
There is a NULL value in the superior_emp_id column.
If the IN operator is equivalent to field=val1 OR field=val2 OR field=val3 OR field=null, why does this query not fail or give some error?
It works just fine. You may be confusing IN with NOT IN.
If you run the same query with NOT IN, then you will get no results at all. Why? Well, consider 1 not in (2, 3). That evaluates to true. No-brainer.
Then 2 not in (2, 3). That evaluates to false. Just as it should.
But . . . what about these two:
2 not in (2, 3, NULL)
1 not in (2, 3, NULL)
The first is false, because 2 is indeed in the list. The second is . . . well, NULL is not a value. It means "unknown". So, it could be 1 or something else. Hence, it evaluates to NULL. And NULL is treated the same as false in a WHERE clause.
This does not happen with IN. In your case, for Michael Smith, 1 IN (NULL, 1, ...) expands to 1 = NULL OR 1 = 1 OR ..., and NULL OR TRUE evaluates to TRUE.
For this reason, I strongly recommend always using NOT EXISTS rather than NOT IN with a subquery. As a corollary, I recommend EXISTS rather than IN, if only so that the habit of using EXISTS becomes ingrained.
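A minimal sketch of the NOT IN trap, assuming an in-memory SQLite database via Python's sqlite3 module as a stand-in for MySQL and loading only the emp_id and superior_emp_id columns. The IN query finds the managers, the NOT IN query returns nothing because of Michael Smith's NULL superior, and NOT EXISTS returns the non-managers as intended.

```python
import sqlite3

# Subset of the employee table from the question: (emp_id, superior_emp_id).
EMPLOYEES = [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (6, 4), (7, 6),
             (8, 6), (9, 6), (10, 4), (11, 10), (12, 10), (13, 4),
             (14, 13), (15, 13), (16, 4), (17, 16), (18, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, "
             "superior_emp_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", EMPLOYEES)

def ids(sql):
    """Run a query and return the sorted emp_id values it selects."""
    return sorted(row[0] for row in conn.execute(sql))

# IN: for emp_id 1 the test is 1 = NULL OR 1 = 1 OR ..., i.e. UNKNOWN OR TRUE.
managers = ids("SELECT emp_id FROM employee "
               "WHERE emp_id IN (SELECT superior_emp_id FROM employee)")

# NOT IN: a non-matching emp_id gives NOT (UNKNOWN OR FALSE ...) = UNKNOWN,
# a matching one gives FALSE, so no row qualifies.
not_in = ids("SELECT emp_id FROM employee "
             "WHERE emp_id NOT IN (SELECT superior_emp_id FROM employee)")

# NOT EXISTS: the correlated lookup just finds no subordinate row;
# the NULL superior never matches anything and cannot spoil the result.
not_exists = ids("SELECT e.emp_id FROM employee e WHERE NOT EXISTS "
                 "(SELECT 1 FROM employee s WHERE s.superior_emp_id = e.emp_id)")

print(managers)
print(not_in)
print(not_exists)
```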
The predicate x = NULL evaluates to UNKNOWN.
With regards to the OR operator:
TRUE OR UNKNOWN = TRUE
FALSE OR UNKNOWN = UNKNOWN (which is treated as a FALSE in the WHERE clause)
see: https://en.wikipedia.org/wiki/Three-valued_logic
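These rules can be checked directly. A minimal sketch using an in-memory SQLite database through Python's sqlite3 module (an assumption; any standard-conforming engine should agree). SQLite reports TRUE and FALSE as 1 and 0 and UNKNOWN as NULL, which Python shows as None; here (1 = 1) plays TRUE, (1 = 2) plays FALSE and (1 = NULL) plays UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a scalar SQL expression: 1/0 for TRUE/FALSE, None for UNKNOWN."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

CASES = {
    "1 = NULL": "UNKNOWN",
    "(1 = 1) OR (1 = NULL)": "TRUE OR UNKNOWN",
    "(1 = 2) OR (1 = NULL)": "FALSE OR UNKNOWN",
    "(1 = 1) AND (1 = NULL)": "TRUE AND UNKNOWN",
    "(1 = 2) AND (1 = NULL)": "FALSE AND UNKNOWN",
    "NOT (1 = NULL)": "NOT UNKNOWN",
}
results = {expr: evaluate(expr) for expr in CASES}
for expr, label in CASES.items():
    print(f"{label:18} {expr:24} -> {results[expr]}")

# A WHERE clause keeps a row only when the predicate is TRUE,
# so FALSE OR UNKNOWN filters the row out.
kept = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1) WHERE (1 = 2) OR (1 = NULL)"
).fetchone()[0]
print("rows kept by FALSE OR UNKNOWN:", kept)
```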
I am learning about SQL subqueries. Here is the subquery I am using from the book:
SELECT account_id, product_cd, cust_id
FROM account
WHERE open_branch_id = (
SELECT branch_id
FROM branch
WHERE name = 'Woburn Branch'
) AND open_emp_id IN (
SELECT emp_id
FROM employee
WHERE title = 'Teller' OR title = 'Head Teller'
);
Result:
+------------+------------+---------+
| account_id | product_cd | cust_id |
+------------+------------+---------+
| 1 | CHK | 1 |
| 2 | SAV | 1 |
| 3 | CD | 1 |
| 4 | CHK | 2 |
| 5 | SAV | 2 |
| 17 | CD | 7 |
| 27 | BUS | 11 |
+------------+------------+---------+
I have looked over this query, trying to interpret it and understand the reasoning behind its clauses, but I fail to understand the reason for the last clause, AND open_emp_id IN...
I noticed that with just
SELECT account_id, product_cd, cust_id
FROM account
WHERE open_branch_id = (
SELECT branch_id
FROM branch
WHERE name = 'Woburn Branch'
)
You get the same result as above. Can anyone explain to me the reasoning behind the last AND open_emp_id IN clause and how omitting it would affect the final result?
Tables used in subquery:
Account table
+------------+------------+---------+------------+------------+--------------------+--------+----------------+-------------+---------------+-----------------+
| account_id | product_cd | cust_id | open_date | close_date | last_activity_date | status | open_branch_id | open_emp_id | avail_balance | pending_balance |
+------------+------------+---------+------------+------------+--------------------+--------+----------------+-------------+---------------+-----------------+
| 1 | CHK | 1 | 2000-01-15 | NULL | 2005-01-04 | ACTIVE | 2 | 10 | 1057.75 | 1057.75 |
| 2 | SAV | 1 | 2000-01-15 | NULL | 2004-12-19 | ACTIVE | 2 | 10 | 500.00 | 500.00 |
| 3 | CD | 1 | 2004-06-30 | NULL | 2004-06-30 | ACTIVE | 2 | 10 | 3000.00 | 3000.00 |
| 4 | CHK | 2 | 2001-03-12 | NULL | 2004-12-27 | ACTIVE | 2 | 10 | 2258.02 | 2258.02 |
| 5 | SAV | 2 | 2001-03-12 | NULL | 2004-12-11 | ACTIVE | 2 | 10 | 200.00 | 200.00 |
| 7 | CHK | 3 | 2002-11-23 | NULL | 2004-11-30 | ACTIVE | 3 | 13 | 1057.75 | 1057.75 |
| 8 | MM | 3 | 2002-12-15 | NULL | 2004-12-05 | ACTIVE | 3 | 13 | 2212.50 | 2212.50 |
| 10 | CHK | 4 | 2003-09-12 | NULL | 2005-01-03 | ACTIVE | 1 | 1 | 534.12 | 534.12 |
| 11 | SAV | 4 | 2000-01-15 | NULL | 2004-10-24 | ACTIVE | 1 | 1 | 767.77 | 767.77 |
| 12 | MM | 4 | 2004-09-30 | NULL | 2004-11-11 | ACTIVE | 1 | 1 | 5487.09 | 5487.09 |
| 13 | CHK | 5 | 2004-01-27 | NULL | 2005-01-05 | ACTIVE | 4 | 16 | 2237.97 | 2897.97 |
| 14 | CHK | 6 | 2002-08-24 | NULL | 2004-11-29 | ACTIVE | 1 | 1 | 122.37 | 122.37 |
| 15 | CD | 6 | 2004-12-28 | NULL | 2004-12-28 | ACTIVE | 1 | 1 | 10000.00 | 10000.00 |
| 17 | CD | 7 | 2004-01-12 | NULL | 2004-01-12 | ACTIVE | 2 | 10 | 5000.00 | 5000.00 |
| 18 | CHK | 8 | 2001-05-23 | NULL | 2005-01-03 | ACTIVE | 4 | 16 | 3487.19 | 3487.19 |
| 19 | SAV | 8 | 2001-05-23 | NULL | 2004-10-12 | ACTIVE | 4 | 16 | 387.99 | 387.99 |
| 21 | CHK | 9 | 2003-07-30 | NULL | 2004-12-15 | ACTIVE | 1 | 1 | 125.67 | 125.67 |
| 22 | MM | 9 | 2004-10-28 | NULL | 2004-10-28 | ACTIVE | 1 | 1 | 9345.55 | 9845.55 |
| 23 | CD | 9 | 2004-06-30 | NULL | 2004-06-30 | ACTIVE | 1 | 1 | 1500.00 | 1500.00 |
| 24 | CHK | 10 | 2002-09-30 | NULL | 2004-12-15 | ACTIVE | 4 | 16 | 23575.12 | 23575.12 |
| 25 | BUS | 10 | 2002-10-01 | NULL | 2004-08-28 | ACTIVE | 4 | 16 | 0.00 | 0.00 |
| 27 | BUS | 11 | 2004-03-22 | NULL | 2004-11-14 | ACTIVE | 2 | 10 | 9345.55 | 9345.55 |
| 28 | CHK | 12 | 2003-07-30 | NULL | 2004-12-15 | ACTIVE | 4 | 16 | 38552.05 | 38552.05 |
| 29 | SBL | 13 | 2004-02-22 | NULL | 2004-12-17 | ACTIVE | 3 | 13 | 50000.00 | 50000.00 |
+------------+------------+---------+------------+------------+--------------------+--------+----------------+-------------+---------------+-----------------+
Branch table:
+-----------+---------------+----------------------+---------+-------+-------+
| branch_id | name | address | city | state | zip |
+-----------+---------------+----------------------+---------+-------+-------+
| 1 | Headquarters | 3882 Main St. | Waltham | MA | 02451 |
| 2 | Woburn Branch | 422 Maple St. | Woburn | MA | 01801 |
| 3 | Quincy Branch | 125 Presidential Way | Quincy | MA | 02169 |
| 4 | So. NH Branch | 378 Maynard Ln. | Salem | NH | 03079 |
+-----------+---------------+----------------------+---------+-------+-------+
Employee table:
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| emp_id | fname | lname | start_date | end_date | superior_emp_id | dept_id | title | assigned_branch_id |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| 1 | Michael | Smith | 2005-06-22 | NULL | NULL | 3 | President | 1 |
| 2 | Susan | Barker | 2006-09-12 | NULL | 1 | 3 | Vice President | 1 |
| 3 | Robert | Tyler | 2005-02-09 | NULL | 1 | 3 | Treasurer | 1 |
| 4 | Susan | Hawthorne | 2006-04-24 | NULL | 3 | 1 | Operations Manager | 1 |
| 5 | John | Gooding | 2007-11-14 | NULL | 4 | 2 | Loan Manager | 1 |
| 6 | Helen | Fleming | 2008-03-17 | NULL | 4 | 1 | Head Teller | 1 |
| 7 | Chris | Tucker | 2008-09-15 | NULL | 6 | 1 | Teller | 1 |
| 8 | Sarah | Parker | 2006-12-02 | NULL | 6 | 1 | Teller | 1 |
| 9 | Jane | Grossman | 2006-05-03 | NULL | 6 | 1 | Teller | 1 |
| 10 | Paula | Roberts | 2006-07-27 | NULL | 4 | 1 | Head Teller | 2 |
| 11 | Thomas | Ziegler | 2004-10-23 | NULL | 10 | 1 | Teller | 2 |
| 12 | Samantha | Jameson | 2007-01-08 | NULL | 10 | 1 | Teller | 2 |
| 13 | John | Blake | 2004-05-11 | NULL | 4 | 1 | Head Teller | 3 |
| 14 | Cindy | Mason | 2006-08-09 | NULL | 13 | 1 | Teller | 3 |
| 15 | Frank | Portman | 2007-04-01 | NULL | 13 | 1 | Teller | 3 |
| 16 | Theresa | Markham | 2005-03-15 | NULL | 4 | 1 | Head Teller | 4 |
| 17 | Beth | Fowler | 2006-06-29 | NULL | 16 | 1 | Teller | 4 |
| 18 | Rick | Tulman | 2006-12-12 | NULL | 16 | 1 | Teller | 4 |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
The AND open_emp_id IN clause keeps only accounts opened by employees whose title is 'Teller' or 'Head Teller'. In your case the result does not change because you are looking at the branch named 'Woburn Branch', and it just happens that every employee there is a 'Teller' or 'Head Teller', so every Woburn account passes the filter:
| 10 | Paula | Roberts | 2006-07-27 | NULL | 4 | 1 | Head Teller | 2 |
| 11 | Thomas | Ziegler | 2004-10-23 | NULL | 10 | 1 | Teller | 2 |
| 12 | Samantha | Jameson | 2007-01-08 | NULL | 10 | 1 | Teller | 2 |
Change the branch name to 'Headquarters' in the first query, and you will see the difference that the subquery makes. Accounts opened at Headquarters by employees whose title is not 'Teller' or 'Head Teller' are excluded; with this data every Headquarters account was opened by employee 1, the President, so the query returns no rows at all.
The last AND narrows the selection criteria to include only accounts that were opened by a Teller or Head Teller. If the Loan Manager or Operations Manager had opened the account, it would be excluded.
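A minimal sketch that runs the book's query for both branches, with and without the teller filter, using an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. Only the columns the query touches are loaded, and only the Headquarters and Woburn accounts from the account table above are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE account (account_id INTEGER PRIMARY KEY,
                      open_branch_id INTEGER, open_emp_id INTEGER);
INSERT INTO branch VALUES (1, 'Headquarters'), (2, 'Woburn Branch'),
                          (3, 'Quincy Branch'), (4, 'So. NH Branch');
INSERT INTO employee VALUES
  (1, 'President'), (2, 'Vice President'), (3, 'Treasurer'),
  (4, 'Operations Manager'), (5, 'Loan Manager'), (6, 'Head Teller'),
  (7, 'Teller'), (8, 'Teller'), (9, 'Teller'), (10, 'Head Teller'),
  (11, 'Teller'), (12, 'Teller'), (13, 'Head Teller'), (14, 'Teller'),
  (15, 'Teller'), (16, 'Head Teller'), (17, 'Teller'), (18, 'Teller');
-- Only the Headquarters (1) and Woburn (2) accounts from the question.
INSERT INTO account VALUES
  (1, 2, 10), (2, 2, 10), (3, 2, 10), (4, 2, 10), (5, 2, 10),
  (10, 1, 1), (11, 1, 1), (12, 1, 1), (14, 1, 1), (15, 1, 1),
  (17, 2, 10), (21, 1, 1), (22, 1, 1), (23, 1, 1), (27, 2, 10);
""")

BASE = """SELECT account_id FROM account
          WHERE open_branch_id = (SELECT branch_id FROM branch WHERE name = ?)"""
TELLER_FILTER = """ AND open_emp_id IN (SELECT emp_id FROM employee
                    WHERE title = 'Teller' OR title = 'Head Teller')"""

def account_ids(branch_name, tellers_only):
    """Sorted account_ids for a branch, optionally keeping only accounts
    opened by a Teller or Head Teller."""
    sql = BASE + (TELLER_FILTER if tellers_only else "")
    return sorted(row[0] for row in conn.execute(sql, (branch_name,)))

for branch in ("Woburn Branch", "Headquarters"):
    print(branch, account_ids(branch, False), account_ids(branch, True))
```

For Woburn both lists are identical; for Headquarters the filtered list is empty.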
Employee table:
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| emp_id | fname | lname | start_date | end_date | superior_emp_id | dept_id | title | assigned_branch_id |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| 1 | Michael | Smith | 2005-06-22 | NULL | NULL | 3 | President | 1 |
| 2 | Susan | Barker | 2006-09-12 | NULL | 1 | 3 | Vice President | 1 |
| 3 | Robert | Tyler | 2005-02-09 | NULL | 1 | 3 | Treasurer | 1 |
| 4 | Susan | Hawthorne | 2006-04-24 | NULL | 3 | 1 | Operations Manager | 1 |
| 5 | John | Gooding | 2007-11-14 | NULL | 4 | 2 | Loan Manager | 1 |
| 6 | Helen | Fleming | 2008-03-17 | NULL | 4 | 1 | Head Teller | 1 |
| 7 | Chris | Tucker | 2008-09-15 | NULL | 6 | 1 | Teller | 1 |
| 8 | Sarah | Parker | 2006-12-02 | NULL | 6 | 1 | Teller | 1 |
| 9 | Jane | Grossman | 2006-05-03 | NULL | 6 | 1 | Teller | 1 |
| 10 | Paula | Roberts | 2006-07-27 | NULL | 4 | 1 | Head Teller | 2 |
| 11 | Thomas | Ziegler | 2004-10-23 | NULL | 10 | 1 | Teller | 2 |
| 12 | Samantha | Jameson | 2007-01-08 | NULL | 10 | 1 | Teller | 2 |
| 13 | John | Blake | 2004-05-11 | NULL | 4 | 1 | Head Teller | 3 |
| 14 | Cindy | Mason | 2006-08-09 | NULL | 13 | 1 | Teller | 3 |
| 15 | Frank | Portman | 2007-04-01 | NULL | 13 | 1 | Teller | 3 |
| 16 | Theresa | Markham | 2005-03-15 | NULL | 4 | 1 | Head Teller | 4 |
| 17 | Beth | Fowler | 2006-06-29 | NULL | 16 | 1 | Teller | 4 |
| 18 | Rick | Tulman | 2006-12-12 | NULL | 16 | 1 | Teller | 4 |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
Query:
SELECT emp_id, fname, lname, title
FROM employee
WHERE emp_id IN (SELECT superior_emp_id FROM employee);
Query Result:
+--------+---------+-----------+--------------------+
| emp_id | fname | lname | title |
+--------+---------+-----------+--------------------+
| 1 | Michael | Smith | President |
| 3 | Robert | Tyler | Treasurer |
| 4 | Susan | Hawthorne | Operations Manager |
| 6 | Helen | Fleming | Head Teller |
| 10 | Paula | Roberts | Head Teller |
| 13 | John | Blake | Head Teller |
| 16 | Theresa | Markham | Head Teller |
+--------+---------+-----------+--------------------+
Subquery result:
+-----------------+
| superior_emp_id |
+-----------------+
| NULL |
| 1 |
| 1 |
| 3 |
| 4 |
| 4 |
| 4 |
| 4 |
| 4 |
| 6 |
| 6 |
| 6 |
| 10 |
| 10 |
| 13 |
| 13 |
| 16 |
| 16 |
+-----------------+
If the subquery SELECT superior_emp_id FROM employee returns NULL for Michael Smith how is it that the IN() operator returns it in the final result set? I thought nothing was equal to null.
If the subquery SELECT superior_emp_id FROM employee returns NULL for Michael Smith how is it that the IN() operator returns it in the final result set?
Short answer, it doesn't.
The subquery is not correlated, so it effectively returns the same set of superior_emp_ids, [NULL, 1, 1, 3, 4, 4, 6, 6, 6, 4, 10, 10, 4, 13, 13, 4, 16, 16], for every row.
Your WHERE clause tests each emp_id to see if it is IN this set. And IN is basically a series of equality comparisons OR'd together.
Michael's emp_id is 1, and his row is returned because 1 = NULL OR 1 = 1 OR ... evaluates as UNKNOWN OR TRUE OR ..., which is TRUE.
You are correct in assuming that NULL doesn't equal anything, including NULL, so WHERE NULL IN (NULL, 1, FALSE, ... anything you like ...) evaluates to UNKNOWN rather than TRUE, and the row is not returned. But that is not what's happening in your example.
N.B. To avoid any confusion, it is much better to keep NULLs off either side of an IN clause where possible, as @Donal points out.
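The expansion of IN into OR'd equality tests can be checked on literal lists. A minimal sketch using an in-memory SQLite database via Python's sqlite3 module (an assumption, standing in for MySQL); 1 means TRUE and None means UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a scalar SQL expression: 1/0 for TRUE/FALSE, None for UNKNOWN."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

CASES = [
    "1 IN (NULL, 1, 3)",      # UNKNOWN OR TRUE OR FALSE -> TRUE
    "2 IN (NULL, 1, 3)",      # UNKNOWN OR FALSE OR FALSE -> UNKNOWN
    "NULL IN (NULL, 1, 3)",   # every comparison is UNKNOWN
    "2 NOT IN (NULL, 1, 3)",  # NOT UNKNOWN -> UNKNOWN
]
results = {expr: evaluate(expr) for expr in CASES}
for expr, value in results.items():
    print(f"{expr:24} -> {value}")

# In a WHERE clause only TRUE keeps a row, so a NULL on the left never matches.
kept = conn.execute(
    "SELECT COUNT(*) FROM (SELECT NULL AS x) WHERE x IN (NULL, 1, 3)"
).fetchone()[0]
print("rows kept:", kept)
```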
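SELECT superior_emp_id FROM employee
returns NULL, 1, 3, 4, 6, 10, 13 and 16 (several of them more than once, since there is no DISTINCT). Because 1 is in that list, Michael Smith's row is matched by 1 = 1, not by the NULL. I do not see the problem here.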
Have a look at the ANSI_NULLS setting in SQL Server.
Transact-SQL supports an extension that allows for the comparison operators to return TRUE or FALSE when comparing against null values. This option is activated by setting ANSI_NULLS OFF. When ANSI_NULLS is OFF, comparisons such as ColumnA = NULL return TRUE when ColumnA contains a null value and FALSE when ColumnA contains some value besides NULL.
Taken from here.
If you don't want the NULL value, you will need to add a WHERE clause to the subquery. For example:
SELECT emp_id, fname, lname, title
FROM employee
WHERE emp_id IN (SELECT superior_emp_id FROM employee WHERE superior_emp_id IS NOT NULL);
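A minimal sketch showing what the IS NOT NULL filter does and does not change, assuming an in-memory SQLite database via Python's sqlite3 module as a stand-in for MySQL and loading only emp_id and superior_emp_id. For IN the filter makes no difference to the result; for NOT IN it is what makes the query return anything at all.

```python
import sqlite3

# Subset of the employee table from the question: (emp_id, superior_emp_id).
EMPLOYEES = [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (6, 4), (7, 6),
             (8, 6), (9, 6), (10, 4), (11, 10), (12, 10), (13, 4),
             (14, 13), (15, 13), (16, 4), (17, 16), (18, 16)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, "
             "superior_emp_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", EMPLOYEES)

SUBQUERY_ALL = "SELECT superior_emp_id FROM employee"
SUBQUERY_NON_NULL = SUBQUERY_ALL + " WHERE superior_emp_id IS NOT NULL"

def ids(op, subquery):
    """Sorted emp_ids for which `emp_id <op> (<subquery>)` is TRUE."""
    sql = f"SELECT emp_id FROM employee WHERE emp_id {op} ({subquery})"
    return sorted(row[0] for row in conn.execute(sql))

results = {
    ("IN", "all"): ids("IN", SUBQUERY_ALL),
    ("IN", "non_null"): ids("IN", SUBQUERY_NON_NULL),
    ("NOT IN", "all"): ids("NOT IN", SUBQUERY_ALL),
    ("NOT IN", "non_null"): ids("NOT IN", SUBQUERY_NON_NULL),
}
for key, value in results.items():
    print(key, value)
```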
Employee table:
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| emp_id | fname | lname | start_date | end_date | superior_emp_id | dept_id | title | assigned_branch_id |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
| 1 | Michael | Smith | 2005-06-22 | NULL | NULL | 3 | President | 1 |
| 2 | Susan | Barker | 2006-09-12 | NULL | 1 | 3 | Vice President | 1 |
| 3 | Robert | Tyler | 2005-02-09 | NULL | 1 | 3 | Treasurer | 1 |
| 4 | Susan | Hawthorne | 2006-04-24 | NULL | 3 | 1 | Operations Manager | 1 |
| 5 | John | Gooding | 2007-11-14 | NULL | 4 | 2 | Loan Manager | 1 |
| 6 | Helen | Fleming | 2008-03-17 | NULL | 4 | 1 | Head Teller | 1 |
| 7 | Chris | Tucker | 2008-09-15 | NULL | 6 | 1 | Teller | 1 |
| 8 | Sarah | Parker | 2006-12-02 | NULL | 6 | 1 | Teller | 1 |
| 9 | Jane | Grossman | 2006-05-03 | NULL | 6 | 1 | Teller | 1 |
| 10 | Paula | Roberts | 2006-07-27 | NULL | 4 | 1 | Head Teller | 2 |
| 11 | Thomas | Ziegler | 2004-10-23 | NULL | 10 | 1 | Teller | 2 |
| 12 | Samantha | Jameson | 2007-01-08 | NULL | 10 | 1 | Teller | 2 |
| 13 | John | Blake | 2004-05-11 | NULL | 4 | 1 | Head Teller | 3 |
| 14 | Cindy | Mason | 2006-08-09 | NULL | 13 | 1 | Teller | 3 |
| 15 | Frank | Portman | 2007-04-01 | NULL | 13 | 1 | Teller | 3 |
| 16 | Theresa | Markham | 2005-03-15 | NULL | 4 | 1 | Head Teller | 4 |
| 17 | Beth | Fowler | 2006-06-29 | NULL | 16 | 1 | Teller | 4 |
| 18 | Rick | Tulman | 2006-12-12 | NULL | 16 | 1 | Teller | 4 |
+--------+----------+-----------+------------+----------+-----------------+---------+--------------------+--------------------+
Query:
SELECT e1.fname, e1.lname,
'VS' AS vs,
e2.fname, e2.lname
FROM employee e1
INNER JOIN employee e2
ON e1.emp_id < e2.emp_id
WHERE e1.title = "Teller" AND e2.title = "Teller";
Result:
+----------+----------+----+----------+----------+
| fname | lname | vs | fname | lname |
+----------+----------+----+----------+----------+
| Chris | Tucker | VS | Sarah | Parker |
| Chris | Tucker | VS | Jane | Grossman |
| Chris | Tucker | VS | Thomas | Ziegler |
| Chris | Tucker | VS | Samantha | Jameson |
| Chris | Tucker | VS | Cindy | Mason |
| Chris | Tucker | VS | Frank | Portman |
| Chris | Tucker | VS | Beth | Fowler |
| Chris | Tucker | VS | Rick | Tulman |
| Sarah | Parker | VS | Jane | Grossman |
| Sarah | Parker | VS | Thomas | Ziegler |
| Sarah | Parker | VS | Samantha | Jameson |
| Sarah | Parker | VS | Cindy | Mason |
| Sarah | Parker | VS | Frank | Portman |
| Sarah | Parker | VS | Beth | Fowler |
| Sarah | Parker | VS | Rick | Tulman |
| Jane | Grossman | VS | Thomas | Ziegler |
| Jane | Grossman | VS | Samantha | Jameson |
| Jane | Grossman | VS | Cindy | Mason |
| Jane | Grossman | VS | Frank | Portman |
| Jane | Grossman | VS | Beth | Fowler |
| Jane | Grossman | VS | Rick | Tulman |
| Thomas | Ziegler | VS | Samantha | Jameson |
| Thomas | Ziegler | VS | Cindy | Mason |
| Thomas | Ziegler | VS | Frank | Portman |
| Thomas | Ziegler | VS | Beth | Fowler |
| Thomas | Ziegler | VS | Rick | Tulman |
| Samantha | Jameson | VS | Cindy | Mason |
| Samantha | Jameson | VS | Frank | Portman |
| Samantha | Jameson | VS | Beth | Fowler |
| Samantha | Jameson | VS | Rick | Tulman |
| Cindy | Mason | VS | Frank | Portman |
| Cindy | Mason | VS | Beth | Fowler |
| Cindy | Mason | VS | Rick | Tulman |
| Frank | Portman | VS | Beth | Fowler |
| Frank | Portman | VS | Rick | Tulman |
| Beth | Fowler | VS | Rick | Tulman |
+----------+----------+----+----------+----------+
Intention:
I want to pair people up as if the employees were playing a chess tournament, where each employee plays against only one opponent in the first round, not 3 or 4 different people. For example, I don't want Chris Tucker playing against Sarah, Jane, Thomas, etc. in the first round; I want him paired with exactly one opponent. Whoever loses is out.
How can I do this?
I would approach this by randomly enumerating the names and then choosing them in pairs:
select max(case when seqnum % 2 = 0 then fname end) as fname_1,
max(case when seqnum % 2 = 0 then lname end) as lname_1,
max(case when seqnum % 2 = 1 then fname end) as fname_2,
max(case when seqnum % 2 = 1 then lname end) as lname_2
from (select e.*, (@rn := @rn + 1) as seqnum
from employee e cross join
(select @rn := 0) params
order by rand()
) e
group by floor((seqnum - 1) / 2);
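This assumes that you have an even number of employees, which seems to be implicitly assumed as part of the problem. Note that this query pairs every employee; to pair only the tellers, add where e.title = 'Teller' to the inner query before order by rand(). There are 9 tellers, so the last group would then hold a single player with NULLs in the other name columns, which you could treat as a bye.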
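The same shuffle-and-pair idea can be sketched with ROW_NUMBER() instead of MySQL user variables (a swapped-in technique, available in MySQL 8+ and SQLite 3.25+). This minimal sketch assumes an in-memory SQLite database via Python's sqlite3 module, uses SQLite's random() in place of RAND(), and restricts the draw to tellers as the question intends. Because the order is random, the pairs differ on every run.

```python
import sqlite3

# Subset of the employee table from the question: (fname, lname, title).
EMPLOYEES = [
    ("Michael", "Smith", "President"), ("Susan", "Barker", "Vice President"),
    ("Robert", "Tyler", "Treasurer"), ("Susan", "Hawthorne", "Operations Manager"),
    ("John", "Gooding", "Loan Manager"), ("Helen", "Fleming", "Head Teller"),
    ("Chris", "Tucker", "Teller"), ("Sarah", "Parker", "Teller"),
    ("Jane", "Grossman", "Teller"), ("Paula", "Roberts", "Head Teller"),
    ("Thomas", "Ziegler", "Teller"), ("Samantha", "Jameson", "Teller"),
    ("John", "Blake", "Head Teller"), ("Cindy", "Mason", "Teller"),
    ("Frank", "Portman", "Teller"), ("Theresa", "Markham", "Head Teller"),
    ("Beth", "Fowler", "Teller"), ("Rick", "Tulman", "Teller"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT, title TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# Number the tellers 1..n in random order; rows 1-2, 3-4, ... share a
# group (integer division) and become one pairing.
PAIRING_SQL = """
SELECT MAX(CASE WHEN seqnum % 2 = 1 THEN fname || ' ' || lname END) AS player_1,
       MAX(CASE WHEN seqnum % 2 = 0 THEN fname || ' ' || lname END) AS player_2
FROM (SELECT fname, lname,
             ROW_NUMBER() OVER (ORDER BY random()) AS seqnum
      FROM employee
      WHERE title = 'Teller') AS t
GROUP BY (seqnum - 1) / 2
"""

pairs = conn.execute(PAIRING_SQL).fetchall()
for player_1, player_2 in pairs:
    # With an odd number of tellers, one group has no second player (a bye).
    print(player_1, "VS", player_2 if player_2 else "(bye)")
```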