I don't understand the concept of a left outer join, a right outer join, or indeed why we need to use a join at all! The question I am struggling with and the table I am working from is here: Link
Question 3(b)
Construct a command in SQL to solve the following query, explaining why it had to employ the
(outer) join method. [5 Marks]
“Find the name of each staff member and his/her dependent spouse, if any”
Question 3(c) -
Construct a command in SQL to solve the following query, using (i) the join method, and (ii) the
subquery method. [10 Marks]
“Find the identity name of each staff member who has worked more than 20 hours on the
Computerization Project”
Can anyone please explain this to me simply?
Joins are used to combine two related tables together.
In your example, you can combine the Employee table and the Department table, like so:
SELECT FNAME, LNAME, DNAME
FROM
EMPLOYEE INNER JOIN DEPARTMENT ON EMPLOYEE.DNO=DEPARTMENT.DNUMBER
This would result in a recordset like:
FNAME LNAME DNAME
----- ----- -----
John Smith Research
John Doe Administration
I used an INNER JOIN above. INNER JOINs combine two tables so that only records with matches in both tables are displayed, and they are joined in this case, on the department number (field DNO in Employee, DNUMBER in Department table).
LEFT JOINs allow you to combine two tables when you have records in the first table but might not have records in the second table. For example, let's say you want a list of all the employees, plus any dependents:
SELECT EMPLOYEE.FNAME as employee_first, EMPLOYEE.LNAME as employee_last, DEPENDENT.FNAME as dependent_last, DEPENDENT.LNAME as dependent_last
FROM
EMPLOYEE INNER JOIN DEPENDENT ON EMPLOYEE.SSN=DEPENDENT.ESSN
The problem here is that if an employee doesn't have a dependent, then their record won't show up at all -- because there's no matching record in the DEPENDENT table.
So, you use a left join which keeps all the data on the "left" (i.e. the first table) and pulls in any matching data on the "right" (the second table):
SELECT EMPLOYEE.FNAME as employee_first, EMPLOYEE.LNAME as employee_last, DEPENDENT.FNAME as dependent_first, DEPENDENT.LNAME as dependent_last
FROM
EMPLOYEE LEFT JOIN DEPENDENT ON EMPLOYEE.SSN=DEPENDENT.ESSN
Now we get all of the employee records. If there is no matching dependent(s) for a given employee, the dependent_first and dependent_last fields will be null.
example (not using your example tables :-)
I have a car rental company.
Table car
id: integer primary key autoincrement
licence_plate: varchar
purchase_date: date
Table customer
id: integer primary key autoincrement
name: varchar
Table rental
id: integer primary key autoincrement
car_id: integer
bike_id: integer
customer_id: integer
rental_date: date
Simple right? I have 10 records for cars because I have 10 cars.
I've been running this business for 10 years, so I've got 1000 customers.
And I rent the cars about 20x per year per cars = 10 years x 10 cars x 20 = 2000 rentals.
If I store everything in one big table I've got 10x1000x2000 = 20 million records.
If I store it in 3 tables I've got 10+1000+2000 = 3010 records.
That's 3 orders of magnitude, so that's why I use 3 tables.
But because I use 3 tables (to save space and time) I have to use joins in order to get the data out again
(at least if I want names and licence plates instead of numbers).
Using inner joins
All rentals for customer 345?
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
WHERE customer.id = 345.
That's an INNER JOIN, because we only want to know about cars linked to rentals linked to customers that actually happened.
Notice that we also have a bike_id, linking to the bike table, which is pretty similar to the car table but different.
How would we get all bike + car rentals for customer 345.
We can try and do this
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
INNER JOIN bike on (bike.id = rental.bike_id)
WHERE customer.id = 345.
But that will give an empty set!!
This is because a rental can either be a bike_rental OR a car_rental, but not both at the same time.
And the non-working inner join query will only give results for all rentals where we rent out both a bike and a car in the same transaction.
We are trying to get and boolean OR relationship using a boolean AND join.
Using outer joins
In order to solve this we need an outer join.
Let's solve it with left join
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id) <<-- link always
LEFT JOIN car on (car.id = rental.car_id) <<-- link half of the time
LEFT JOIN bike on (bike.id = rental.bike_id) <<-- link (other) 0.5 of the time.
WHERE customer.id = 345.
Look at it this way. An inner join is an AND and a left join is a OR as in the following pseudocode:
if a=1 AND a=2 then {this is always false, no result}
if a=1 OR a=2 then {this might be true or not}
If you create the tables and run the query you can see the result.
on terminology
A left join is the same as a left outer join.
A join with no extra prefixes is an inner join
There's also a full outer join. In 25 years of programming I've never used that.
Why Left join
Well there's two tables involved. In the example we linked
customer to rental with an inner join, in an inner join both tables must link so there is no difference between the left:customer table and the right:rental table.
The next link was a left join between left:rental and right:car. On the left side all rows must link and the right side they don't have to. This is why it's a left join
You use outer joins when you need all of the results from one of the join tables, whether there is a matching row in the other table or not.
I think Question 3(b) is confusing because its entire premise wrong: you don't have to use an outer join to "solve the query" e.g. consider this (following the style of syntax in the exam paper is probably wise):
SELECT FNAME, LNAME, DEPENDENT_NAME
FROM EMPLOYEE, DEPENDENT
WHERE SSN = ESSN
AND RELATIONSHIP = 'SPOUSE'
UNION
SELECT FNAME, LNAME, NULL
FROM EMPLOYEE
EXCEPT
SELECT FNAME, LNAME, DEPENDENT_NAME
FROM EMPLOYEE, DEPENDENT
WHERE SSN = ESSN
AND RELATIONSHIP = 'SPOUSE'
In general:
JOIN joints two tables together.
Use INNER JOIN when you wanna "look up", like look up detailed information of any specific column.
Use OUTER JOIN when you wanna "demonstrate", like list all the info of the 2 tables.
Related
You currently are maintaining a contact relation management system for an organization where they keep track of Organizations and Employees belonging to these organizations. You are asked to clean up their data and identify all organizations that don’t have any employees attached to it.
You have two tables, Organizations and Employees. There is a one-to-many relationship from Organizations to Employees. Ie. An Organization can have many employees and an employee belongs to an organization.
Organizations: id Integer name String
Employees: id Integer name String organization_id Integer
Assignment: Write an SQL query that will produce a list of Companies that don’t have any Employees attached to them.
I have tried:
Select Organisation,Organisation.ID, Organisation.Name
From Organisation
Left Join Employees On organisationID = Employees Employees.ID
order by Organisation,Organisation.name
Probably the point of this homework question is to get you using Joins, which are one of the key mechanisms for relating data in one table to the data in another.
Generally SQL has inner and outer joins, the difference being that an inner join requires records in each table to exist for the query to produce a result record. An outer join only requires that one table have an existant record.
A left join (left outer join) requires that the table to the left of the keyword join has records whereas a right join (right outer join) requires that the table to the right of the keyword join has records. A full outer join permits either left or right to have records.
If one side of a join doesn't have a record then the resulting record will have nulls for its column values, so if you have a not-null (such as primary key) column in your table you can always tell whether there was a record by filtering out any not-null values.
For example (assuming your Id columns are defined "not null", which is usually valid and if it isn't valid it's a problem with the design):
select
*
from Organizations
left join Employees on
Organizations.id = Employees.organization_id
where Employees.organization_id is null
;
It fits the Join query where the organization_id will be exist in the Organization table but not exist (null in other word) in the Employee table.
Query: Write an SQL query that will produce a list of Companies that don’t have any Employees attached to them.
SELECT O.id, O.Name
FROM Organization AS O
LEFT JOIN Employees AS E
ON O.id=E.organization_id
WHERE E.organization_id IS NULL
Organizations: id Integer name String
Employees: id Integer name String organization_id Integer
You can use following query to get Organization without employees.
SELECT Org.id,Org.name from Organizations Org
LEFT OUTER JOIN
Employees Emp on Org.id=Emp.orgnization_id where Emp.id is null;
I am trying to list two attributes(booking.roomno and room.price) with the condition that "booking.DATETO is not null" from two different tables.
Table: Booking! Table:Room!
I have tried using this command
select booking.roomno,room.price
from booking
inner join room on booking.ROOMNO=room.roomno
where booking.dateto is not null
although the return results came in with duplicated roomno and price like shown below
room.roomno is not unique. It is only unique within a given hotel and your room table contains multiple hotels. You are going to have to specify hotelno in your join condition as well. Also since you might have multiple bookings for the same room (i.e., duplicates in booking table) you will need to do a DISTINCT to prevent that (but then you have to include the hotelno column in your field list):
select DISTINCT booking.roomno,room.price, room.hotelno
from booking
inner join room on booking.ROOMNO=room.roomno
AND booking.hotelno=room.hotelno
where booking.dateto is not null
You have two bookings for the same room so the returned rows match your inner join. You seem to be trying to fetch all the rooms that have bookings. You would achieve that by adding DISTINCTROW before the selected fields.
select DISTINCTROW booking.hotelno, booking.roomno,room.price
from booking
inner join room on booking.ROOMNO=room.roomno AND
booking.HOTELNO=room.HOTELNO
where booking.dateto is not null
I’ve got 5 Tables:
Patients Table 'PAT'
Patient Orders 'ORD'
Patient Comments 'Comm'
Patient Antigens 'Ant'
Patient MRNs 'MRN'
Information in the following four tables:
Pat has all of our patients information
Ord only has orders from the last four years, joined to the Pat table on Patient ID
Comm has all patient comments, joined on the Pat table on Patient ID
Ant has all antigens entered on patients, joined on the Pat table on Patient ID
MRN has all medical record numbers of patients, joined on the Pat table on Patient ID
What I’m trying to do it make sure I have all patients that appear in the Ord, Comm, Ant, and MRN tables and the join it to the Pat table to get information like first name, last name, sex, DOB...
So I’m trying to figure out how best to join these tables together to make sure I don’t include patients that don’t appear in the Ord, Comm, Ant, or MRN tables.
If I join the tables to my main patient table with left outer joins, I'll get all patients regardless if they have an order, comment, antigen, or MRN.
Select * from Pat
Left Outer Join Ord on Pat.X=Ord.X
Left Outer Join Comm on Pat.X=Comm.X
Left Outer Join Ant on Pat.X=Ant.X
Left Outer Join MRN on Pat.X=MRN.X
I've attempted to make a visual diagram of what I'm looking for:
My ideal query would return all Patients except for 4 and 10 because they do not have any orders, antigens, comments, or a MRN.
I was thinking I could perform FULL OUTER JOINS on the ORD, ANT, COMM, and MRN tables and then take those results and and join it to the PAT table.
Select *
FROM P50DATA.AGABFL1 ANT
FULL OUTER JOIN P50DATA.ORDMSTL1 ORD ON ANT.AGACCT=ORD.OPACCT
FULL OUTER JOIN P50DATA.DCMTRNL5 COMM ON ANT.AGACCT=COMM.DCACCT
FULL OUTER JOIN P50DATA.HOSPIDL1 HOS ON ANT.AGACCT=HOS.APACCT
But I don't know how to then take this data set and marry it to the Pat table so I could then get my patient information.
Thoughts?
This doesn't have to be too complicated.
ORD, ANT, COMM, and MRN don't have to join to each other, because you don't need the records in any one of them to match any other one. You just need to know whether they have records for a given patient or not.
There are many ways to construct such a query; here's one:
SELECT *
FROM PAT
WHERE EXISTS (
SELECT 1
FROM ORD
WHERE ORD.X = PAT.X
)
OR EXISTS (
SELECT 1
FROM ANT
WHERE ANT.X = PAT.X
)
OR EXISTS (
SELECT 1
FROM COMM
WHERE COMM.X = PAT.X
)
OR EXISTS (
SELECT 1
FROM MRN
WHERE MRN.X = PAT.X
)
Because you are querying the Patient table without joining it to anything, you don't have to worry about duplicate records, and because you're just checking existence for ORD, ANT, COMM, and MRN, you don't have to worry about exactly how many records each of them has for a given patient.
I'm not entirely sure that what I'm trying to do is possible. Can you use an OR in the condition of a left join? I start from my users table and then it can either go from week_meal to meal (adding a meal they do not own to their weekly meal plan) or straight to meal (a meal they own). That part appears to be working, but when I include mta.meal_to_add_id in the select, it incorrectly pulls in meals that do NOT meet the criteria in the LEFT JOIN to meal_to_add.
Fiddle with structure: http://sqlfiddle.com/#!2/7bd9c4
SELECT DISTINCT m.*, o.username as owner,i.*, mta.meal_to_add_id, follow_id
FROM webusers wu
LEFT JOIN week_meal wm ON wu.id=wm.user_id
LEFT JOIN meal m ON (wu.id=m.user_id OR wm.meal_id=m.meal_id)
LEFT JOIN webusers o ON m.user_id=o.id
LEFT JOIN meal_to_add mta ON
((wm.user_id = mta.user_id AND wm.meal_id=mta.meal_id)
OR (m.user_id=mta.user_id AND m.meal_id=mta.meal_id))
JOIN ingredient i ON m.meal_id = i.meal_id
LEFT JOIN follow f ON
m.user_id!=wu.id AND
m.user_id=f.followed_webuser_id
AND wu.id=f.followee_webuser_id
WHERE wu.id=5 AND m.meal_id in (138)
ORDER BY m.meal, i.ingredient_id
OUTPUT: It should be just like this only including the field for mta.meal_to_add_id, which in this case should be NULL for all rows (18)
Sample Results
To answer the first part of your question: Yes, you can use an OR clause in a LEFT JOIN.
As for the second part, in plain words, this is what the query seems to say:
Join 'week meals' to users on the user id. Join meals to users on that same user id OR join meals to users on the meal id. Assume now that we have some matching meal/user combinations, where some meal rows are matched on user, and others are matched on the meal id.
Next, join webusers to meals again. Now we have meal rows possibly matching two sets of users. So when mta tries to match meals, it is matching two possible sets of meal/user combinations.
My practice in cases like this is to break up the query into two queries and put the intermediate results in a temp table (using MEMORY engine), then select from that.
So, alright, I have a few tables. My current query runs against a "historical" table. I want to do a join of some kind to get the most recent status from my Current table. These tables share a like column, called "ID"
Here's the structure
ddCurrent
-ID
-Location
-Status
-Time
ddHistorical
-CID (AI field to keep multiple records per site)
-ID
-Location
-Status
-Time
My goal now is to do a simple join to get all the variables from ddHistorical and the current Status from ddCurrent.
I know that they can be joined on ID since both of them have the same items in their ID tables, I just can't figure out which kind of join is appropriate or why?
I'm sure someone may provide a specific link that goes into great detail explaining, but I'll try to summarize it this way. When writing a query, I try to list the tables from the position of what table do I want to get data from and have that as my first table in the "FROM" clause. Then, do "JOIN" criteria to other tables based on relationships (such as IDs). In your example
FROM
ddHistorical ddH
INNER JOIN ddCurrent ddC
on ddH.ID = ddC.ID
In this case, INNER JOIN (same as JOIN) the ddHistorical table is the left table(listed first for my styling consistency and indentation) and ddCurrent is the right table. Notice my ON criteria that joins them together is also left alias.column = right alias table.column -- again, this is just for mental correlation purposes.
an Inner Join (or JOIN) means a record MUST have a match on each side, otherwise it is discarded.
A LEFT JOIN means give me all records in the LEFT table (ddHistorical in this case), regardless of a matching in the right-side table (ddCurrent). Not practical in this example.
A RIGHT JOIN is the reverse... give me all records from the RIGHT-side table REGARDLESS of a matching record in the left side table. Most of the time you will see LEFT-JOINs more frequently than RIGHT-JOINs.
Now, a sample to mentally get the left-join. You work at a car dealership and have a master table of 10 cars that are sold. For a given month, you want to know what IS NOT selling. So, start with the master table of all cars and look at the sales table for what DID sell. If there is NO such sales activity the right-side table will have NULL value
select
M.CarID,
M.CarModel
from
MasterCarsList M
LEFT JOIN CarSales CS
on M.CarID = CS.CarID
AND month( CS.DateSold ) = 4
where
CS.CarID IS NULL
So, my LEFT join is based on a matching car ID -- AND -- the month of sales activity is 4 (April) as I may not care about sales for Jan-Mar -- but would also qualify year too, but this is a simple sample.
If there is no record in the Car Sales table it will have a NULL value for all columns. I just happen to care about the car ID column since that was the join basis. That is why I am including that in the WHERE clause. For all other types of cars that DO have a sale it will have a value.
This is a common approach you will see in querying where someone looking for all regardless of other... Some use a where NOT EXIST ( subselect ), but those perform slower because they test on every record. Having joins is much faster.
Other examples may be you want a list of all employees of a company, and if they had some certification / training to show it... You still want all employees, but LEFT-JOINING to some certification/training table would expose those extra field as needed.
select
Emp.FullName,
Cert.DateCertified
FROM
Employees Emp
Left Join Certifications Cert
on Emp.EmpID = Cert.EmpID
Hopefully these samples help you understand better the relationship for queries, and now to actually provide answer for your needs.
If what you want is a list of all "Current" items and want to look at their historical past, I would use current FIRST. This might be if your current table of things is 50, but historically your table had 420 items. You don't care about the other 360 items, just those that are current and the history of those.
select
ddC.WhateverColumns,
ddH.WhateverHistoricalColumns
from
ddCurrent ddC
JOIN ddHistorical ddH
on ddC.ID = ddH.ID
If there is always a current field then a simple INNER JOIN will do it
SELECT a.CID, a.ID, a.Location, a.Status, a.Time, b.Status
FROM ddHistorical a
INNER JOIN ddCurrent b
ON a.ID = b.ID
An INNER JOIN will omit any ddHistorical rows that don't have a corresponding ID in ddCurrent.
A LEFT JOIN will include all ddHistorical rows, even if they don't have a corresponding ID in ddCurrent, but the ddCurrent values will be null (because they're unknown).
Also note that a LEFT JOIN is just a specific type of outer join. Don't bother with the others yet - 90% or more of what you'll ever do will be INNER or LEFT.
To include only those ddHistorical rows where the ID is in ddCurrent:
SELECT h.CID, h.ID, h.Location, h.Status, c.Status, h.Time
FROM ddHistorical h
INNER JOIN ddCurrent c ON h.ID = c.ID
If you want to include ddHistorical rows even if the ID isn't in ddCurrent:
SELECT h.CID, h.ID, h.Location, h.Status, c.Status, h.Time
FROM ddHistorical h
LEFT JOIN ddCurrent c ON h.ID = c.ID
If all ddHistorical rows happen to match an ID in ddCurrent, note that both queries will return the same result.