SQL Natural Join Dilemma re Order, Customer & Salesman tables - mysql

I was given the following question:
Write a SQL statement to make a join on the tables salesman, customer and orders in such a form that the same column of each table will appear once and only the relational rows will come.
I executed the following query:
SELECT * FROM orders NATURAL JOIN customer NATURAL JOIN salesman;
However, I was not expecting the following result:
My doubt lies in step 2.
Why didn't I get the rows with salesman_id 5002, 5003 & 5007?
I know that natural join uses the common columns to finalize the rows.
Here all the Salesman_ids are present in the result from step 1.
Why isn't the final result equal to the table resulting from step 1 with non duplicate columns from salesman added to it?

... the same column of each table will appear once
Yes Natural Join does that.
... and only the relational rows will come.
I don't know what that means.
I disagree with those who are saying: do not use Natural Join. But it is certainly true that if you plan to use Natural Join for your queries, you must design the schema so that (loosely speaking) 'same column name means same thing'.
Then this exercise is teaching you the dangers of having same-named columns that do not mean the same thing. The danger is sometimes called the 'connection trap' or 'join trap'. (Not really a trap: you just need to learn ways to write queries over poorly-designed schemas.)
A more precise way to put that: if you have columns named the same in two different tables, the column must be a key of at least one of them. So:
city is not a key in any of those tables,
so should not get 'captured' in a Natural Join.
salesman_id is not a key in table customer,
so should not get 'captured' in the join from table orders.
The main way to fix up this query is by renaming some columns to avoid 'capture' (see below). It's also worth mentioning that some dialects of SQL allow:
SELECT *
FROM orders
NATURAL JOIN customer ON customer_id
...
The ON column(s) phrase means: validate that the only columns in common between the two tables are those named. Otherwise reject the query. So your query would be rejected.
Renaming means that you shouldn't use SELECT *. (Anyway, that's dangerous for 'production code' because your query might produce different columns each time there's a schema change.) The easiest way to tackle this might be to create three Views for your three base tables, with the 'accidental' same-named columns given some other name. For this one query:
SELECT ord_no, purch_amt, ord_date, customer_id,
salesman_id AS order_salesman_id
FROM orders
NATURAL JOIN (SELECT customer_id, cust_name,
city AS cust_city, grade,
salesman_id AS cust_salesman_id
FROM customer) AS customer_grr
NATURAL JOIN (SELECT salesman_id, name,
city AS salesman_city,
commission
FROM salesman) AS salesman_grr
I'm using explicit AS to show renaming. Most dialects of SQL allow you to omit that keyword; just put city cust_city, ....

Why isn't the final result equal to the table resulting from step 1 with [...]?
Because natural join doesn't work how you expect--whatever that is, since you don't say.
In terms of relational algebra: Natural join returns the rows
• whose column set is the union of the input column sets and
• that have a subrow in both inputs.
In business terms: Every table & query result holds the rows that make some statement template--its (characteristic) predicate--its "meaning"--into a true statement. The designer gives the predicates for the base tables. Here, something like:
Orders = rows where
order [ord_no] ... and was sold by salesman [salesman_id] to customer [customer_id]
Customer = rows where
customer [customer_id] has name [cust_name] and lives in city [city]
and ... and is served by salesman [salesman_id]
Salesman = rows where
salesman [salesman_id] has name [name] and works in city [city] ...
Natural join is defined so that if each input holds the rows that make its predicate into a true statement then their natural join holds the rows that make the AND/conjunction of those predicates into a true statement. So (your query):
Orders natural join Customer natural join Salesman = rows where
order [ord_no] ... and was sold by salesman [salesman_id] to customer [customer_id]
and customer [customer_id] has name [cust_name] and lives in city [city]
and ... and is served by salesman [salesman_id]
and salesman [salesman_id] has name [name] and works in city [city] ...
So that natural join is asking for rows where, among other things, the customer lives in the city that the salesman works in. If that's not what you want, then you shouldn't use that expression.
Note how the meaning of a natural join of tables is a (simple) function of the meanings of its input tables. That's so for all the relational operators. So every query expression has a meaning, built from its base table meanings & relational operators.
Is there any rule of thumb to construct SQL query from a human-readable description?
Why didn't I get the rows with salesman_id 5002, 5003 & 5007?
Because those salesmen don't work a city in which one of their customers lives.

Related

MySQL - When shouldn't I Join tables? Combinatorial Explosion of values

I am working on a database called classicmodels, which I found at: https://www.mysqltutorial.org/mysql-sample-database.aspx/
I realized that when I executed an Inner Join between 'payments' and 'orders' tables, a 'cartesian explosion' occurred. I understand that these two tables are not meant to be joined. However, I would like to know if it is possible to identify this just by looking at the relational schema or if I should check the tables one by one.
For instance, the customer number '141' appears 26 times in the 'orders table', which I found by using the following code:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
orders
WHERE customerNumber=141
GROUP BY customerNumber;
And the same customer number (141) appears 13 times in the payments table:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
payments
WHERE customerNumber=141
GROUP BY customerNumber;
Finally, I executed an Inner Join between 'payments' and 'orders' tables, and selected only the rows with customer number '141'. MySQL returned 338 rows, which is the result of 26*13. So, my query is multiplying the number of times this 'customer n°' appears in 'orders' table by the number of times it appears in 'payments'.
SELECT
o.customernumber,
py.amount
FROM
customers c
JOIN
orders o ON c.customerNumber=o.customerNumber
JOIN
payments py ON c.customerNumber=py.customerNumber
WHERE o.customernumber=141;
My questions is the following:
1 ) Is there a way to look at the relational schema and identify if a Join can be executed (without generating a combinatorial explosion)? Or should I check table by table to understand how the relationship between them is?
Important Note: I realized that there are two asterisks in the payments table's representation in the relational schema below. Maybe this means that this table has a composite primary key (customerNumber+checkNumber). The problem is that 'checkNumber' does not appear in any other table.
This is the database's relational schema provided by the 'MySQL Tutorial' website:
Thank you for your attention!
This is called "combinatorial explosion" and it happens when rows in one table each join to multiple rows in other tables.
(It's not "overestimation" or any sort of estimation. It's counting data items multiple times when it should only count them once.)
It's a notorious pitfall of summarizing data in one-to-many relationships. In your example each customer may have no orders, one order, or more than one. Independently, they may have no payments, one, or many.
The trick is this: Use subqueries so your toplevel query with GROUP BY avoids joining one-to-many relationships serially. In the query you showed us, that's happening.
You can this subquery to get a resultset with just one row per customer. (try it.)
SELECT customernumber,
SUM(amount) amount
FROM payments
GROUP BY customernumber
Likewise you can get the value of all orders for each customer with this
SELECT c.customernumber,
SUM(od.qytOrdered * od.priceEach) amount
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY c.customernumber
This JOIN won't explode in your face because customer can have multiple orders, and each order can have multiple details. So it's a strict hierarchical rollup.
Now, we can use these subqueries in the main query.
SELECT c.customernumber, p.payments, o.orders
FROM customers c
LEFT JOIN (
SELECT c.customernumber,
SUM(od.qytOrdered * od.priceEach) orders
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY c.customernumber
) o ON c.customernumber = o.customernumber
LEFT JOIN (
SELECT customernumber,
SUM() payment
FROM payments
GROUP BY customernumber
) p on c.customernumber = p.customernumber
Takehome tricks:
A subquery IS a table (a virtual table) that can be used whereever you might mention a table or a view.
The GROUP BY stuff in this query happens separately in two subqueries, so no combinatorial explosions.
All three participants in the toplevel JOIN have either one or zero rows per customernumber.
The LEFT JOINs are there so we can still see customers with (importantly for a business) no orders or no payments. With the ordinary inner JOIN, rows have to match both sides of the ON conditions or they're omitted from the resultset.
Pro tip Format your SQL queries fanatically carefully: They are really verbose. Adm. Grace Hopper would be proud. That means they get quite long and nested, putting the Structured in Structured Query Language. If you, or anybody, is going to reason about them in future, we must be able to grasp the structure easily.
Pro tip 2 The data engineer who designed this database did a really good job thinking it through and documenting it. Aspire to this level of quality. (Rarely reached in the real world.)
In this particular case, your behavior should depend on the accounting style being supported by the database, and this does not appear to be "open item" style accounting ie when an order is raised for 1000 there does not need to be a payment against it for 1000.. This is perhaps unusual in most consumer experience because you will be quite familiar with open item style ordering from Amazon - you buy a 500 dollar tv and a 500 dollar games console, the order is a thousand dollars and you pay for it, the payment going against the order. However, you're also familiar with "balance forward" accounting if you paid for that order using your credit card because you make similar purchases every day for a month and hen you get a statement from your bank saying you owe 31000 and you pay a lump of money, doesn't even have to be 31k. You aren't expected to make 31 payments of 1000 to your bank at the end of the month. Your bank allocate it to the oldest items on the account (if they're nice, or the newest items if they're not) and may eventually charge you interest on unpaid transactions
1 ) Is there a way to look at the relational schema and identify if a Join can be executed
Yes, you can tell looking at the schema- customer has many orders, customer makes many payments, but there is no relation between the order and payment tables at all so we can see there is no attempt to directly attach a payment to an order. You can see that customer is a parent table of payment and order, and therefore enjoys a relationship with each of them but they do not relate to each other. If you had Person, Car and Address tables, a person has many addresses during their life, and many cars but it doesn't mean there is a relationship between cars and addresses
In such a case it simply doesn't make sense to join payments to customers to orders because they do not relate that way. If you want to make such a join and not suffer a Cartesian explosion then you absolutely have to sum one side or the other (or both) to ensure that your joins are 1:1 and 1:M (or 1:1 and 1:1). You cannot arrange a join that is a pair of 1:M.
Going back to the car/person/address example to make any meaningful joins, you have to build more information into the question and arrange the join to create the answer. Perhaps the question is "what cars did they own while they lived at" - this flattens the Person:Address relationship to 1:1 but leaves Person:Car as 1:M so they might have owned many cars during their time in that house. "What was the newest car they owned while living at..." might be 1:1 on both sides if there is a clear winner for "newest" (though if they bought two cars manufactured at identical times...)
Which side you sum in your orders case will depend on what you want to know, but in this case I'd say you usually want to know "which orders haven't been paid for" and that's summing all payments and rolling summing all orders then looking at what point the rolling sum exceeds the sum of payments.. those are the unpaid orders
Take a look again at your database graph (the one that was present in the first iteration of your question). See the lines between tables have 3 angled legs on one end - that's the many end. You can start at any table in the graph and join to other tables by walking along the relationship. If you're going from the many end to the one end, and assuming you've picked out a single row in the start table (a single order) you can always walk to any other table in the many->one direction and not increase your row count. If you walk the other way you potentially increase your row count. If you split and walk two ways that both increase row count you get a Cartesian explosion. Of course, also you don't have to only join on relation lines, but that's out of scope for the question
ps: this is easier to see on the db diagram than the ERD in the question because the database purely concerns itself with the columns that are foreign keyed. The ERD is saying a customer has zero or one payments with a particular check number but the database will only be concerned with "the customer ID appears once in the customer table and multiple times in the payment table" because only part of the compound primary key of payment is keyed to the customer table. In other words, the ERD is concerned with business logic relations too, but the db diagram is purely how tables relate and they aren't necessarily aligned. For this reason the db diagrams are probably easier to read when walking round for join strategies
After seeing the answers of Caius Jard and O.Jones (please, check their replies), which kindly helped me to clarify this doubt, I decided to create a table to identify which customers paid for all orders they made and which ones did not. This creates a pertinent reason to join 'orders', 'orderdetails', 'payments' and 'customers' tables, because some orders may have been cancelled or still may be 'On Hold', as we can see in their corresponding 'status' in the 'orders' table. Also, this enables us to execute this join without generating a 'combinatorial explosion'.
I did this by using the CASE statement, which registers when py.amount and amount_in_orders match, don't match or when they are NULL (customers which did not make orders or payments):
SELECT
c.customerNumber,
py.amount,
amount_in_orders,
CASE
WHEN py.amount=amount_in_orders THEN 'Match'
WHEN py.amount IS NULL AND amount_in_orders IS NULL THEN 'NULL'
ELSE 'Don''t Match'
END AS Match
FROM
customers c
LEFT JOIN(
SELECT
o.customerNumber, SUM(od.quantityOrdered*od.priceEach) AS amount_in_orders
FROM
orders o
JOIN orderdetails od ON o.orderNumber=od.orderNumber
GROUP BY o.customerNumber
) o ON c.customerNumber=o.customerNumber
LEFT JOIN(
SELECT customernumber, SUM(amount) AS amount
FROM payments
GROUP BY customerNumber
) py ON c.customerNumber=py.customerNumber
ORDER BY py.amount DESC;
The query returned 122 rows. The images below are fractions of the generated output, so you can visualize what happened:
For instance, we can see that the customers identified by the numbers '141', '124', '119' and '496' did not pay for all the orders they made. Maybe some of them where cancelled or maybe they simply did not pay for them yet.
And this image shows some of the columns (not all of them) that are NULL:

Inner join in mysql take a long time

I have table contacts with more than 1,000,000 and other table cities which have about 20,000 records. Need to fetch all cities which have used in contacts table.
Contacts table have following columns
Id, name, phone, email, city, state, country, postal, address, manager_Id
cities table have
Id, city
I used Inner join for this, but its taking a long time to go. Query takes more than 2 minutes to execute.
I used this query
SELECT cities.* FROM cities
INNER JOIN contacts ON contacts.City = cities.city
WHERE contacts.manager_Id= 1
created index on manager_Id as well. But still its very slow.
for better performance you could add index
on table cities column city
on table contacts a composite index on columns (manager_id, city)
Filter contacts first and then join to cities:
SELECT ct.*
FROM cities ct INNER JOIN (
SELECT city FROM contacts
WHERE manager_Id = 1
) cn ON cn.city = ct.city
You need indexes for city in both tables and for manager_id in contacts.
As others have pointed out about having proper index, I am taking it a bit more for clarification. You are specifically looking for contacts where the MANAGER ID = 1. This is not expected to be one person, but could be many people. So having the MANAGER ID in the first position will optimize get me all people for that manager. By having the city as part of the index via (manager_id, city), you are pulling the two data elements you need to optimize as part of the index. This way the engine does not have to go to the raw data pages to get the other part of interest.
Now, From that, you want all the city information (hence the join to city table on that ID).
Since you are only querying the CITIES and not the actual contact people information, you probably want to have DISTINCT City ID. Lets say a manager is responsible for 50 people and most of them live in the same city or neighboring. You may have 5 distinct cities? That too will limit your result set of joining.
Having said that, I would do a follows, and with MySQL, using STRAIGHT_JOIN can help optimize by "do the query as I wrote it, don't think for me".
select STRAIGHT_JOIN
cty.*
from
( select distinct c.City
from Contacts c
where c.Manager_ID = 1 ) PQ
JOIN Cities cty
on PQ.City = Cty.City
The "PQ" is an alias representing my "pre-query" of just DISTINCT cities for a given manager.
Again, have one index on Contacts table on (manager_id, city). On the city table, I would expect and index on (city).
You need two indexes, one on each table.
On the contacts table, first index manager_Id, then City
CREATE INDEX idx_contacts_mgr_city ON contacts(manager_Id, City);
On the cities table, just index `City.
Is the 'City' field from the table 'Contacts' a VARCHAR?
If that's the case, I see multiple things here.
First of all, since you have already have the 'Id' for the corresponding city in your 'cities' tables, I don't see why not to use the same 'Id' from the 'cities' table for the 'Contacts' table.
You can add the 'IdCity' field to the 'Contacts' table so you don't have to modify your existing records.
You'll have to insert the 'IdCity' manually though for each of your records, or you can create a Query using 'cities' table and then compare the 'idCity' but insert the 'city' (city name) in your 'Contacts' table.
Returning to your query:
Then, use an INT JOIN instead of a VARCHAR JOIN. Since you have many records, this can show up an important significance in performance.
It looks like you need to add two indexes, one on cities.city and one on (contacts.manager_Id, contacts.city). That should speed things up significantly.

Have to enter mySQL criteria twice?

Say I have two tables:
Table: customers
Fields: customer_id, first_name, last_name
Table: customer_cars
Fields: car_id, customer_id, car_brand, car_active
Say I am trying to write a query that shows all customers with a first name of "Karl," and the brands of the ** active ** cars they have. Not all customers will have an active car. Some cars are active, some are inactive.
Please keep in mind that this is a representative example that I just made up, for sake of clarity and simplicity. Please don't reply with questions about why we would do it this way, that I could use table aliases, how it's possible to have an inactive car, or that my field names could be better written. It's a fake example that is intended be very simple in order to illustrate the point. It has a structure and issue that I encounter all the time.
It seems like this would be best done with a LEFT JOIN and subquery.
SELECT
customer_id,
first_name,
last_name,
car_brand
FROM
customers
LEFT JOIN
(SELECT
customer_id,
car_brand
FROM
customer_cars
INNER JOIN customers ON customer_cars.customer_id = customers.customer_id
WHERE
first_name = 'Karl' AND
customer_cars.car_active = '1') car_query ON customers.customer_id = car_query.customer_id
WHERE
first_name = 'Karl'
The results might look like this:
first_name last_name car_brand
Karl Johnson Dodge
Karl Johnson Jeep
Karl Smith NULL
Karl Davis Chrysler
Notice the duplication of 'Karl' in both WHERE clauses, and the INNER JOIN in the subquery that is the same table in the outer query. My understanding of mySQL is that this duplication is necessary because it processes the subquery first before processing the outer query. Therefore, the subquery must be properly limited so it doesn't scan all records, then it tries to match on the resulting records.
I am aware that removing the car_active = '1' condition would change things, but this is a requirement.
I am wondering if a query like this can be done in a different way that only causes the criteria and joins to be entered once. Is there a recommended way to prioritize the outer query first, then match to the inner one?
I am aware that two different queries could be written (find all records with Karl, then do another that finds matching cars). However, this would cause multiple connections to the database (one for every record returned) and would be very taxing and inefficient.
I am also aware of correlating subqueries, but from my understanding and experience, this is for returning one field per customer (e.g., an aggregate field such as how much money Karl spent) within the fieldset. I am looking for a similar approach as this, but where one customer could be matched to multiple other records like in the sample output above.
In your response, if you have a recommended query structure that solves this problem, it would be really helpful if you could write a clear example instead of just describing it. I really appreciate your time!
First, is a simple and straight query not enough?
Say I am trying to write a query that shows all customers with a first
name of "Karl," and the brands of the ** active ** cars they have. Not
all customers will have an active car. Some cars are active, some are
inactive.
Following this requirement, I can just do something like:
SELECT C.first_name
, C.last_name
, CC.car_brand
FROM customers C
LEFT JOIN cutomer_cars CC ON CC.customer_id = C.customer_id
AND car_active = 1
WHERE C.first_name = 'Karl'
Take a look at the SQL Fiddle sample.

SQL Inner Join With Multiple Columns

I've got 2 tables - dishes and ingredients:
in Dishes, I've got a list of pizza dishes, ordered as such:
In Ingredients, I've got a list of all the different ingredients for all the dishes, ordered as such:
I want to be able to list all the names of all the ingredients of each dish alongside each dish's name.
I've written this query that does not replace the ingredient ids with names as it should, instead opting to return an empty set - please explain what it that I'm doing wrong:
SELECT dishes.name, ingredients.name, ingredients.id
FROM dishes
INNER JOIN ingredients
ON dishes.ingredient_1=ingredients.id,dishes.ingredient_2=ingredients.id,dishes.ingredient_3=ingredients.id,dishes.ingredient_4=ingredients.id,dishes.ingredient_5=ingredients.id,dishes.ingredient_6=ingredients.id, dishes.ingredient_7=ingredients.id,dishes.ingredient_8=ingredients.id;
It would be great if you could refer to:
The logic of the DB structuring - am I doing it correctly?
The logic behind the SQL query - if the DB is built in the right fashion, then why upon executing the query I get the empty set?
If you've encountered such a problem before - one that requires a single-to-many relationship - how did you solved it in a way different than this, using PHP & MySQL?
Disregard The Text In Hebrew - Treat It As Your Own Language.
It seems to me that a better Database Structure would have a Dishes_Ingredients_Rel table, rather than having a bunch of columns for Ingredients.
DISHES_INGREDIENTS_REL
DishesID
IngredientID
Then, you could just do a much simpler JOIN.
SELECT Ingredients.Name
FROM Dishes_Ingredients_Rel
INNER JOIN Ingredients
ON Dishes_Ingredients.IngredientID = Ingredients.IngredientID
WHERE Dishes_Ingredients_Rel.DishesID = #DishesID
1. The logic of the DB structuring - am I doing it correctly?
This is denormalized data. To normalize it, you would restructure your database into three tables:
Pizza
PizzaIngredients
Ingredients
Pizza would have ID, name, and type where ID is the primary key.
PizzaIngredients would have PizzaId and IngredientId (this is a many-many table where the primary key is a composite key of PizzaId and IngredientID)
Ingredients has ID and name where ID is the primary key.
2. List all the names of all the ingredients of each dish alongside each dish's name. Something like this in MySQL (untested):
SELECT p.ID, p.name, GROUP_CONCAT(i.name) AS ingredients
FROM pizza p
INNER JOIN pizzaingredients pi ON p.ID = pi.PizzaID
INNER JOIN ingredients i ON pi.IngredientID = i.ID
GROUP BY p.id
3. If you've encountered such a problem before - one that requires a single-to-many relationship - how did you solved it in a way different than this, using PHP & MySQL?
Using a many-many relationship, since that what your example truly is. You have many pizzas which can have many ingredients. And many ingredients belong to many different pizzas.
The reason you are getting an empty result is because you are setting a join condition that never gets satisfied. During the INNER join execution the database engine compares each record of the first table with each record of the second one trying to find a match where the id of the ingredient table record being evaluated is equal to ingredient1 AND ingredient2 AND so on. It would return some result if you create a record in the first table with the same ingredient in all 8 columns (testing purposes only).
Regarding the database structure, you choose a denormalized one creating 8 columns for each ingredient. There are a lot of considerations possible on this data structure (performance, maintainability, or just think if you are asked to insert a dish with 9 ingredients for example) and I would personally go for a normalized data structure instead.
But if you want to keep this, you should write something like:
SELECT dishes.name, ingredients1.name, ingredients1.id, ingredients2.name, ingredients2.id, ...
FROM dishes
LEFT JOIN ingredients AS ingredients1 ON dishes.ingredient_1=ingredients1.id
LEFT JOIN ingredients AS ingredients2 ON dishes.ingredient_2=ingredients2.id
LEFT JOIN ingredients AS ingredients3 ON dishes.ingredient_3=ingredients3.id
...
The LEFT join is required to get a result for unmatched ingredients (0 value when no ingredient is set reading your example)

SQL Query to populate table based on PK of Main Table being joined

Here is my Database structure (basic relations):
I'm attempting to formulate a one-line query that will populate the clients_ID, Job_id, tech_id, & Part_id and return back all the work orders present. Nothing more nothing less.
Thus far I've struggled to generate this Query:
SELECT cli.client_name, tech.tech_name, job.Job_Name, w.wo_id, w.time_started, w.part_id, w.job_id, w.tech_id, w.clients_id, part.Part_name
FROM work_orders as w, technicians as tech, clients as cli, job_types as job, parts_list as part
LEFT JOIN technicians as techy ON tech_id = techy.tech_name
LEFT JOIN parts_list party ON part.part_id = party.Part_Name
LEFT JOIN job_types joby ON job_id = joby.Job_Name
LEFT JOIN clients cliy ON clients_id = cliy.client_name
Apparently, once all the joining happens it does not even populate the correct foreign key values according to their reference.
[some values came out as the actual foreign key id, not even
corresponding value.]
It just goes on about 20-30 times depending on largest row of a table that I have (one of the above).
I only have two work orders created, So ideally it should return just TWO Records, and columns, and fields with correct information. What could I be doing wrong? Haven't been with MySQL too long but am learning as much as I can.
Your join conditions are wrong. Join on tech_id = tech_id, not tech_id = tech_name. Looks like you do this for all your joins, so they all need to be fixed.
I really don't follow the text of your question, so I am basing my answer solely on your query.
Edit
Replying to your comment here. You said you want to "load up" the tech name column. I assume you mean you want tech name to be part of your result set.
The SELECT part of the query is what determines the columns that are in the result set. As long as the table where the column lives is referenced in the FROM/JOIN clauses, you can SELECT any column from that table.
Think of a JOIN statement as a way to "look up" a value in one table based on a value in another table. This is a very simplified definition, but it's a good way to start thinking about it. You want tech name in your result set, so you look it up in the Technicians table, which is where it lives. However, you want to look it up by a value that you have in the Work Orders table. The key (which is actually called a foreign key) that you have in the Work Orders table that relates it to the Technicians table is the tech_id. You use the tech_id to look up the related row in the Technicians table, and by doing so can include any column in that table in your result set.