How can I improve this query to avoid using nested views? [closed] - mysql

Closed 10 years ago.
Find patients who visited all orthopedists (specialty) associated with their insurance companies.
Database: Click here to view the sample data script in SQL Fiddle.
CREATE VIEW Orthos AS
SELECT d.cid,d.did
FROM Doctors d
WHERE d.speciality='Orthopedist';
CREATE VIEW OrthosPerInc AS
SELECT o.cid, COUNT(o.did) as countd4i
FROM Orthos o
GROUP BY o.cid;
CREATE VIEW OrthoVisitsPerPat AS
SELECT v.pid,COUNT(o.did) as countv4d
FROM Orthos o,Visits v,Doctors d
WHERE o.did=v.did and d.did=o.did
GROUP BY v.pid,d.cid;
SELECT p.pname,p.pid,p.cid
FROM OrthoVisitsPerPat v, OrthosPerInc i,Patient p
WHERE i.countd4i = v.countv4d and p.pid=v.pid and p.cid=i.cid;
DROP VIEW IF EXISTS Orthos,OrthosPerInc,OrthoVisitsPerPat;
How can I write it as one query?
Attempt:
So far, here is my attempt at getting this resolved.
SELECT p.pid,p.pname,p.cid,COUNT(v.did)
FROM Visits v
JOIN Doctors d ON v.did=d.did
JOIN Patient p ON p.pid=v.pid
WHERE d.cid=p.cid and d.speciality="Orthopedist"
GROUP BY p.pid,p.cid;
INTERSECT
SELECT p.pid,d.cid,COUNT(d.did)
FROM Doctors d
JOIN Patient p ON p.cid=d.cid
WHERE d.speciality='Orthopedist'
GROUP BY d.cid;

Familiarize yourself with the data that you have:
The first key thing is to understand what data you have. In this case, you have four tables:
InsuranceCompanies
Patient
Doctors
Visits
Your goal:
Find the list of all the patients who visited all orthopedists (specialty) associated with their Insurance Companies.
Let's take a step back and analyze it in smaller pieces:
Generally, the requirements might be a bit overwhelming when you look at them on the whole. Let's split the requirements into smaller components to understand what you need to do.
Part a: Find the list of doctors whose speciality is 'Orthopedist'.
Part b: Find the list of patients who visited the doctors identified in part a.
Part c: Filter the result of part b to keep only the patients and doctors who share the same insurance company.
Part d: Find the patients who visited every one of the Orthopedists who belong to the same insurance company as the patient does.
How to approach:
You need to identify your main goal, here in this case to identify the list of patients. So, query the Patient table first.
You have the patients, actually all of them but we need to find which of these patients visited the doctors. Let's not worry about whether the doctor is an Orthopedist or not. We just need the list of patients and the doctors they have visited. There is no mapping between Patient and Doctors table. To find out this information,
Join the Patient table with Visits table on the correct key field.
Then join the output with the Doctors table on the correct key field.
If you have done the join correctly, you should now have the list of all the patients and the doctors that they have visited. If you used LEFT OUTER JOIN, you will also find the patients who have never visited a doctor. If you used INNER JOIN, you will find only the patients who visited a doctor. A RIGHT OUTER JOIN also drops patients without visits, but it keeps every row of the right-hand table even when nothing matches it.
Now, you have all the patients and the doctors whom they have visited. However, the requirement is to find only the doctors who are Orthopedists. So, apply the condition to filter the result to give only the desired result.
You have now achieved the requirements as split into smaller components in part a and part b. You still need to filter by the insurance companies. Here is the tricky part: the requirement doesn't say that you need to display the insurance company, so we don't have to use the table InsuranceCompanies. Your next question will be 'How am I going to filter the results?'. Valid point. Find out if any of the three tables Patient, Doctors and Visits contain the insurance company information. Patient and Doctors have a common field. Join on that common field to filter the result.
Find the count of unique Orthopedists that each patient has visited.
This part can be done in many ways. One way is to add a subquery as the fourth column in the output. This subquery queries the Doctors table and filters by speciality = 'Orthopedist'. In addition to that filter, it also has to match the insurance company id on the inner table with the insurance company id on the Patient table in the main query. The subquery then returns the count of all the Orthopedists for the insurance company id that matches the patient's data.
You should now have the fields patient id, patient name, the patient's visit count and the total number of Orthopedists in the same insurance company from the subquery. You can then wrap this derived table in an outer query that keeps only the rows where the patient's visit count matches the total number of Orthopedists in the same insurance company. I am not saying this is the best approach. This is one approach that I can think of.
If you follow the above logic, you should have this.
List of patients who have visited all the doctors
Filtered to only the doctors who are Orthopedists
Filtered by patients and doctors sharing the same insurance company information.
Again, the whole output is then filtered by the two count fields found inside the derived table output.
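The approach described above (count the distinct Orthopedists a patient visited, compare it with a correlated subquery that counts the Orthopedists in the patient's insurance company, then filter in an outer query) can be sketched as follows. This is only an illustration under assumptions: the sample rows are made up, only the columns used in the question are created, and SQLite (through Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patient (pid INTEGER PRIMARY KEY, pname TEXT, cid INTEGER);
CREATE TABLE Doctors (did INTEGER PRIMARY KEY, speciality TEXT, cid INTEGER);
CREATE TABLE Visits  (pid INTEGER, did INTEGER);
INSERT INTO Patient VALUES (1, 'Ann', 10), (2, 'Bob', 10), (3, 'Cy', 20);
INSERT INTO Doctors VALUES
  (100, 'Orthopedist', 10),
  (101, 'Orthopedist', 10),
  (102, 'Orthopedist', 20),
  (103, 'Dentist', 20);
-- Ann sees both orthopedists of company 10 (one of them twice),
-- Bob sees only one of them, Cy only sees a dentist.
INSERT INTO Visits VALUES (1, 100), (1, 101), (1, 100), (2, 100), (3, 103);
""")

query = """
SELECT t.pid, t.pname
FROM (
    SELECT p.pid, p.pname,
           COUNT(DISTINCT d.did) AS visited,          -- orthopedists this patient saw
           (SELECT COUNT(*)                           -- orthopedists in the patient's company
              FROM Doctors d2
             WHERE d2.speciality = 'Orthopedist'
               AND d2.cid = p.cid) AS available
    FROM Patient p
    JOIN Visits  v ON v.pid = p.pid
    JOIN Doctors d ON d.did = v.did AND d.cid = p.cid
    WHERE d.speciality = 'Orthopedist'
    GROUP BY p.pid, p.pname, p.cid
) AS t
WHERE t.visited = t.available                         -- the outer filter on the two counts
ORDER BY t.pid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ann')]
```

COUNT(DISTINCT ...) matters here: a patient who visits the same doctor twice would otherwise be counted as having seen two doctors.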
The ball is in your court:
Try it step by step, and once you find the answer, post it here as a separate answer. I will upvote it to compensate for all the downvotes that you got on this question.
I am confident that you can do this easily.
If you stumble...
Don't hesitate to post your questions as comments to this answer. Others and I will be glad to assist you.
Disclaimer
I have provided one of the many ways how this logic can be implemented. I am sure that there are many ways to implement this in a far better manner.
Outcome:
Please refer to #Ofek Ron's answer for the correct query that produces the desired output. I didn't write any part of the query. It was all OP's effort.

#Siva's Explanation :
this is the resulting code of parts 1-5:
SELECT *
FROM Patient p
JOIN Visits v ON v.pid=p.pid
JOIN Doctors d ON d.did=v.did and d.cid=p.cid
WHERE d.speciality="Orthopedist"
applying part 6:
SELECT p.pid,COUNT(d.did)
FROM Patient p
JOIN Visits v ON v.pid=p.pid
JOIN Doctors d ON d.did=v.did and d.cid=p.cid
WHERE d.speciality="Orthopedist"
GROUP BY p.pid
applying the rest: Click here to view the demo in SQL Fiddle.
SELECT p.pid,p.cid,COUNT(DISTINCT d.did) as c
FROM Patient p
JOIN Visits v ON v.pid=p.pid
JOIN Doctors d ON d.did=v.did and d.cid=p.cid
WHERE d.speciality="Orthopedist"
GROUP BY p.pid
HAVING (p.cid,c) IN
(SELECT d.cid,COUNT(DISTINCT d.did)
FROM Doctors d
WHERE d.speciality="Orthopedist"
GROUP BY d.cid);

Maybe something like this:
SELECT
p.pname,
p.pid,
p.cid
FROM
Patient AS p
JOIN
(
-- distinct orthopedists each patient visited, per insurance company
SELECT v.pid, d.cid, COUNT(DISTINCT d.did) AS countv4d
FROM Doctors d
JOIN Visits v ON d.did=v.did
WHERE d.speciality='Orthopedist'
GROUP BY v.pid, d.cid
) AS v
ON p.pid=v.pid AND p.cid=v.cid
JOIN
(
-- orthopedists available per insurance company
SELECT d.cid, COUNT(d.did) AS countd4i
FROM Doctors d
WHERE d.speciality='Orthopedist'
GROUP BY d.cid
) AS i
ON p.cid=i.cid
WHERE
i.countd4i = v.countv4d;

Related

SQL giving me lines that doesn't exist?

While using this:
SELECT borrowbook.studentusername, borrowbook.schoolbookid,borrowbook.date,borrowbook.deadline, book.title, student.email, student.fname, student.lname
FROM borrowbook, book, student
I get many lines, but in my database I have just four lines in the borrowbook table, and with this query I get some "lines" that don't exist. (Note: this runs through PHP on a website; I cannot seem to make it work in MySQL directly, so I think I have done something wrong.)
Like that a person that had borrowed one book (line 1 in my list of borrowed books) suddenly has borrowed ten different books that I have not registered anyone to borrow. With date as to when it was loaned, and deadline just taken from one of the four lines I have registered.
Even the same person that is registered to borrow one book, suddenly shows up as if they borrowed it four times with different dates. Dates and deadline are taken from "borrowbook" while different names of students are taken from another table, since they have never been used in the "borrowbook" line.
I have tried this now in different ways and with different content and different tables, but I still get many "made up" lines of loans that are not registered.
I know very little, but I am grateful for all help I can get. Articles help as well.
Without join conditions, every row of one table is paired with every row of the others, so records appear duplicated. As a better practice, you should use explicit joins instead of implicit ones. If you have student.username and book.id fields, you can do something like this:
SELECT borrowbook.studentusername, borrowbook.schoolbookid,borrowbook.date,borrowbook.deadline,
book.title,
student.email, student.fname, student.lname
FROM borrowbook
INNER JOIN student ON borrowbook.studentusername=student.username
INNER JOIN schoolbook ON borrowbook.schoolbookid=schoolbook.id
INNER JOIN book ON schoolbook.isbn=book.isbn
;
You haven't specified any JOIN conditions in your query, and because of that tables will be CROSS JOIN-ed, i.e., every record from the borrowbook table is paired with every record from the book table which is then paired with every record from the student table. So if you have X, Y and Z number of records in each table respectively, you will get X * Y * Z records as a result.
You probably want to add join conditions such as (I'm just guessing column names):
SELECT borrowbook.studentusername, borrowbook.schoolbookid,borrowbook.date,borrowbook.deadline, book.title, student.email, student.fname, student.lname
FROM borrowbook, book, student
WHERE borrowbook.book_id = book.id and borrowbook.student_id = student.id
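The X * Y * Z effect is easy to reproduce. The sketch below is an illustration only: the column names are guesses (just as in the answer above), the rows are made up, and SQLite (through Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# Hypothetical tables: 4 loans, 5 books, 3 students.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student    (username TEXT PRIMARY KEY, fname TEXT);
CREATE TABLE book       (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE borrowbook (studentusername TEXT, book_id INTEGER);
INSERT INTO student VALUES ('ann', 'Ann'), ('bob', 'Bob'), ('cy', 'Cy');
INSERT INTO book VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
INSERT INTO borrowbook VALUES ('ann', 1), ('ann', 2), ('bob', 3), ('cy', 5);
""")

# No join conditions: every loan is paired with every book and every student.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM borrowbook, book, student"
).fetchone()[0]

# With join conditions: one output row per real loan.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM borrowbook bb
    JOIN book b    ON bb.book_id = b.id
    JOIN student s ON bb.studentusername = s.username
""").fetchone()[0]

print(cross_rows, joined_rows)  # 60 4
```

Four loans times five books times three students gives the 60 "made up" rows; with the join conditions only the four real loans remain.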

MySQL - When shouldn't I Join tables? Combinatorial Explosion of values

I am working on a database called classicmodels, which I found at: https://www.mysqltutorial.org/mysql-sample-database.aspx/
I realized that when I executed an Inner Join between 'payments' and 'orders' tables, a 'cartesian explosion' occurred. I understand that these two tables are not meant to be joined. However, I would like to know if it is possible to identify this just by looking at the relational schema or if I should check the tables one by one.
For instance, the customer number '141' appears 26 times in the 'orders table', which I found by using the following code:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
orders
WHERE customerNumber=141
GROUP BY customerNumber;
And the same customer number (141) appears 13 times in the payments table:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
payments
WHERE customerNumber=141
GROUP BY customerNumber;
Finally, I executed an Inner Join between 'payments' and 'orders' tables, and selected only the rows with customer number '141'. MySQL returned 338 rows, which is the result of 26*13. So, my query is multiplying the number of times this 'customer n°' appears in 'orders' table by the number of times it appears in 'payments'.
SELECT
o.customernumber,
py.amount
FROM
customers c
JOIN
orders o ON c.customerNumber=o.customerNumber
JOIN
payments py ON c.customerNumber=py.customerNumber
WHERE o.customernumber=141;
My question is the following:
1 ) Is there a way to look at the relational schema and identify if a Join can be executed (without generating a combinatorial explosion)? Or should I check table by table to understand how the relationship between them is?
Important Note: I realized that there are two asterisks in the payments table's representation in the relational schema below. Maybe this means that this table has a composite primary key (customerNumber+checkNumber). The problem is that 'checkNumber' does not appear in any other table.
This is the database's relational schema provided by the 'MySQL Tutorial' website:
Thank you for your attention!
This is called "combinatorial explosion" and it happens when rows in one table each join to multiple rows in other tables.
(It's not "overestimation" or any sort of estimation. It's counting data items multiple times when it should only count them once.)
It's a notorious pitfall of summarizing data in one-to-many relationships. In your example each customer may have no orders, one order, or more than one. Independently, they may have no payments, one, or many.
The trick is this: Use subqueries so your toplevel query with GROUP BY avoids joining one-to-many relationships serially. The query you showed us does exactly that kind of serial join, which is why the rows multiply.
You can use this subquery to get a resultset with just one row per customer. (Try it.)
SELECT customernumber,
SUM(amount) amount
FROM payments
GROUP BY customernumber
Likewise you can get the value of all orders for each customer with this
SELECT o.customernumber,
SUM(od.quantityOrdered * od.priceEach) amount
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY o.customernumber
This JOIN won't explode in your face because a customer can have multiple orders, and each order can have multiple details. So it's a strict hierarchical rollup.
Now, we can use these subqueries in the main query.
SELECT c.customernumber, p.payments, o.orders
FROM customers c
LEFT JOIN (
SELECT o.customernumber,
SUM(od.quantityOrdered * od.priceEach) orders
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY o.customernumber
) o ON c.customernumber = o.customernumber
LEFT JOIN (
SELECT customernumber,
SUM(amount) payments
FROM payments
GROUP BY customernumber
) p on c.customernumber = p.customernumber
Takehome tricks:
A subquery IS a table (a virtual table) that can be used wherever you might mention a table or a view.
The GROUP BY stuff in this query happens separately in two subqueries, so no combinatorial explosions.
All three participants in the toplevel JOIN have either one or zero rows per customernumber.
The LEFT JOINs are there so we can still see customers with (importantly for a business) no orders or no payments. With the ordinary inner JOIN, rows have to match both sides of the ON conditions or they're omitted from the resultset.
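To see the difference between the exploding join and the pre-aggregated subqueries side by side, here is a small sketch. It is an illustration under assumptions: the tables are cut down to the columns used above, the rows are made up (integer amounts for simplicity), and SQLite (through Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers    (customerNumber INTEGER PRIMARY KEY);
CREATE TABLE orders       (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach INTEGER);
CREATE TABLE payments     (customerNumber INTEGER, checkNumber TEXT, amount INTEGER);
INSERT INTO customers VALUES (1), (2);                 -- customer 2 has no activity
INSERT INTO orders VALUES (10, 1), (11, 1);            -- 2 orders for customer 1
INSERT INTO orderdetails VALUES (10, 2, 5), (10, 1, 20), (11, 3, 10);
INSERT INTO payments VALUES (1, 'A', 25), (1, 'B', 25), (1, 'C', 10);  -- 3 payments
""")

# Serial one-to-many joins: 2 orders x 3 payments = 6 rows, payments counted twice.
naive = conn.execute("""
    SELECT c.customerNumber, COUNT(*), SUM(py.amount)
    FROM customers c
    JOIN orders o    ON c.customerNumber = o.customerNumber
    JOIN payments py ON c.customerNumber = py.customerNumber
    GROUP BY c.customerNumber
""").fetchall()

# Each subquery yields at most one row per customer, so nothing multiplies.
safe = conn.execute("""
    SELECT c.customerNumber, ot.order_total, pt.payment_total
    FROM customers c
    LEFT JOIN (
        SELECT o.customerNumber,
               SUM(od.quantityOrdered * od.priceEach) AS order_total
        FROM orders o
        JOIN orderdetails od ON o.orderNumber = od.orderNumber
        GROUP BY o.customerNumber
    ) ot ON c.customerNumber = ot.customerNumber
    LEFT JOIN (
        SELECT customerNumber, SUM(amount) AS payment_total
        FROM payments
        GROUP BY customerNumber
    ) pt ON c.customerNumber = pt.customerNumber
    ORDER BY c.customerNumber
""").fetchall()

print(naive)  # [(1, 6, 120)]
print(safe)   # [(1, 60, 60), (2, None, None)]
```

The naive query reports 120 in payments for a customer who paid 60, while the pre-aggregated version reports the right totals and, thanks to the LEFT JOINs, still lists the customer with no orders or payments.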
Pro tip Format your SQL queries fanatically carefully: They are really verbose. Adm. Grace Hopper would be proud. That means they get quite long and nested, putting the Structured in Structured Query Language. If you, or anybody, is going to reason about them in future, we must be able to grasp the structure easily.
Pro tip 2 The data engineer who designed this database did a really good job thinking it through and documenting it. Aspire to this level of quality. (Rarely reached in the real world.)
In this particular case, your behavior should depend on the accounting style being supported by the database, and this does not appear to be "open item" style accounting, i.e. when an order is raised for 1000 there does not need to be a payment against it for 1000. This is perhaps unusual in most consumer experience, because you will be quite familiar with open item style ordering from Amazon: you buy a 500 dollar TV and a 500 dollar games console, the order is a thousand dollars and you pay for it, the payment going against the order. However, you're also familiar with "balance forward" accounting if you paid for that order using your credit card, because you make similar purchases every day for a month and then you get a statement from your bank saying you owe 31000, and you pay a lump of money, which doesn't even have to be 31k. You aren't expected to make 31 payments of 1000 to your bank at the end of the month. Your bank allocates the payment to the oldest items on the account (if they're nice, or the newest items if they're not) and may eventually charge you interest on unpaid transactions.
1 ) Is there a way to look at the relational schema and identify if a Join can be executed
Yes, you can tell looking at the schema- customer has many orders, customer makes many payments, but there is no relation between the order and payment tables at all so we can see there is no attempt to directly attach a payment to an order. You can see that customer is a parent table of payment and order, and therefore enjoys a relationship with each of them but they do not relate to each other. If you had Person, Car and Address tables, a person has many addresses during their life, and many cars but it doesn't mean there is a relationship between cars and addresses
In such a case it simply doesn't make sense to join payments to customers to orders because they do not relate that way. If you want to make such a join and not suffer a Cartesian explosion then you absolutely have to sum one side or the other (or both) to ensure that your joins are 1:1 and 1:M (or 1:1 and 1:1). You cannot arrange a join that is a pair of 1:M.
Going back to the car/person/address example to make any meaningful joins, you have to build more information into the question and arrange the join to create the answer. Perhaps the question is "what cars did they own while they lived at" - this flattens the Person:Address relationship to 1:1 but leaves Person:Car as 1:M so they might have owned many cars during their time in that house. "What was the newest car they owned while living at..." might be 1:1 on both sides if there is a clear winner for "newest" (though if they bought two cars manufactured at identical times...)
Which side you sum in your orders case will depend on what you want to know, but in this case I'd say you usually want to know "which orders haven't been paid for". That means summing all payments, computing a rolling sum of all orders, and then looking at the point where the rolling sum exceeds the sum of payments. The orders from that point on are the unpaid orders.
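A rough sketch of that "rolling sum versus total payments" idea, assuming payments are allocated to the oldest orders first. Everything here is illustrative: order_totals is a hypothetical table standing in for the per-order rollup of orders and orderdetails, the rows are made up, and SQLite (through Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for orders joined to orderdetails and summed per order.
CREATE TABLE order_totals (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER,
                           orderDate TEXT, total INTEGER);
CREATE TABLE payments (customerNumber INTEGER, amount INTEGER);
INSERT INTO order_totals VALUES
  (10, 1, '2024-01-01', 100),
  (11, 1, '2024-02-01', 50),
  (12, 1, '2024-03-01', 70),
  (20, 2, '2024-01-15', 40);
INSERT INTO payments VALUES (1, 120), (1, 30);   -- customer 2 has paid nothing
""")

# An order counts as unpaid once the running total of the customer's orders
# (oldest first, order number as tie-breaker) exceeds everything they have paid.
unpaid = conn.execute("""
    SELECT o.customerNumber, o.orderNumber
    FROM order_totals o
    WHERE (SELECT SUM(o2.total)
             FROM order_totals o2
            WHERE o2.customerNumber = o.customerNumber
              AND (o2.orderDate < o.orderDate
                   OR (o2.orderDate = o.orderDate AND o2.orderNumber <= o.orderNumber)))
        > (SELECT COALESCE(SUM(p.amount), 0)
             FROM payments p
            WHERE p.customerNumber = o.customerNumber)
    ORDER BY o.customerNumber, o.orderDate
""").fetchall()

print(unpaid)  # [(1, 12), (2, 20)]
```

Customer 1 paid 150, which covers the running totals 100 and 150 but not 220, so only order 12 shows up. Both sides of the comparison are single values per order, so no one-to-many join is involved at all.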
Take a look again at your database graph (the one that was present in the first iteration of your question). See the lines between tables have 3 angled legs on one end - that's the many end. You can start at any table in the graph and join to other tables by walking along the relationship. If you're going from the many end to the one end, and assuming you've picked out a single row in the start table (a single order) you can always walk to any other table in the many->one direction and not increase your row count. If you walk the other way you potentially increase your row count. If you split and walk two ways that both increase row count you get a Cartesian explosion. Of course, also you don't have to only join on relation lines, but that's out of scope for the question
ps: this is easier to see on the db diagram than the ERD in the question because the database purely concerns itself with the columns that are foreign keyed. The ERD is saying a customer has zero or one payments with a particular check number but the database will only be concerned with "the customer ID appears once in the customer table and multiple times in the payment table" because only part of the compound primary key of payment is keyed to the customer table. In other words, the ERD is concerned with business logic relations too, but the db diagram is purely how tables relate and they aren't necessarily aligned. For this reason the db diagrams are probably easier to read when walking round for join strategies
After seeing the answers of Caius Jard and O.Jones (please, check their replies), which kindly helped me to clarify this doubt, I decided to create a table to identify which customers paid for all orders they made and which ones did not. This creates a pertinent reason to join 'orders', 'orderdetails', 'payments' and 'customers' tables, because some orders may have been cancelled or still may be 'On Hold', as we can see in their corresponding 'status' in the 'orders' table. Also, this enables us to execute this join without generating a 'combinatorial explosion'.
I did this by using the CASE statement, which registers when py.amount and amount_in_orders match, don't match or when they are NULL (customers which did not make orders or payments):
SELECT
c.customerNumber,
py.amount,
amount_in_orders,
CASE
WHEN py.amount=amount_in_orders THEN 'Match'
WHEN py.amount IS NULL AND amount_in_orders IS NULL THEN 'NULL'
ELSE 'Don''t Match'
END AS `Match`
FROM
customers c
LEFT JOIN(
SELECT
o.customerNumber, SUM(od.quantityOrdered*od.priceEach) AS amount_in_orders
FROM
orders o
JOIN orderdetails od ON o.orderNumber=od.orderNumber
GROUP BY o.customerNumber
) o ON c.customerNumber=o.customerNumber
LEFT JOIN(
SELECT customernumber, SUM(amount) AS amount
FROM payments
GROUP BY customerNumber
) py ON c.customerNumber=py.customerNumber
ORDER BY py.amount DESC;
The query returned 122 rows. The images below are fractions of the generated output, so you can visualize what happened:
For instance, we can see that the customers identified by the numbers '141', '124', '119' and '496' did not pay for all the orders they made. Maybe some of them were cancelled or maybe they simply have not paid for them yet.
And this image shows some of the columns (not all of them) that are NULL:

Have to enter mySQL criteria twice?

Say I have two tables:
Table: customers
Fields: customer_id, first_name, last_name
Table: customer_cars
Fields: car_id, customer_id, car_brand, car_active
Say I am trying to write a query that shows all customers with a first name of "Karl," and the brands of the ** active ** cars they have. Not all customers will have an active car. Some cars are active, some are inactive.
Please keep in mind that this is a representative example that I just made up, for sake of clarity and simplicity. Please don't reply with questions about why we would do it this way, that I could use table aliases, how it's possible to have an inactive car, or that my field names could be better written. It's a fake example that is intended to be very simple in order to illustrate the point. It has a structure and issue that I encounter all the time.
It seems like this would be best done with a LEFT JOIN and subquery.
SELECT
customer_id,
first_name,
last_name,
car_brand
FROM
customers
LEFT JOIN
(SELECT
customer_id,
car_brand
FROM
customer_cars
INNER JOIN customers ON customer_cars.customer_id = customers.customer_id
WHERE
first_name = 'Karl' AND
customer_cars.car_active = '1') car_query ON customers.customer_id = car_query.customer_id
WHERE
first_name = 'Karl'
The results might look like this:
first_name last_name car_brand
Karl Johnson Dodge
Karl Johnson Jeep
Karl Smith NULL
Karl Davis Chrysler
Notice the duplication of 'Karl' in both WHERE clauses, and the INNER JOIN in the subquery that is the same table in the outer query. My understanding of mySQL is that this duplication is necessary because it processes the subquery first before processing the outer query. Therefore, the subquery must be properly limited so it doesn't scan all records, then it tries to match on the resulting records.
I am aware that removing the car_active = '1' condition would change things, but this is a requirement.
I am wondering if a query like this can be done in a different way that only causes the criteria and joins to be entered once. Is there a recommended way to prioritize the outer query first, then match to the inner one?
I am aware that two different queries could be written (find all records with Karl, then do another that finds matching cars). However, this would cause multiple connections to the database (one for every record returned) and would be very taxing and inefficient.
I am also aware of correlating subqueries, but from my understanding and experience, this is for returning one field per customer (e.g., an aggregate field such as how much money Karl spent) within the fieldset. I am looking for a similar approach as this, but where one customer could be matched to multiple other records like in the sample output above.
In your response, if you have a recommended query structure that solves this problem, it would be really helpful if you could write a clear example instead of just describing it. I really appreciate your time!
First, is a simple and straightforward query not enough?
Say I am trying to write a query that shows all customers with a first
name of "Karl," and the brands of the ** active ** cars they have. Not
all customers will have an active car. Some cars are active, some are
inactive.
Following this requirement, I can just do something like:
SELECT C.first_name
, C.last_name
, CC.car_brand
FROM customers C
LEFT JOIN customer_cars CC ON CC.customer_id = C.customer_id
AND CC.car_active = 1
WHERE C.first_name = 'Karl'
Take a look at the SQL Fiddle sample.
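The reason this works is that the car_active condition sits in the ON clause of the LEFT JOIN rather than in the WHERE clause. A small sketch of the difference, with made-up rows and SQLite (through Python's sqlite3 module) standing in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE customer_cars (car_id INTEGER PRIMARY KEY, customer_id INTEGER,
                            car_brand TEXT, car_active INTEGER);
INSERT INTO customers VALUES
  (1, 'Karl', 'Johnson'), (2, 'Karl', 'Smith'), (3, 'Karl', 'Davis'), (4, 'Anna', 'Lee');
INSERT INTO customer_cars VALUES
  (1, 1, 'Dodge', 1), (2, 1, 'Jeep', 1), (3, 1, 'Ford', 0),
  (4, 2, 'Honda', 0),            -- Karl Smith only has an inactive car
  (5, 3, 'Chrysler', 1), (6, 4, 'Fiat', 1);
""")

# Filter in the ON clause: customers without an active car survive with NULL.
on_filter = conn.execute("""
    SELECT c.first_name, c.last_name, cc.car_brand
    FROM customers c
    LEFT JOIN customer_cars cc
           ON cc.customer_id = c.customer_id AND cc.car_active = 1
    WHERE c.first_name = 'Karl'
    ORDER BY c.customer_id, cc.car_brand
""").fetchall()

# Filter in the WHERE clause: the NULL rows are discarded after the join,
# which silently turns the LEFT JOIN into an inner join.
where_filter = conn.execute("""
    SELECT c.first_name, c.last_name, cc.car_brand
    FROM customers c
    LEFT JOIN customer_cars cc ON cc.customer_id = c.customer_id
    WHERE c.first_name = 'Karl' AND cc.car_active = 1
    ORDER BY c.customer_id, cc.car_brand
""").fetchall()

print(on_filter)
print(where_filter)
```

With the ON-clause filter, Karl Smith appears with a NULL car_brand, matching the sample output in the question; with the WHERE-clause filter he disappears. Either way, 'Karl' is written only once.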

Why would a SQL query need to be so complicated like this feature allows?

I am studying for SQL exam, and I came across this fact, regarding subqueries:
2. Main query and subquery can get data from different tables
When is a case when this feature would be useful? I find it difficult to imagine such a case.
Millions of situations call for finding information in different tables; it's the basis of relational data. Here's an example:
Find the emergency contact information for all students who are in a chemistry class:
SELECT Emergency_Name, Emergency_Phone
FROM tbl_StudentInfo
WHERE StudentID IN (SELECT b.StudentID
FROM tbl_ClassEnroll b
WHERE Subject = 'Chemistry')
SELECT * FROM tableA
WHERE id IN (SELECT id FROM tableB)
There are plenty of reasons to get data from different tables, such as selecting something in the main query based on one or more subqueries against other tables. The use cases are numerous.
For example, choose customers in the main query based on regions and their values:
SELECT * FROM customers
WHERE country IN(SELECT name FROM country WHERE name LIKE '%land%')
Or choose products in the main query that are above or below the average income of customers, and so on.
You could do something like,
SELECT SUM(trans) as 'Transactions', branch.city as 'city'
FROM account
INNER JOIN branch
ON branch.bID = account.bID
GROUP BY branch.city
HAVING SUM(account.trans) < 0;
This would allow a company to identify which branches are making a loss. It would help identify whether the company has to change its marketing approach in certain regions, in theory allowing the company to become more dynamic and reactive to changes in the economy at any given time.

how to design: users with different roles see different records

I have a schema design question for my application and hope I can get some advice. It is very similar to Role-Based Access Control, but a bit different in detail.
Spec:
For one company, there are 4 roles: Company (Boss) / Department (Manager) / Team (Leader) / Member (Sales), and there are about 1 million customer records. Each customer record can be owned by someone, who could be the Boss, a Manager, a Leader or a Sales member. If the record's owner is a Sales member, then his superiors (his leader / manager / boss) can see the record as well, but others, such as workmates at the same level, cannot, unless a superior shares the customer with them. If the record's owner is the boss, nobody except the boss himself can see it.
My Design is like this (I want to improve it to make it more simple and clear):
Table:
departments:
id (P.K. department id)
d_name (department name)
p_id (parent department id)
employees
id (P.K. employee id)
e_name (employee name)
employee_roles
id (P.K.)
e_id (employee id)
d_id (department id)
customers
id (P.K. customer id)
c_name (customer name)
c_phone (customer phone)
permissions
id (P.K.)
c_id (customer id)
e_id (owner employee id)
d_id (the department this customer belongs to)
share_to (this customer share to other's id)
P.S.: each employee can have multiple roles. For example, employee A can be the manager of department_I and at the same time a sales member of department_II >> Team_X.
So, when an employee logs in to the application, we can query the employee_roles table to get all of the department ids and sub-department ids, and save them into an array.
Then I can use this array to query from permissions table and join it with customers table to get all the customers this employee should see. The SQL might look like this:
SELECT * FROM customers AS a
INNER JOIN permissions AS b ON a.id = b.c_id
AND (b.d_id IN ${DEP_ARRAY} OR e_id = ${LOGIN_EMPLOYEE_ID} OR share_to = ${LOGIN_EMPLOYEE_ID})
I don't really like the above SQL, especially the "IN" clause, since I am afraid it will slow down the query, since there are about 1 million records or even more in the customer table; and, there will be as many records as the customers table in the permissions table, the INNER JOIN might be very slow too. (So what I care about is the performance like everyone :))
To my best knowledge, this is the best design I can work out, could you teachers please help to give me some advice on this question? If you need anything more info, please let me know.
Any advice would be appreciated!
Thanks a million in advance!!
Do not use an array; use a table, i.e. the value of a select statement. And stop worrying about performance until you know more basics about thinking in terms of tables and queries.
The point of the relational model is that if you structure your data as tables then you can looplessly describe the output table and the DBMS figures out how to calculate it. Do not think about "joining"; think about describing the result. Whatever the DBMS ends up doing is its business, not yours. Only after you become knowledgeable about variations in descriptions and options for descriptions will you have the basic knowledge to learn about performance.
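As one possible reading of "use a table, not an array", the department list can itself be a query. The sketch below is an assumption-laden illustration, not a vetted access-control design: it reuses the table and column names from the question, mirrors the question's own visibility condition, uses a recursive common table expression (available in MySQL 8.0 and later) to collect an employee's departments and sub-departments, and runs on SQLite through Python's sqlite3 module with made-up rows. The helper name visible_customers is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments    (id INTEGER PRIMARY KEY, d_name TEXT, p_id INTEGER);
CREATE TABLE employee_roles (id INTEGER PRIMARY KEY, e_id INTEGER, d_id INTEGER);
CREATE TABLE customers      (id INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE permissions    (id INTEGER PRIMARY KEY, c_id INTEGER, e_id INTEGER,
                             d_id INTEGER, share_to INTEGER);
INSERT INTO departments VALUES
  (1, 'Company', NULL), (2, 'Dept A', 1), (3, 'Team X', 2), (4, 'Dept B', 1);
INSERT INTO employee_roles VALUES (1, 7, 2), (2, 8, 3), (3, 9, 4);
INSERT INTO customers VALUES (100, 'c100'), (101, 'c101'), (102, 'c102'), (103, 'c103');
INSERT INTO permissions VALUES
  (1, 100, 8, 3, NULL),   -- owned by employee 8 in Team X
  (2, 101, 9, 4, NULL),   -- owned by employee 9 in Dept B
  (3, 102, 9, 4, 7),      -- owned by employee 9, shared to employee 7
  (4, 103, 7, 2, NULL);   -- owned by employee 7 in Dept A
""")

QUERY = """
WITH RECURSIVE visible_depts(id) AS (
    SELECT d_id FROM employee_roles WHERE e_id = :emp     -- the employee's own departments
    UNION
    SELECT d.id FROM departments d
    JOIN visible_depts v ON d.p_id = v.id                 -- plus all sub-departments
)
SELECT DISTINCT c.id
FROM customers c
JOIN permissions p ON p.c_id = c.id
WHERE p.d_id IN (SELECT id FROM visible_depts)
   OR p.e_id = :emp
   OR p.share_to = :emp
ORDER BY c.id
"""

def visible_customers(emp_id):
    """Return the customer ids the given employee can see under the question's condition."""
    return [row[0] for row in conn.execute(QUERY, {"emp": emp_id})]

print(visible_customers(7))  # [100, 102, 103]
print(visible_customers(9))  # [101, 102]
```

The whole description of "which customers can this employee see" now lives in one statement, and the DBMS is free to decide how to evaluate it.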