Relational Algebra - how does natural join work? - relational-database

A natural join is an inner join that only works if table1 has some intersecting attributes with table2.
Yet, when I take tables that have no column names in common, it acts as a Cartesian product.
In addition, when I take different tables that have nothing in common, it displays no results.
Why?

Well, you have learned the first important lesson, which is to avoid natural join. It is just lousy syntax, because it does not even take properly declared foreign key relationships into account and the join conditions are hidden -- which makes queries hard to maintain and debug.
A natural join is an inner join equijoin with the join conditions on columns with the same names. Natural joins do not even take types into account, so the query can have type conversion errors if your data is really messed.
If the corresponding inner join on the common column names have no matches, then it returns the empty set. If there are no common column names, then it is the same as a cross join.
The way to think about it is that a natural join (inner natural join) generates the Cartesian product of two tables. When the tables have duplicated column names, then the final result set contains only those Cartesian-product rows where the common column names have the same value.

Related

MySQL Left Join and Right Join Optimization

I've been looking up some documentation about this topic here: https://dev.mysql.com/doc/refman/5.7/en/left-join-optimization.html
But I don't understand the following example:
The join optimizer calculates the order in which to join tables. The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check. This means that if you execute a query of the following type, MySQL does a full scan on b because the LEFT JOIN forces it to be read before d:
SELECT *
FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key)
WHERE b.key=d.key;
The fix in this case is reverse the order in which a and b are listed in the FROM clause:
SELECT *
FROM b JOIN a LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key)
WHERE b.key=d.key;
Why does the order make an optimization? Do JOIN and LEFT_JOIN execute in some order?
I suspect the first quote is not quite correct. I have seen LEFT JOIN turned into JOIN and then the tables touched in the 'wrong' order.
Anyway, don't worry about the work the optimizer needs to do. In thousands of slow JOINs, I have identified only one case where the cost of picking the order was important. And it was a case of multiple joins to a single table; yet another drawback of EAV schema. Anyway, there is a simple setting to avoid that problem.
LEFT/RIGHT/plain JOINs are semantically done left-to-right (regardless of the order the optimizer chooses to touch the tables).
If you are concerned about the ordering, you can add parentheses. For example:
FROM (a JOIN b ON ...) JOIN (c JOIN d ON ...) ON ...
If you are using "commajoin" (FROM a,b...), don't. However, its precedence changed long ago. The workaround was to add parens so that the same SQL would work in versions before and after the change.
Don't use LEFT unless you need it to get NULLs for missing 'right' rows. It just confuses readers into thinking that you expect NULLs.
This example is wrong in many ways, and it is not clear to me what it is trying to convey. Apologies for that. I will file a bug with the documentation team.
Some clarifications:
For the given query, the last LEFT JOIN will be converted to an inner join. This is because the WHERE clause, WHERE b.key=d.key, implies that d.key can not be NULL. Hence, any extra rows produced by LEFT JOIN compared to INNER JOIN would be filtered out by the WHERE clause. (The principles of this transformation is described in the paragraph following the given example.)
The ON clause of the first LEFT JOIN, ON (c.key=a.key), makes table c dependent on table a, but not table b. Hence, the only the requirement wrt join order is that table a is processed before table c. The order in which tables a and b are listed in the query, will not change that.
Tables b and d may be processed in any order, both wrt each other and wrt the other tables of the query.
This paragraph seems to recommend LEFT JOIN as a mechanism to reduce number of "table permutations to check". This is not meaningful since changing from INNER JOIN to LEFT JOIN may change the semantics of the query. For this purpose STRAIGHT_JOIN should be used instead.
For most join queries, execution time by far exceeds optimization time. Reducing the number of "table permutations to check" may cause potentially more efficient permutations to not be explored. Hence, LEFT JOIN should not be used unless it is required to get the wanted semantics.

MySQL SELECT from two tables with COUNT

i have two tables as below:
Table 1 "customer" with fields "Cust_id", "first_name", "last_name" (10 customers)
Table 2 "cust_order" with fields "order_id", "cust_id", (26 orders)
I need to display "Cust_id" "first_name" "last_name" "order_id"
to where i need count of order_id group by cust_id like list total number of orders placed by each customer.
I am running below query, however, it is counting all the 26 orders and applying that 26 orders to each of the customer.
SELECT COUNT(order_id), cus.cust_id, cus.first_name, cus.last_name
FROM cust_order, customer cus
GROUP BY cust_id;
Could you please suggest/advice what is wrong in the query?
You issue here is that you have told the database how these two tables are 'connected', or what they should be connected by:
Have a look at this image:
~IMAGE SOURCE
This effectively allows you to 'join' two tables together, and use a query between them.
so you might want to use something like:
SELECT COUNT(B.order_id), A.cust_id, A.first_name, A.last_name
FROM customer A
LEFT JOIN cust_order B //this is using a left join, but an inner may be appropriate also
ON (A.cust_id= B.Cust_id) //what links them together
GROUP BY A.cust_id; // the group by clause
As per your comment requesting some further info:
Left Join (right joins are almost identical, only the other way around):
The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right table. This means that if the ON clause matches 0 (zero) records in right table, the join will still return a row in the result, but with NULL in each column from right table. ~Tutorials Point.
This means that a left join returns all the values from the left table, plus matched values from the right table or NULL in case of no matching join predicate.
LEFT joins will be used in the cases where you wish to retrieve all the data from the table in the left hand side, and only data from the right that match.
Execution Time
While the accepted answer in this case may work well in small datasets, it may however become 'heavy' in larger databases. This is because it was not actually designed for this type of operation.
This was the purpose of Joins to be introduced.
Much work in database-systems has aimed at efficient implementation of joins, because relational systems commonly call for joins, yet face difficulties in optimising their efficient execution. The problem arises because inner joins operate both commutatively and associatively. ~Wikipedia
In practice, this means that the user merely supplies the list of tables for joining and the join conditions to use, and the database system has the task of determining the most efficient way to perform the operation. A query optimizer determines how to execute a query containing joins. So, by allowing the dbms to choose the way your data is queried, you can save a lot of time.
Other Joins/Summary
AN INNER JOIN will return data from both tables where the keys in each table match
A LEFT JOIN or RIGHT JOIN will return all the rows from one table and matching data from the other table.
Use a join when you want to query multiple tables.
Joins are much faster than other ways of querying >=2 tables (speed can be seen much better on larger datasets).
You could try this one:
SELECT COUNT(cus_order.order_id), cus.cust_id, cus.first_name, cus.last_name
FROM cust_order cus_order, customer cus
WHERE cus_order.cust_id = cus.cust_id
GROUP BY cust_id;
Maybe an left join will help you
SELECT COUNT(order_id), cus.cust_id, cus.first_name, cus.last_name ]
FROM customer cus
LEFT JOIN cust_order co
ON (co.cust_id= cus.Cust_id )
GROUP BY cus.cust_id;

Explain which table to choose "FROM" in a JOIN statement

I'm new to SQL and am having trouble understanding why there's a FROM keyword in a JOIN statement if I use dot notation to select the tables.columns that I want. Does it matter which table I choose out of the two? I didn't see any explanation for this in w3schools definition on which table is the FROM table. In the example below, how do I know which table to choose for the FROM? Since I essentially already selected which table.column to select, can it be either?
For example:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
The order doesn't matter in an INNER JOIN.
However, it does matter in LEFT JOIN and RIGHT JOIN. In a LEFT JOIN, the table in the FROM clause is the primary table; the result will contain every row selected from this table, while rows named in the LEFT JOIN table can be missing (these columns will be NULL in the result). RIGHT JOIN is similar but the reverse: rows can be missing in the table named in FROM.
For instance, if you change your query to use LEFT JOIN, you'll see customers with no orders. But if you swapped the order of the tables and used a LEFT JOIN, you wouldn't see these customers. You would see orders with no customer (although such rows probably shouldn't exist).
The from statement refers to the join not the table. The join of table will create a set from which you will be selecting columns.
For an inner join it does not matter which table is in the from clause and which is in the join clause.
For outer joins it of course does matter, as the table in the outer join is allowed to have "missing" records.
It does not matter for inner joins: the optimizer will figure out the proper sequence of reading the tables, regardless of your choice for the ordering.
For directional outer joins, it does matter, because these are not symmetric. You choose the table in which you want to keep all rows for the first FROM table in a left outer join; for the right outer join it is the other way around.
For full outer joins it does not matter again, because the tables in full outer joins are used symmetrically to each other.
In situations when ordering does not matter you pick the order to be "natural" to the reader of your SQL statement, whatever that means for your model. SQL queries very quickly become rather hard to read, so the proper ordering of your tables is important for human readers of your queries.
Well in your current example the from operator can be applied on both tables.
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers,Orders
WHERE Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
->Will work like your code
The comma will join the two tables.
From just means which table you are retrieving data from.
In your example, you joined the two tables using different syntax.
it could also have been :
SELECT Customers.CustomerName, Orders.OrderID
FROM Orders
INNER JOIN Customers
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
all the code written will generate same results

Natural join if no common attributes

What will natural join return in relational algebra if tables don't have attributes with same names? Will it be null or the same as cross join (cross product) (Cartesian product)?
If there are no attributes in common between two relations and you perform a natural join, it will return the cartesian product of the two relations.
The cartesian product of the two tables will be returned. This is because, when we perform any JOIN operation on two tables, a cartesian product of those tables is formed. Then, based on any select condition in the WHERE clause, the resultant rows are returned. But here, as there are no common columns, the process stops after finding the cartesian product.
It will return the cartesian product of the tables. If there are any common attributes, then the natural join removes the duplicate attributes.

In my MySQL what is the difference between using the word JOIN or using: table1.id=table2.id

In my MySQL what is the difference between using the word JOIN or using: table1.id=table2.id
Table 1
id name
1 John Lennon
Table 2
id position
1 Singer
//What is the difference then calling JOIN?
Select name, position
From Table1, Table2
Where Table1.id=Table2.id;
There is none as long as there are only two tables. When there are more, you might get different JOIN order than expected, if you use comma to separate tables.
From manual:
INNER JOIN and , (comma) are semantically equivalent in the absence of
a join condition: both produce a Cartesian product between the
specified tables (that is, each and every row in the first table is
joined to each and every row in the second table).
However, the precedence of the comma operator is less than of INNER
JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with
the other join types when there is a join condition, an error of the
form Unknown column 'col_name' in 'on clause' may occur. Information
about dealing with this problem is given later in this section.
If this case, there's no difference.
But the joins can also be: INNER, OUTER (left outer, right outer, full outer joins), LEFT, RIGHT.
Using a JOIN is known as an "explicit join", or an ANSI join... separating your tables with a comma and then linking them in the WHERE clause is known as an "implicit join".
The ANSI join is widely considered to be the preferred syntax. It is easier to follow, less ambiguous, and more consistent when you need to start adding outer joins.
Additionally, if you use the implicit join syntax, it is easy to end up with an accidental cross join if you forget to link the tables in your WHERE clause.