What's the purpose of an IMPLICIT JOIN in SQL? - mysql

So, I don't really understand the purpose of using an implicit join in SQL. In my opinion, it makes a join more difficult to spot in the code, and I'm wondering this:
Is there a greater purpose for actually wanting to do this besides the simplicity of it?

Fundamentally, there is no difference between an implicit join and an explicit JOIN ... ON; the execution plans are the same.
I prefer the explicit notation as it makes it easier to read and debug.
Moreover, in the explicit notation you define the relationship between the tables in the ON clause and the search condition in the WHERE clause.
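To make the "no difference" point concrete, here is a minimal sketch that runs both forms and compares the rows. It assumes Python's built-in sqlite3 module as a stand-in engine (the thread is about MySQL, but inner-join semantics are standard SQL), and the users/posts contents are hypothetical. It compares results only; it does not inspect execution plans.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 1, 'again'), (12, 2, 'hi');
""")

# Implicit join: table list in FROM, join predicate mixed into WHERE.
implicit_rows = conn.execute("""
    SELECT u.name, p.title
    FROM users u, posts p
    WHERE p.user_id = u.user_id AND u.user_id = 1
    ORDER BY p.post_id
""").fetchall()

# Explicit join: relationship in ON, search condition in WHERE.
explicit_rows = conn.execute("""
    SELECT u.name, p.title
    FROM users u
    JOIN posts p ON p.user_id = u.user_id
    WHERE u.user_id = 1
    ORDER BY p.post_id
""").fetchall()

print(implicit_rows)                   # [('ann', 'hello'), ('ann', 'again')]
print(implicit_rows == explicit_rows)  # True
```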

Explicit vs implicit SQL joins
When you join several tables, no matter how the join condition is written, the optimizer will choose the execution plan it considers best. As for me:
1) Implicit join syntax is more concise.
2) It is easier to generate automatically, or to produce from another SQL script.
So I use it sometimes.

Others have answered the question from the perspective of what most people understand by "implicit JOIN": an INNER JOIN that arises from a table list with join predicates in the WHERE clause. However, I think it's also worth mentioning the concept of an "implicit JOIN" as some ORM query languages understand it, such as Hibernate's HQL, jOOQ, Doctrine, and probably others. In those cases, the join is expressed as a path expression anywhere in the query, e.g.:
SELECT
b.author.first_name,
b.author.last_name,
b.title,
b.language.cd AS language
FROM book b;
Here, the path b.author implicitly joins the AUTHOR table to the BOOK table using the foreign key between the two tables. Your question still holds for this type of "implicit join" as well, and the answer is the same: some users may find this syntax more convenient than the explicit one. There is no other advantage to it.
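As a rough illustration of what such a path expression stands for, the sketch below writes out one explicit join per path segment, following the foreign keys. The foreign-key column names (author_id, language_id) and the rows are hypothetical, SQLite is used as a stand-in engine, and the exact SQL (including the join type) that Hibernate, jOOQ or Doctrine actually emit varies and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE language (id INTEGER PRIMARY KEY, cd TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id),     -- hypothetical FK column
        language_id INTEGER REFERENCES language(id)  -- hypothetical FK column
    );
    INSERT INTO author VALUES (1, 'George', 'Orwell');
    INSERT INTO language VALUES (1, 'en');
    INSERT INTO book VALUES (1, '1984', 1, 1), (2, 'Animal Farm', 1, 1);
""")

# b.author.* and b.language.cd, spelled out as explicit joins on the FKs.
rows = conn.execute("""
    SELECT a.first_name, a.last_name, b.title, l.cd AS language
    FROM book b
    JOIN author a ON a.id = b.author_id
    JOIN language l ON l.id = b.language_id
    ORDER BY b.id
""").fetchall()

print(rows)
```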
Disclaimer: I work for the company behind jOOQ.

Related

Is this SQL statement making a join?

I develop against Oracle databases. When I need to write queries manually (rather than using an ORM like Hibernate), I use a WHERE condition instead of a JOIN.
For example (this is simplified, just to illustrate the style):
Select *
from customers c, invoices i, shipment_info si
where c.customer_id = i.customer_id
and i.amount > 999.99
and i.invoice_id = si.invoice_id(+) -- added to show a replacement for a join
order by i.amount, c.name
I learned this style from an OLD Oracle DBA. I have since learned that this is not standard SQL syntax. Other than being non-standard and much less portable across databases, are there any other repercussions to using this format?
I don't like the style because it makes it harder to determine which WHERE clauses are for simulating JOINs and which ones are for actual filters, and I don't like code that makes it unnecessarily difficult to determine the original intent of the programmer.
The biggest issue that I have run into with this format is the tendency to forget some join's WHERE clause, thereby resulting in a cartesian product. This is particularly common (for me, at least) when adding a new table to the query. For example, suppose an ADDRESSES table is thrown into the mix and your mind is a bit forgetful:
SELECT *
FROM customers c, invoices i, addresses a
WHERE c.customer_id = i.customer_id
AND i.amount > 999.99
ORDER BY i.amount, c.name
Boom! Cartesian product! :)
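A minimal sketch of that failure mode, with hypothetical rows and SQLite as a stand-in engine: the forgotten predicate silently multiplies every qualifying customer/invoice row by every address row, instead of raising an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, customer_id INTEGER, city TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES (100, 1, 1500.00), (101, 2, 2500.00);
    INSERT INTO addresses VALUES (1, 1, 'Oslo'), (2, 1, 'Bergen'), (3, 2, 'Paris');
""")

# The predicate for addresses was forgotten: each of the 2 matching
# customer/invoice rows is paired with all 3 address rows.
forgotten = conn.execute("""
    SELECT COUNT(*)
    FROM customers c, invoices i, addresses a
    WHERE c.customer_id = i.customer_id
      AND i.amount > 999.99
""").fetchone()[0]

# With explicit joins, each table's predicate sits next to that table,
# which makes a missing one easier to spot.
correct = conn.execute("""
    SELECT COUNT(*)
    FROM customers c
    JOIN invoices i ON c.customer_id = i.customer_id
    JOIN addresses a ON a.customer_id = c.customer_id
    WHERE i.amount > 999.99
""").fetchone()[0]

print(forgotten, correct)  # 6 3
```

Note that neither MySQL nor SQLite rejects a bare JOIN without ON (both treat it as a cross join), so the explicit syntax helps readability here rather than guaranteeing an error.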
The old-style join is flat-out wrong in some cases (outer joins are the culprit). Although the two styles are more or less equivalent for inner joins, the old style can generate incorrect results with outer joins, especially if columns on the outer side can be null. This is because, with the older syntax, the join conditions are not logically evaluated until the entire result set has been constructed, so it is simply not possible to express a condition on a column from the outer side of a join that filters records when that column can be null because there is no matching record.
As an example:
Select all Customers, and the sum of the sales of Widgets on all their Invoices in the month of August, where the Invoice has been processed (Invoice.ProcessDate is Not Null).
Using the newer ANSI-92 JOIN syntax:
Select c.name, Sum(d.Amount)
From customer c
Left Join Invoice I
On i.custId = c.custId
And i.SalesDate Between '8/1/2009'
and '8/31/2009 23:59:59'
And i.ProcessDate Is Not Null
Left Join InvoiceDetails d
On d.InvoiceId = i.InvoiceId
And d.Product = 'widget'
Group By c.Name
Try doing this with the old syntax. When using the old-style syntax, all the conditions in the WHERE clause are evaluated/applied BEFORE the 'outer' rows are added back in, so all the unprocessed Invoice rows get added back into the final result set. This is therefore not possible with the old syntax: anything that attempts to filter out the invoices with null processed dates will eliminate customers. The only alternative is to use a correlated subquery.
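The core of this argument, that a condition on the outer side behaves differently in ON than in WHERE, can be sketched with a cut-down version of the customer/invoice example. The columns are simplified, the rows are hypothetical, and SQLite stands in for the original engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (invoiceId INTEGER PRIMARY KEY, custId INTEGER, processDate TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    -- Acme: one processed invoice; Globex: only an unprocessed one; Initech: none.
    INSERT INTO invoice VALUES (10, 1, '2009-08-05'), (11, 2, NULL);
""")

# Filter inside ON: it only limits which invoices can match;
# the LEFT JOIN still keeps every customer.
filter_in_on = conn.execute("""
    SELECT c.name, COUNT(i.invoiceId)
    FROM customer c
    LEFT JOIN invoice i
      ON i.custId = c.custId
     AND i.processDate IS NOT NULL
    GROUP BY c.custId, c.name
    ORDER BY c.custId
""").fetchall()

# Same filter in WHERE: it is applied to the joined result, where
# customers without a processed invoice have NULLs, so they are dropped.
filter_in_where = conn.execute("""
    SELECT c.name, COUNT(i.invoiceId)
    FROM customer c
    LEFT JOIN invoice i ON i.custId = c.custId
    WHERE i.processDate IS NOT NULL
    GROUP BY c.custId, c.name
    ORDER BY c.custId
""").fetchall()

print(filter_in_on)     # [('Acme', 1), ('Globex', 0), ('Initech', 0)]
print(filter_in_where)  # [('Acme', 1)]
```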
Some people will say that this style is less readable, but that's a matter of habit. From a performance point of view, it doesn't matter, since the query optimizer takes care of that.
I have since learned that this is not standard SQL syntax.
That's not quite true. The "a, b WHERE" syntax comes from the ANSI-89 standard, while the "a JOIN b ON" syntax is from ANSI-92. However, the ANSI-89 style is widely regarded as deprecated, so you should avoid it in new queries.
Also, there are some situations where the older style lacks expressive power, especially with regard to outer joins or complex queries.
It can be a pain going through the WHERE clause trying to pick out join conditions. For anything more than one join, the old style is absolutely evil. And once you know the new style, you may as well just keep using it.
This is standard SQL syntax, just from an older standard than JOIN. There's a reason that the syntax has evolved, and you should use the newer JOIN syntax because:
It's more expressive, clearly indicating which tables are JOINed, the JOIN order, which conditions apply to which JOIN, and separating out the filtering WHERE conditions from the JOIN conditions.
It supports LEFT, RIGHT, and FULL OUTER JOINs, which the WHERE syntax does not.
I don't think you'll find the WHERE-type JOIN substantially less portable than the JOIN syntax.
As long as you don't use the ANSI natural join feature, I'm OK with it.
I found this quote by ScottCher, and I totally agree:
I find the WHERE syntax easier to read than INNER JOIN. I guess it's like Vegemite: most people in the world probably find it disgusting, but kids brought up eating it love it.
It really depends on habits, but I have always found Oracle's comma-separated syntax more natural. The first reason is that I think using (INNER) JOIN diminishes readability. The second is flexibility. In the end, a join is a cartesian product by definition. You do not necessarily have to restrict the results based on the IDs of both tables. Although it is rare, one might well need the cartesian product of two tables. Restricting them based on IDs is just a very reasonable practice, but NOT A RULE. However, if you use the JOIN keyword in, e.g., SQL Server, it won't let you omit the ON clause. Suppose you want to create a combination list. You have to do it like this:
SELECT *
FROM numbers
JOIN letters
ON 1=1
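For comparison, standard SQL also offers CROSS JOIN, which takes no ON clause at all and is supported by SQL Server as well. The sketch below, with hypothetical numbers/letters rows and SQLite as a stand-in engine, checks that the comma form, JOIN ... ON 1=1, and CROSS JOIN all produce the same combination list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE numbers (n INTEGER);
    CREATE TABLE letters (l TEXT);
    INSERT INTO numbers VALUES (1), (2), (3);
    INSERT INTO letters VALUES ('a'), ('b');
""")

# Three spellings of the same cartesian product.
queries = {
    "comma": "SELECT n, l FROM numbers, letters ORDER BY n, l",
    "join_on_true": "SELECT n, l FROM numbers JOIN letters ON 1=1 ORDER BY n, l",
    "cross_join": "SELECT n, l FROM numbers CROSS JOIN letters ORDER BY n, l",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

print(results["cross_join"])
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
print(results["comma"] == results["join_on_true"] == results["cross_join"])  # True
```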
Apart from that, I find the (+) syntax of Oracle also very reasonable. It is a nice way to say, "Add this record to the resultset too, even if it is null." It is way better than the RIGHT/LEFT JOIN syntax, because in fact there is no left or right! When you want to join 10 tables with several different types of outer joins, it gets confusing which table is on the "left hand side" and which one on the right.
By the way, as a more general comment, I don't think SQL portability exists in the practical world any more. Standard SQL is so limited, and the expressiveness of DBMS-specific syntax is so often needed, that I don't think 100% portable SQL code is an achievable goal. The most obvious evidence for my observation is the good old row number problem. Just search any forum for "sql row number", including SO, and you will see hundreds of posts asking how it can be achieved in a specific DBMS. The same goes for limiting the number of returned rows, for example.
This is Transact-SQL syntax, and I'm not quite sure how "unportable" it is: it is the main syntax used in Sybase, for example (Sybase supports the ANSI syntax as well), as well as in many other databases (if not all).
The main benefit of the ANSI syntax is that it allows you to write some fairly tricky chained joins that T-SQL prohibits.
Speaking as someone who writes automated sql query transformers (inline view expansions, grafted joins, union factoring) and thinks of SQL as a data structure to manipulate: the non-JOIN syntax is far less pain to manipulate.
I can't speak to "harder to read" complaints; JOIN looks like a lunge toward relational algebra operators. Don't go there :-)
Actually, this syntax is more portable than a JOIN, because it will work with pretty much any database, whereas not everybody supports the JOIN syntax (Oracle Lite doesn't, for example [unless this has changed recently]).

Filtering Condition in n-Table Joins

We seem to have a need for a multi-table JOIN operation and I am referring to some notes from an RDBMS class that I took several years ago. In this class the instructor graphically depicted the structure of a generic N-table JOIN query.
The figure seems to conform to examples of multi-table JOINs that I have seen but I have a question. Does the WHERE clause, for providing filtering, necessarily have to be the last clause in the query? Intuitively it appears that we can impose filtering conditions before a following JOIN clause, in order to properly scope the data, before we input it to the next JOIN operation.
Syntactically, the WHERE clause has to come after the joins. But the query plan will take it into account and use it to filter wherever possible. Note that just because you specify the FROM and joins in a given order doesn't mean the query will actually execute that way; the optimizer may rearrange them into whatever order it thinks will work best (unless you specify STRAIGHT_JOIN in MySQL).
That said, having the where at the end does make some queries actually harder to read.
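To see that the position of the filter does not change the answer for inner joins, here is a minimal sketch that writes the same filter three ways: in the final WHERE, in a derived table before the next join, and in the first ON clause. The orders/items/skus tables and rows are hypothetical, and SQLite stands in for MySQL. Where the engine actually applies the filter is the optimizer's choice and is not shown here. This equivalence holds for inner joins only; with outer joins, moving a condition between ON and WHERE can change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, status TEXT);
    CREATE TABLE items (order_id INTEGER, sku TEXT);
    CREATE TABLE skus (sku TEXT PRIMARY KEY, price REAL);
    INSERT INTO orders VALUES (1, 7, 'open'), (2, 7, 'closed'), (3, 8, 'open');
    INSERT INTO items VALUES (1, 'A'), (1, 'B'), (2, 'A'), (3, 'B');
    INSERT INTO skus VALUES ('A', 1.5), ('B', 2.0);
""")

# 1) Filter written once, in the WHERE clause at the end.
where_last = conn.execute("""
    SELECT o.order_id, s.sku, s.price
    FROM orders o
    JOIN items i ON i.order_id = o.order_id
    JOIN skus s ON s.sku = i.sku
    WHERE o.status = 'open'
    ORDER BY o.order_id, s.sku
""").fetchall()

# 2) Filter "scoped" before the next join, via a derived table.
filter_first = conn.execute("""
    SELECT o.order_id, s.sku, s.price
    FROM (SELECT * FROM orders WHERE status = 'open') AS o
    JOIN items i ON i.order_id = o.order_id
    JOIN skus s ON s.sku = i.sku
    ORDER BY o.order_id, s.sku
""").fetchall()

# 3) Filter attached to the first inner join's ON clause.
filter_in_first_on = conn.execute("""
    SELECT o.order_id, s.sku, s.price
    FROM orders o
    JOIN items i ON i.order_id = o.order_id AND o.status = 'open'
    JOIN skus s ON s.sku = i.sku
    ORDER BY o.order_id, s.sku
""").fetchall()

print(where_last)  # [(1, 'A', 1.5), (1, 'B', 2.0), (3, 'B', 2.0)]
print(where_last == filter_first == filter_in_first_on)  # True
```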
SQL queries consist of a sequence of clauses. The diagram you have is rather misleading. Common clauses -- and the order they must appear for a valid query -- are:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Note that JOIN is not a clause. It is an operator, and an operator that specifically appears only in the FROM clause.
So, the answer to your question is that the WHERE clause immediately follows the FROM clause. The only "sort-of" addition to the list above is a WINDOW clause, which, where supported, sits syntactically between HAVING and ORDER BY.
Next, multiple-table joins are often quite efficient, and there is no reason whatsoever to discourage their use. Support for joins, in fact, is one of the key features that databases are designed around.
And finally, what actually gets executed is not the string that you write. A query, in fact, describes the result set you want. It does not describe the processing. SQL is a declarative language, not a procedural language.
The SQL engine has two steps to convert your query string to an executable form (typically a directed acyclic graph). One is to compile the query, and the second is to optimize the query. So, where filtering actually occurs . . . that depends on what the optimizer decides. And where it occurs has little relationship to what you think of when you think of SQL queries (DAGs don't generally have nodes called "select" or "join").

A query about MySQL table aliases

So there's a question about MySQL aliases for table names, and it has prompted me to ask this question here:
Why is the use of aliases for table names in MySQL queries treated as pseudo-standard behaviour rather than as behaviour needed only in certain situations?
Take the following example from the question linked above:
SELECT st.StudentName, cl.ClassName
FROM StudentClass sc
INNER JOIN Classes cl ON cl.ClassID = sc.ClassID
INNER JOIN Students st ON st.StudentID = sc.StudentID;
In my experience, the table aliases are usually unneeded (some might say pseudo-random, space-filling) letters, and the query can just as easily, and more readably, be:
SELECT Students.StudentName, Classes.ClassName
FROM StudentClass
INNER JOIN Classes ON Classes.ClassID = StudentClass.ClassID
INNER JOIN Students ON Students.StudentID = StudentClass.StudentID;
Obviously, in some situations it may be better to use a shortened naming convention, perhaps for a large query with many tables, each with a long name, but that's no reason (as far as I can see) for the absolutely over-the-top prevalence of this practice of giving every table an alias, regardless of need.
I have googled this, but the majority of useful results state that it makes "the SQL more readable". That may be so for queries with many or long-named tables, as I've already stated, but as an apparent standard??
Also, without the alias, it's clear to see the source table of each of the columns in the exampled SQL above.
I just want to see if there's some key methodology I'm completely missing here?
To qualify why I feel the need to ask if I'm missing something:
I see it an awful lot on StackOverflow (as referenced here), which in itself means jack nothing, but I also see no responses from knowledgeable (high-scoring) answerers saying that aliases are not needed (such as in the referenced post above), whereas on other topics on SO, those who deem themselves to know better (high scorers) are all over telling people how things should be done.
It leaves me unsure if I'm missing something, hence this question.
A comment by a MySQL-authorized instructor and DBA, circa 2010:
I think you may be mistaken that giving aliases to tables is somehow the standard.
There are two situations where it's required.
When the same table is joined more than once in a query. Aliases are needed to distinguish the two usages of the table.
A derived table (a subquery) needs an alias.
Other than that you can omit aliases to your heart's content.
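Both situations can be sketched in one script. The employees table and rows are hypothetical, and SQLite is a stand-in engine: MySQL requires the derived-table alias, as the answer says, while SQLite happens to be more lenient, so the second query simply shows the portable form rather than demonstrating the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1);
""")

# Situation 1: the same table joined twice (a self-join). The aliases
# "e" (employee row) and "m" (manager row) tell the two copies apart.
pairs = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON m.id = e.manager_id
    ORDER BY e.id
""").fetchall()

# Situation 2: a derived table (subquery in FROM), aliased "team_sizes".
sizes = conn.execute("""
    SELECT team_sizes.manager_id, team_sizes.n
    FROM (SELECT manager_id, COUNT(*) AS n
          FROM employees
          WHERE manager_id IS NOT NULL
          GROUP BY manager_id) AS team_sizes
""").fetchall()

print(pairs)  # [('Eli', 'Dana'), ('Fay', 'Dana')]
print(sizes)  # [(1, 2)]
```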

Why would you use a JOIN, if you can select from multiple tables without a JOIN?

The following queries both select data from a posts table and a users table.
The first query uses a JOIN and the second doesn't. My question is: why would you use a JOIN?
Query with JOIN:
SELECT u.*, p.* FROM users AS u
JOIN posts AS p ON p.user_id=u.user_id
WHERE u.user_id=1
Query without:
SELECT u.*, p.* FROM users AS u, posts AS p
WHERE p.user_id=u.user_id
AND u.user_id=1
The second form is called an implicit join. First and foremost, implicit joins are considered deprecated by most RDBMSs. Personally, I sincerely doubt that any major RDBMS will drop support for them any time in the near future, but why take the risk?
Second, explicit joins have a standard way to perform outer joins. Implicit joins have all sorts of unreadable, hacky solutions (e.g., Oracle's (+) syntax), but, as far as I know, nothing standard that can reasonably be expected to be portable.
And third, and I admit this is purely a matter of taste, they just look better. Using explicit joins allows you to logically separate the conditions in the query into the "scaffolding" needed to join all the tables together and the actual logical conditions of the WHERE clause. With implicit joins, everything just gets lumped into the WHERE clause, and with as few as three or four tables it becomes pretty hard to manage.
The second query is using a join. That's what the comma means in users AS u, posts AS p. This is an implicit join (implicit because although you're not explicitly using the JOIN keyword, you're getting its effects) also known as a CROSS JOIN, and means "every row of the left table, joined with every row in the right table".
The use of JOIN ... ON syntax is (in my opinion) much more explicit and readable, due in no small part to moving the joining condition from the WHERE clause to being directly attached to the JOIN, and it also opens up the syntax for other join types (INNER JOIN, the default, and LEFT JOIN) with different semantics.
It is purely a choice of style. These two statements will be interpreted identically by the server.
When you use a comma (,) in the FROM clause, this is implicitly using a CROSS JOIN.
If you have lots of conditions in the WHERE clause, it may be clearer to distinguish between conditions intended to connect tables (tableA.ID = tableB.tableA_id) and conditions intended to filter. You can achieve this by putting the connection conditions next to an explicit JOIN.

MySQL - SELECT, JOIN

A few months ago I was programming a simple application in PHP with some other guy. We needed to perform a SELECT from multiple tables based on a userid and on another value that had to be taken from the row selected by userid.
My first idea was to run multiple SELECTs and parse all the output in the PHP script (with mysql_num_rows() and similar functions for checking), but then the guy told me he'd do that. "Okay, no problem!" I thought, just much less for me to write. Well, what a surprise when I found out he did it with just one SQL statement:
SELECT
d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo ...
FROM
table_values AS d, sectors AS p
WHERE
d.userid='$userid' and p.pasmo_id=d.pasmo
ORDER BY
datum DESC, p.pasmo_id DESC
(shortened piece of the statement (...))
Mostly I need to know the differences between this method (is it the right way to do this?) and JOIN - when should I use which one?
Also, any references to explanations and examples of these two would come in pretty handy (not from the MySQL reference manual though; I'm really a novice at this kind of stuff and it's written pretty roughly there).
The , notation was superseded in the ANSI-92 standard, so in one sense it is now 20 years out of date.
Also, when doing OUTER JOINs and other more complex queries, the JOIN notation is much more explicit, readable, and (in my opinion) debuggable.
As a general principle, avoid , and use JOIN.
In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b.
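A minimal sketch of that anti-join pattern, using hypothetical one-column tables a and b with SQLite as a stand-in engine, and showing that moving the IS NULL test into ON changes its meaning:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2);
""")

# ON builds the outer join first (unmatched rows of a get NULL for b.id);
# WHERE then keeps only those unmatched rows.
missing = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# In ON, the test only restricts which b rows may match. No stored b.id
# is NULL, so nothing matches and every row of a is kept.
in_on = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.id = b.id AND b.id IS NULL
    ORDER BY a.id
""").fetchall()

print(missing)  # [(1,), (3,)]
print(in_on)    # [(1,), (2,), (3,)]
```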
Using , notation is similar to processing the WHERE and ON conditions at the same time.
This definitely looks like the ideal scenario for a join, so you can avoid returning more data than you actually need. This: http://www.w3schools.com/sql/sql_join.asp or this: http://en.wikipedia.org/wiki/Join_(SQL) should help you get started with joins. I'm also happy to help you write the statement if you can give me a brief outline of the columns/data in each table (primarily I need two matching columns to join on).
The use of the WHERE clause is a valid approach, but, as @Dems noted, it has been superseded by the JOIN syntax.
However, I would argue that in some cases, use of the WHERE clauses to achieve joins can be more readable and understandable than using JOINs.
You should make yourself familiar with both methods of joining tables.