MySQL - SELECT, JOIN - mysql

A few months ago I was programming a simple application in PHP with another guy. We needed to perform a SELECT from multiple tables based on a userid and another value that had to be read from the row selected by that userid.
My first idea was to run multiple SELECTs and parse all the output in the PHP script (with mysql_num_rows() and similar functions for checking), but then the guy told me he'd do it. "Okay, no problem!" I thought, just that much less for me to write. Well, what a surprise when I found out he did it with just one SQL statement:
SELECT
d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo ...
FROM
table_values AS d, sectors AS p
WHERE
d.userid='$userid' and p.pasmo_id=d.pasmo
ORDER BY
datum DESC, p.pasmo_id DESC
(shortened piece of the statement (...))
Mostly I need to know the differences between this method (is it the right way to do this?) and JOIN - when should I use which one?
Also any references to explanations and examples of these two would come in pretty handy (not from the MySQL ref though - I'm really a novice in this kind of stuff and it's written pretty roughly there.)

The comma (,) notation was superseded in the ANSI-92 standard, and so is in one sense now 20 years out of date.
Also, when doing OUTER JOINs and other more complex queries, the JOIN notation is much more explicit, readable, and (in my opinion) debuggable.
As a general principle, avoid the comma (,) and use JOIN.
In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b.
Using , notation is similar to processing the WHERE and ON conditions at the same time.
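The LEFT JOIN ... WHERE b.id IS NULL pattern mentioned above can be tried out in a small sketch. This one uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example runs anywhere); the tables a and b and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (3);
""")

# The ON clause pairs rows first; rows of a without a partner get NULLs
# for b's columns. The WHERE clause then keeps only those unmatched rows.
unmatched = conn.execute("""
    SELECT a.id, a.name
    FROM a
    LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

print(unmatched)  # [(2, 'two')]
```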

This definitely looks like the ideal scenario for a join, so you can avoid returning more data than you actually need. This: http://www.w3schools.com/sql/sql_join.asp or this: http://en.wikipedia.org/wiki/Join_(SQL) should help you get started with joins. I'm also happy to help you write the statement if you can give me a brief outline of the columns / data in each table (primarily I need two matching columns to join on).

The use of the WHERE clause is a valid approach, but as @Dems noted, it has been superseded by the JOIN syntax.
However, I would argue that in some cases, use of the WHERE clauses to achieve joins can be more readable and understandable than using JOINs.
You should make yourself familiar with both methods of joining tables.

Related

Is this SQL statement making a join? [duplicate]

I develop against Oracle databases. When I need to write SQL manually (not using an ORM like Hibernate), I use a WHERE condition instead of a JOIN.
For example (this is simplistic, just to illustrate the style):
Select *
from customers c, invoices i, shipment_info si
where c.customer_id = i.customer_id
and i.amount > 999.99
and i.invoice_id = si.invoice_id(+) -- added to show a replacement for a join
order by i.amount, c.name
I learned this style from an OLD Oracle DBA. I have since learned that this is not standard SQL syntax. Other than being non-standard and much less portable across databases, are there any other repercussions to using this format?
I don't like the style because it makes it harder to determine which WHERE clauses are for simulating JOINs and which ones are for actual filters, and I don't like code that makes it unnecessarily difficult to determine the original intent of the programmer.
The biggest issue that I have run into with this format is the tendency to forget some join's WHERE clause, thereby resulting in a cartesian product. This is particularly common (for me, at least) when adding a new table to the query. For example, suppose an ADDRESSES table is thrown into the mix and your mind is a bit forgetful:
SELECT *
FROM customers c, invoices i, addresses a
WHERE c.customer_id = i.customer_id
AND i.amount > 999.99
ORDER BY i.amount, c.name
Boom! Cartesian product! :)
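To see the effect with actual row counts, here is a minimal sketch using Python's sqlite3 module and an in-memory SQLite database (a stand-in for Oracle, chosen only so the example is runnable). The column layout of the three tables and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices  (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, customer_id INTEGER, city TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO invoices  VALUES (10, 1, 1500.0), (11, 2, 2500.0);
    INSERT INTO addresses VALUES (100, 1, 'Oslo'), (101, 2, 'Rome'), (102, 2, 'Lima');
""")

# Comma style with the join condition for addresses forgotten:
# every customer/invoice pair is combined with every address.
forgotten = conn.execute("""
    SELECT COUNT(*)
    FROM customers c, invoices i, addresses a
    WHERE c.customer_id = i.customer_id
      AND i.amount > 999.99
""").fetchone()[0]

# Explicit JOINs: each table carries its own join condition next to it.
explicit = conn.execute("""
    SELECT COUNT(*)
    FROM customers c
    JOIN invoices i  ON i.customer_id = c.customer_id
    JOIN addresses a ON a.customer_id = c.customer_id
    WHERE i.amount > 999.99
""").fetchone()[0]

print(forgotten, explicit)  # 6 3
```

With real table sizes the inflated count grows multiplicatively, which is why the mistake tends to show up as a query that suddenly runs for a very long time.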
The old-style join is flat out wrong in some cases (outer joins are the culprit). Although the two styles are more or less equivalent for inner joins, the old style can generate incorrect results with outer joins, especially if columns on the outer side can be null. This is because, with the older syntax, the join conditions are not logically evaluated until the entire result set has been constructed. It is simply not possible to express a condition on a column from the outer side of a join that will filter records when that column can be null because there is no matching record.
As an example:
Select all Customers, and the sum of the sales of Widgets on all their Invoices in the month Of August, where the Invoice has been processed (Invoice.ProcessDate is Not Null)
using new ANSI-92 Join syntax
Select c.name, Sum(d.Amount)
From customer c
Left Join Invoice I
On i.custId = c.custId
And i.SalesDate Between '8/1/2009'
and '8/31/2009 23:59:59'
And i.ProcessDate Is Not Null
Left Join InvoiceDetails d
On d.InvoiceId = i.InvoiceId
And d.Product = 'widget'
Group By c.Name
Try doing this with the old syntax... Because all the conditions in the WHERE clause are evaluated/applied BEFORE the 'outer' rows are added back in, all the unprocessed Invoice rows will get added back into the final result set. So this is not possible with the old syntax: anything that attempts to filter out the invoices with null ProcessDate values will eliminate customers. The only alternative is to use a correlated subquery.
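The distinction this answer relies on, a filter inside the ON clause of an outer join versus the same filter in WHERE, can be reproduced in a small sketch. It uses Python's sqlite3 module with an in-memory SQLite database; the two-table schema is a simplified, hypothetical version of the one in the query above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ann has a processed invoice, Bob's invoice is unprocessed, Cy has none.
conn.executescript("""
    CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice  (invoiceId INTEGER PRIMARY KEY, custId INTEGER,
                           amount REAL, processDate TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO invoice VALUES (10, 1, 100.0, '2009-08-05'), (11, 2, 50.0, NULL);
""")

# Filter inside ON: unprocessed invoices simply fail to match,
# but the customer row itself survives the LEFT JOIN.
in_on = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM customer c
    LEFT JOIN invoice i
           ON i.custId = c.custId
          AND i.processDate IS NOT NULL
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Same filter in WHERE: applied after the join, so it also removes
# the NULL-extended rows and with them the customers.
in_where = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM customer c
    LEFT JOIN invoice i ON i.custId = c.custId
    WHERE i.processDate IS NOT NULL
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(in_on)     # [('Ann', 100.0), ('Bob', None), ('Cy', None)]
print(in_where)  # [('Ann', 100.0)]
```

The comma syntax has only the WHERE clause to work with, so there is nowhere to put a condition that should behave like the first query.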
Some people will say that this style is less readable, but that's a matter of habit. From a performance point of view, it doesn't matter, since the query optimizer takes care of that.
"I have since learned that this is not standard SQL syntax."
That's not quite true. The "a, b where" syntax is from the ANSI-89 standard; the "a join b on" syntax is ANSI-92. However, the 89 syntax is deprecated, which means you should not use it for new queries.
Also, there are some situations where the older style lacks expressive power, especially with regard to outer joins or complex queries.
It can be a pain going through the where clause trying to pick out join conditions. For anything more than one join the old style is absolute evil. And once you know the new style, you may as well just keep using it.
This is a standard SQL syntax, just an older standard than JOIN. There's a reason that the syntax has evolved and you should use the newer JOIN syntax because:
It's more expressive, clearly indicating which tables are JOINed, the JOIN order, which conditions apply to which JOIN, and separating out the filtering WHERE conditions from the JOIN conditions.
It supports LEFT, RIGHT, and FULL OUTER JOINs, which the WHERE syntax does not.
I don't think you'll find the WHERE-type JOIN substantially less portable than the JOIN syntax.
As long as you don't use the ANSI natural join feature I'm OK with it.
I found this quote by – ScottCher, I totally agree:
I find the WHERE syntax easier to read than INNER JOIN - I guess it's like Vegemite. Most people in the world probably find it disgusting but kids brought up eating it love it.
It really depends on habits, but I have always found Oracle's comma-separated syntax more natural. The first reason is that I think using (INNER) JOIN diminishes readability. The second is about flexibility. In the end, a join is a cartesian product by definition. You do not necessarily have to restrict the results based on IDs of both tables. Although it is very seldom needed, one might well want the cartesian product of two tables. Restricting them based on IDs is just a very reasonable practice, but NOT A RULE. However, if you use the JOIN keyword in e.g. SQL Server, it won't let you omit the ON keyword. Suppose you want to create a combination list. You have to write it like this:
SELECT *
FROM numbers
JOIN letters
ON 1=1
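For comparison, standard SQL also offers CROSS JOIN for a deliberate cartesian product, which avoids the dummy ON 1=1 condition. A minimal sketch, using Python's sqlite3 module and an in-memory SQLite database as a stand-in; the column names n and l and the sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE numbers (n INTEGER);
    CREATE TABLE letters (l TEXT);
    INSERT INTO numbers VALUES (1), (2);
    INSERT INTO letters VALUES ('a'), ('b'), ('c');
""")

# Three spellings of the same cartesian product.
with_on = conn.execute(
    "SELECT n, l FROM numbers JOIN letters ON 1=1 ORDER BY n, l"
).fetchall()
with_cross = conn.execute(
    "SELECT n, l FROM numbers CROSS JOIN letters ORDER BY n, l"
).fetchall()
with_comma = conn.execute(
    "SELECT n, l FROM numbers, letters ORDER BY n, l"
).fetchall()

print(with_cross == with_on == with_comma, len(with_cross))  # True 6
```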
Apart from that, I find the (+) syntax of Oracle also very reasonable. It is a nice way to say, "Add this record to the resultset too, even if it is null." It is way better than the RIGHT/LEFT JOIN syntax, because in fact there is no left or right! When you want to join 10 tables with several different types of outer joins, it gets confusing which table is on the "left hand side" and which one on the right.
By the way, as a more general comment, I don't think SQL portability exists in the practical world any more. Standard SQL is so poor, and the expressiveness of DBMS-specific syntax is so often in demand, that I don't think 100% portable SQL code is an achievable goal. The most obvious evidence for this is the good old row number problem. Just search any forum for "sql row number", including SO, and you will see hundreds of posts asking how it can be achieved in a specific DBMS. Limiting the number of returned rows is a similar and related example.
This is Transact SQL syntax, and I'm not quite sure how "unportable" it is - it is the main syntax used in Sybase, for example (Sybase supports ANSI syntax as well) as well as many other databases (if not all).
The main benefit of ANSI syntax is that it allows you to write some fairly tricky chained joins that T-SQL prohibits.
Speaking as someone who writes automated sql query transformers (inline view expansions, grafted joins, union factoring) and thinks of SQL as a data structure to manipulate: the non-JOIN syntax is far less pain to manipulate.
I can't speak to "harder to read" complaints; JOIN looks like a lunge toward relational algebra operators. Don't go there :-)
Actually, this syntax is more portable than a JOIN, because it will work with pretty much any database, whereas not everybody supports the JOIN syntax (Oracle Lite doesn't, for example [unless this has changed recently]).

Filtering Condition in n-Table Joins

We seem to have a need for a multi-table JOIN operation and I am referring to some notes from an RDBMS class that I took several years ago. In this class the instructor graphically depicted the structure of a generic N-table JOIN query.
The figure seems to conform to examples of multi-table JOINs that I have seen but I have a question. Does the WHERE clause, for providing filtering, necessarily have to be the last clause in the query? Intuitively it appears that we can impose filtering conditions before a following JOIN clause, in order to properly scope the data, before we input it to the next JOIN operation.
Syntactically, the where clause has to be at the end. But the query plan will take it into account and use it to filter wherever possible. Note that just because you specify the from and joins in a given order doesn't mean the query will actually execute that way; it may rearrange them to whatever order it thinks will work best (unless you specify straight_join).
That said, having the where at the end does make some queries actually harder to read.
SQL queries consist of a sequence of clauses. The diagram you have is rather misleading. Common clauses -- and the order they must appear for a valid query -- are:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Note that JOIN is not a clause. It is an operator, and an operator that specifically appears only in the FROM clause.
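A sketch of a query that uses every clause from the list, in the required order, with the JOIN sitting inside the FROM clause. It runs against an in-memory SQLite database through Python's sqlite3 module; the customers and orders tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 70.0), (3, 2, 20.0),
                              (4, 3, 90.0), (5, 3, 5.0);
""")

rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total      -- SELECT
    FROM customers c                           -- FROM (the JOIN is part of it)
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.amount >= 10                       -- WHERE: filters rows
    GROUP BY c.name                            -- GROUP BY
    HAVING SUM(o.amount) > 50                  -- HAVING: filters groups
    ORDER BY total DESC                        -- ORDER BY
""").fetchall()

print(rows)  # [('Ann', 110.0), ('Cy', 90.0)]
```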
So, the answer to your question is that the WHERE clause immediately follows the FROM clause, which contains all of the joins. A WINDOW clause, in databases that support one, does not change this: it comes later, after HAVING and before ORDER BY.
Next, multiple table joins are often quite efficient and there is no reason whatsoever to discourage their use. Support for joins, in fact, is one of the key design features that databases are designed around.
And finally. What actually gets executed is not the string that you create. A query, in fact, describes the result set you want. It does not describe the processing. SQL is a descriptive language, not a procedural language.
The SQL engine has two steps to convert your query string to an executable form (typically a directed acyclic graph). One is to compile the query, and the second is to optimize the query. So, where filtering actually occurs . . . that depends on what the optimizer decides. And where it occurs has little relationship to what you think of when you think of SQL queries (DAGs don't generally have nodes called "select" or "join").

What's the purpose of an IMPLICIT JOIN in SQL?

So, I don't really understand the purpose of using an implicit join in SQL. In my opinion, it makes a join more difficult to spot in the code, and I'm wondering this:
Is there a greater purpose for actually wanting to do this besides the simplicity of it?
Fundamentally there is no difference between the implicit join and the explicit JOIN .. ON ... Execution plans are the same.
I prefer the explicit notation as it makes it easier to read and debug.
Moreover, in the explicit notation you define the relationship between the tables in the ON clause and the search condition in the WHERE clause.
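A quick way to convince yourself that the two notations describe the same result is to run both and compare. The sketch below does that with Python's sqlite3 module and an in-memory SQLite database; the authors and books tables are hypothetical, and it checks results only, not execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER,
                        title TEXT, year INTEGER);
    INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO books VALUES (1, 1, 'First', 1990), (2, 1, 'Second', 2005),
                             (3, 2, 'Third', 2010);
""")

# Implicit join: join predicate and filter share the WHERE clause.
implicit = conn.execute("""
    SELECT a.name, b.title
    FROM authors a, books b
    WHERE a.author_id = b.author_id
      AND b.year > 2000
    ORDER BY b.title
""").fetchall()

# Explicit join: relationship in ON, search condition in WHERE.
explicit = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    JOIN books b ON a.author_id = b.author_id
    WHERE b.year > 2000
    ORDER BY b.title
""").fetchall()

print(implicit == explicit)  # True
print(explicit)  # [('Author A', 'Second'), ('Author B', 'Third')]
```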
Explicit vs implicit SQL joins
When you join several tables, no matter how the join condition is written, the optimizer will choose the execution plan it considers best. As for me:
1) Implicit join syntax is more concise.
2) It is easier to generate automatically, or to produce from another SQL script.
So I use it sometimes.
Others have answered the question from the perspective of what most people understand by "implicit JOIN": an INNER JOIN that arises from table lists with join predicates in the WHERE clause. However, I think it's also worth mentioning the concept of an "implicit JOIN" as some ORM query languages understand it, such as Hibernate's HQL, jOOQ, Doctrine, and probably others. In those cases, the join is expressed as a path expression anywhere in the query, e.g.
SELECT
b.author.first_name,
b.author.last_name,
b.title,
b.language.cd AS language
FROM book b;
Where the path b.author implicitly joins the AUTHOR table to the BOOK table using the foreign key between the two tables. Your question still holds for this type of "implicit join" as well, and the answer is the same, some users may find this syntax more convenient than the explicit one. There is no other advantage to it.
Disclaimer: I work for the company behind jOOQ.

SQL "Table_Name.Column_name" VS "Column_name" performance & syntax

I'm new to SQL and am currently working through a "teach yourself SQL" book.
It was mentioned in the book that sometimes you NEED to specify table name with column name (immediately after SELECT line) to get your desired result. It was also mentioned that it is often good practice to do this regardless. Here is a specific example:
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
SELECT Vendors.vend_name, Products.prod_name, Products.prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
Both code blocks achieve the same result. My question is whether there is a performance difference, and if the full names are better practice.
Thanks in advance.
First, learn proper join syntax. Simple rule: Never use commas in the from clause.
Second, learn to use table aliases. These should be abbreviations for the table. Table aliases make queries easier to write and to read.
Third, always use qualified column names. Using the column name has no effect on performance. Oh, perhaps you'll make an exception if you have only one table or something like that. But, including the table alias is a very good idea, a best practice. Why?
You or someone else may look at the query in the future and not want to figure out which names come from which tables.
You or someone else may add a new column to one of the tables that matches a column in the other. And, the query mysteriously stops working.
You or someone else may say "what a great query, but I need to add another table". The other table has naming conflicts, just introducing more work.
So, I would write the query as:
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v JOIN
Products p
ON v.vend_id = p.vend_id;
Or, if you like:
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v JOIN
Products p
USING (vend_id)
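The second reason given above (a newly added column breaking a query that uses unqualified names) can be reproduced in a small sketch. It uses Python's sqlite3 module with an in-memory SQLite database; the column definitions and the later ALTER TABLE are hypothetical, and the exact error text differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendors  (vend_id INTEGER PRIMARY KEY, vend_name TEXT);
    CREATE TABLE Products (prod_id INTEGER PRIMARY KEY, vend_id INTEGER,
                           prod_name TEXT, prod_price REAL);
    INSERT INTO Vendors  VALUES (1, 'Acme');
    INSERT INTO Products VALUES (10, 1, 'Anvil', 9.99);
""")

unqualified = """
    SELECT vend_name, prod_name, prod_price
    FROM Vendors JOIN Products ON Vendors.vend_id = Products.vend_id
"""
qualified = """
    SELECT v.vend_name, p.prod_name, p.prod_price
    FROM Vendors v JOIN Products p ON v.vend_id = p.vend_id
"""

before = conn.execute(unqualified).fetchall()  # works while names are unique

# Hypothetical schema change: Vendors gains a column also named prod_name.
conn.execute("ALTER TABLE Vendors ADD COLUMN prod_name TEXT")

try:
    conn.execute(unqualified).fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # the unqualified name is now ambiguous

after = conn.execute(qualified).fetchall()  # qualified names still work

print(error)
print(before == after)  # True
```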
The format below is helpful if you have multiple tables with the same column names, because it reduces confusion:
Vendors.vend_name
Let me be simple and short:
There won't be any performance issues, but it is a good practice to follow.
Firstly, I doubt there would be any performance difference. However, even if there were, it would not be worth it in the long run.
Using table.column is good practice as it makes your SQL easier to read. This may not seem like a big deal when you're learning, but you will most likely come across SQL statements that are huge, and when you do you will be glad of this practice.
I would recommend you look at the AS keyword. This will allow you to assign an Alias to a table to again make the statement easier to understand.
SELECT p.id, v.name
FROM products AS p
JOIN venders as V
ON p.id = v.ProductId
This allows you to keep the SELECT section of your SQL Statement as short as possible and avoid repeating product. product. etc for every field you want to show.

Does it make a difference whether I put WHERE conditions in the WHERE or Join clause, unrelated to the join?

For example:
SELECT *
FROM a
JOIN b ON a.b_id = b.id
AND b.col = 'something'
vs
SELECT *
FROM a
JOIN b ON a.b_id = b.id
WHERE b.col = 'something'
I would assume that MySQL's query optimizer would regard this the same query. Are they the same in all such cases, whether the WHERE column is on table a or table b?
These queries will be handled the same way by MySQL. You can verify this by placing EXPLAIN EXTENDED in front of either query and looking over the query execution plan (EXPLAIN EXTENDED was deprecated in MySQL 5.7 and removed in 8.0, where plain EXPLAIN provides the same information). If you need a good resource for understanding the output of the EXPLAIN query, check out http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/ (if that link ever breaks, search the web for "Understanding MySQL Explain" and you'll have no trouble finding a resource).
In general, I would recommend the second form you used. It's not so much a matter of technical reasons related to query execution as it is related to ease of modifying code later on, when/if you need to. For example, suppose you added four or five more joins. It would be difficult to read this query if WHERE clauses were sprinkled all over the place.
Keeping your join clauses and your WHERE filters separate is definitely something I would consider a best practice, but it's to do with ease of reading/editing, not because you're gonna end up with a different query execution plan (unless you make a mistake, which I think most people would be more likely to do with the first query than with the second).
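The two queries from the question can also be compared by their results. The sketch below does so with Python's sqlite3 module and an in-memory SQLite database standing in for MySQL; the sample rows are made up, and it compares results only, not execution plans. Note that this equivalence holds for inner joins: with a LEFT JOIN, moving a condition between ON and WHERE can change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO b VALUES (1, 'something'), (2, 'other');
    INSERT INTO a VALUES (1, 1), (2, 2), (3, 1);
""")

# Extra condition placed in the ON clause.
in_on = conn.execute("""
    SELECT a.id, b.id
    FROM a
    JOIN b ON a.b_id = b.id AND b.col = 'something'
    ORDER BY a.id
""").fetchall()

# Same condition placed in the WHERE clause.
in_where = conn.execute("""
    SELECT a.id, b.id
    FROM a
    JOIN b ON a.b_id = b.id
    WHERE b.col = 'something'
    ORDER BY a.id
""").fetchall()

print(in_on == in_where, in_on)  # True [(1, 1), (3, 1)]
```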