A query about MySQL table aliases - mysql

So there's a question about MySQL aliases for table names, and it has raised me to ask this question here:
Why is the use of aliases in naming tables in MySQL queries treated as a pseudo- standard behaviour rather than as behaviour needed only in certain situations?
Take the following example from the question linked above:
SELECT st.StudentName, cl.ClassName
FROM StudentClass sc
INNER JOIN Classes cl ON cl.ClassID = sc.ClassID
INNER JOIN Students st ON st.StudentID = sc.StudentID;
From my experience the alias for the tables is usually unneeded (some might say pseudo-random, spacing filling) letters and can just as easily, and more readably be:
SELECT Students.StudentName, Classes.ClassName
FROM StudentClass
INNER JOIN Classes ON Classes.ClassID = StudentClass.ClassID
INNER JOIN Students ON Students.StudentID = StudentClass.StudentID;
Obviously it may -in some situations- be better to use shortened naming convention, perhaps for a large query with many tables each of long names, but that's no reason (as far as I can see) for the absolute over the top prevelance of this methodology of forming an alias for each table, regardless of need.
I have googled this, but the majority of useful results state that it makes "the SQL more readable". That's as maybe for many or long-named tables in a Query as I've already state, but as an apparant standard??
Also, without the alias, it's clear to see the source table of each of the columns in the exampled SQL above.
I just want to see if there's some key methodology I'm completely missing here?
To qualify why I feel the need to ask if I'm missing something:
I see it an aweful lot on StackOverflow (as referenced here), which in itself means jack nothing, but then I see that there are no responses from knowledgable (high scoring) answerers that aliases are not needed (such as in the referenced post above), whereas other topics on SO those who deem themselves to know better (high scorers) are all over teling people how things should be done.
It leaves me unsure If I'm missing something, hence this question.
A comment by A MySQL authorized instructor and DBA circa 2010.

I think you may be mistaken that giving aliases to tables is somehow the standard.
There are two situations where it's required.
When the same table is joined more than once in a query. Aliases are needed to distinguish the two usages of the table.
A derived table (a subquery) needs an alias.
Other than that you can omit aliases to your heart's content.

Related

Is this SQL statement making a join? [duplicate]

I develop against Oracle databases. When I need to manually write (not use an ORM like hibernate), I use a WHERE condition instead of a JOIN.
for example (this is simplistic just to illustrate the style):
Select *
from customers c, invoices i, shipment_info si
where c.customer_id = i.customer_id
and i.amount > 999.99
and i.invoice_id = si.invoice_id(+) -- added to show a replacement for a join
order by i.amount, c.name
I learned this style from an OLD oracle DBA. I have since learned that this is not standard SQL syntax. Other than being non-standard and much less database portable, are there any other repercussions to using this format?
I don't like the style because it makes it harder to determine which WHERE clauses are for simulating JOINs and which ones are for actual filters, and I don't like code that makes it unnecessarily difficult to determine the original intent of the programmer.
The biggest issue that I have run into with this format is the tendency to forget some join's WHERE clause, thereby resulting in a cartesian product. This is particularly common (for me, at least) when adding a new table to the query. For example, suppose an ADDRESSES table is thrown into the mix and your mind is a bit forgetful:
SELECT *
FROM customers c, invoices i, addresses a
WHERE c.customer_id = i.customer_id
AND i.amount > 999.99
ORDER BY i.amount, c.name
Boom! Cartesian product! :)
The old style join is flat out wrong in some cases (outer joins are the culprit). Although they are more or less equivalent when using inner joins, they can generate incorrect results with outer joins, especially if columns on the outer side can be null. This is because when using the older syntax the join conditions are not logically evaluated until the entire result set has been constructed, it is simply not possible to express a condition on a column from outer side of a join that will filter records when the column can be null because there is no matching record.
As an example:
Select all Customers, and the sum of the sales of Widgets on all their Invoices in the month Of August, where the Invoice has been processed (Invoice.ProcessDate is Not Null)
using new ANSI-92 Join syntax
Select c.name, Sum(d.Amount)
From customer c
Left Join Invoice I
On i.custId = c.custId
And i.SalesDate Between '8/1/2009'
and '8/31/2009 23:59:59'
And i.ProcessDate Is Not Null
Left Join InvoiceDetails d
On d.InvoiceId = i.InvoiceId
And d.Product = 'widget'
Group By c.Name
Try doing this with old syntax... Because when using the old style syntax, all the conditions in the where clause are evaluated/applied BEFORE the 'outer' rows are added back in, All the UnProcessed Invoice rows will get added back into the final result set... So this is not possible with old syntax - anything that attempts to filter out the invoices with null Processed Dates will eliminate customers... the only alternative is to use a correlated subquery.
Some people will say that this style is less readable, but that's a matter of habit. From a performance point of view, it doesn't matter, since the query optimizer takes care of that.
I have since learned that this is not standard SQL syntax.
That's not quite true. The "a,b where" syntax is from the ansi-89 standard, the "a join b on" syntax is ansi-92. However, the 89 syntax is deprecated, which means you should not use it for new queries.
Also, there are some situations where the older style lacks expressive power, especially with regard to outer joins or complex queries.
It can be a pain going through the where clause trying to pick out join conditions. For anything more than one join the old style is absolute evil. And once you know the new style, you may as well just keep using it.
This is a standard SQL syntax, just an older standard than JOIN. There's a reason that the syntax has evolved and you should use the newer JOIN syntax because:
It's more expressive, clearly indicating which tables are JOINed, the JOIN order, which conditions apply to which JOIN, and separating out the filtering WHERE conditions from the JOIN conditions.
It supports LEFT, RIGHT, and FULL OUTER JOINs, which the WHERE syntax does not.
I don't think you'll find the WHERE-type JOIN substantially less portable than the JOIN syntax.
As long as you don't use the ANSI natural join feature I'm OK with it.
I found this quote by – ScottCher, I totally agree:
I find the WHERE syntax easier to read than INNER JOIN - I guess its like Vegemite. Most people in the world probably find it disgusting but kids brought up eating it love it.
It really depends on habits, but I have always found Oracle's comma separated syntax more natural. The first reason is that I think using (INNER) JOIN diminishes readability. The second is about flexibility. In the end, a join is a cartesian product by definition. You do not necessarily have to restrict the results based on IDs of both tables. Although very seldom, one might well need cartesian product of two tables. Restricting them based on IDs is just a very reasonable practice, but NOT A RULE. However, if you use JOIN keyword in e.g. SQL Server, it won't let you omit the ON keyword. Suppose you want to create a combination list. You have to do like this:
SELECT *
FROM numbers
JOIN letters
ON 1=1
Apart from that, I find the (+) syntax of Oracle also very reasonable. It is a nice way to say, "Add this record to the resultset too, even if it is null." It is way better than the RIGHT/LEFT JOIN syntax, because in fact there is no left or right! When you want to join 10 tables with several different types of outer joins, it gets confusing which table is on the "left hand side" and which one on the right.
By the way, as a more general comment, I don't think SQL portability exists in the practical world any more. The standard SQL is so poor and the expressiveness of diverse DBMS specific syntax are so often demanded, I don't think 100% portable SQL code is an achievable goal. The most obvious evidence of my observation is the good old row number problemmatic. Just search any forum for "sql row number", including SO, and you will see hundreds of posts asking how it can be achieved in a specific DBMS. Similar and related to that, so is limiting the number of returned rows, for example..
This is Transact SQL syntax, and I'm not quite sure how "unportable" it is - it is the main syntax used in Sybase, for example (Sybase supports ANSI syntax as well) as well as many other databases (if not all).
The main benefits to ANSI syntax is that it allows you to write some fairly tricky chained joins that T-SQL prohibits
Speaking as someone who writes automated sql query transformers (inline view expansions, grafted joins, union factoring) and thinks of SQL as a data structure to manipulate: the non-JOIN syntax is far less pain to manipulate.
I can't speak to "harder to read" complaints; JOIN looks like an lunge toward relational algebra operators. Don't go there :-)
Actually, this syntax is more portable than a JOIN, because it will work with pretty much any database, whereas not everybody supports the JOIN syntax (Oracle Lite doesn't, for example [unless this has changed recently]).

What's the purpose of an IMPLICIT JOIN in SQL?

So, I don't really understand the purpose of using an implicit join in SQL. In my opinion, it makes a join more difficult to spot in the code, and I'm wondering this:
Is there a greater purpose for actually wanting to do this besides the simplicity of it?
Fundamentally there is no difference between the implicit join and the explicit JOIN .. ON ... Execution plans are the same.
I prefer the explicit notation as it makes it easier to read and debug.
Moreover, in the explicit notation you define the relationship between the tables in the ON clause and the search condition in the WHERE clause.
Explicit vs implicit SQL joins
When you join several tables no matter how the join condition written, anyway optimizer will choose execution plan it consider the best. As for me:
1) Implicit join syntax is more concise.
2) It easier to generate it automatically, or produce using other SQL script.
So I use it sometimes.
Others have answered the question from the perspective of what most people understand by "implicit JOIN", an INNER JOIN that arises from table lists with join predicates in the WHERE clause. However, I think it's worth mentioning also the concept of an "implicit JOIN" as some ORM query languages understand it, such as Hibernate's HQL or jOOQ or Doctrine and probably others. In those cases, the join is expessed as a path expression anywhere in the query, such as e.g.
SELECT
b.author.first_name,
b.author.last_name,
b.title,
b.language.cd AS language
FROM book b;
Where the path b.author implicitly joins the AUTHOR table to the BOOK table using the foreign key between the two tables. Your question still holds for this type of "implicit join" as well, and the answer is the same, some users may find this syntax more convenient than the explicit one. There is no other advantage to it.
Disclaimer: I work for the company behind jOOQ.

SQL "Table_Name.Column_name" VS "Column_name" performance & syntax

I'm new to SQL and am currently working through a "teach yourself SQL book"
It was mentioned in the book that sometimes you NEED to specify table name with column name (immediately after SELECT line) to get your desired result. It was also mentioned that it is often good practice to do this regardless. Here is a specific example:
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
SELECT Vendors.vend_name, Products.prod_name, Products.prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
Both code blocks achieve the same result. My question is whether there is a performance difference, and if the full names are better practice.
Thanks in advance.
First, learn proper join syntax. Simple rule: Never use commas in the from clause.
Second, learn to use table aliases. These should be abbreviations for the table. Table aliases make queries easier to write and to read.
Third, always use qualified column names. Using the column name has no effect on performance. Oh, perhaps you'll make an exception if you have only one table or something like that. But, including the table alias is a very good idea, a best practice. Why?
You or someone else may look at the query in the future and not want to figure out which names come from which tables.
You or someone else may add a new column to one of the tables that matches a column in the other. And, the query mysteriously stops working.
You or someone else may say "what a great query, but I need to add another table". The other table has naming conflicts, just introducing more work.
So, I would write the query as:
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v JOIN
Products p
ON v.vend_id = p.vend_id;
Or, if you like:
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v JOIN
Products p
USING (vend_id)
Below format is helpful if you have multiple tables with same column names,in order to reduce the confusions
Vendors.vend_name
Let me be simple and short :
There wont be any performance issues but it is a good practice
to follow
Firstly I doubt there would be any performance difference. However even if there was It would not be worth it in the long run.
The table.column is good practice as it makes your sql easier to read. This may not seam like a big deal when your learning but you will most likely come across SQL statements that are huge and when you do you will be glad of this practice.
I would recommend you look at the AS keyword. This will allow you to assign an Alias to a table to again make the statement easier to understand.
SELECT p.id, v.name
FROM products AS p
JOIN venders as V
ON p.id = v.ProductId
This allows you to keep the SELECT section of your SQL Statement as short as possible and avoid repeating product. product. etc for every field you want to show.

MySQL - SELECT, JOIN

Few months ago I was programming a simple application with som other guy in PHP. There we needed to preform a SELECT from multiple tables based on a userid and another value that you needed to get from the row that was selected by userid.
My first idea was to create multiple SELECTs and parse all the output in the PHP script (with all that mysql_num_rows() and similar functions for checking), but then the guy told me he'll do that. "Okay no problem!" I thought, just much more less for me to write. Well, what a surprise when i found out he did it with just one SQL statement:
SELECT
d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo ...
FROM
table_values AS d, sectors AS p
WHERE
d.userid='$userid' and p.pasmo_id=d.pasmo
ORDER BY
datum DESC, p.pasmo_id DESC
(shortened piece of the statement (...))
Mostly I need to know the differences between this method (is it the right way to do this?) and JOIN - when should I use which one?
Also any references to explanations and examples of these two would come in pretty handy (not from the MySQL ref though - I'm really a novice in this kind of stuff and it's written pretty roughly there.)
, notation was replaced in ANSI-92 standard, and so is in one sense now 20 years out of date.
Also, when doing OUTER JOINs and other more complex queries, the JOIN notation is much more explicit, readable, and (in my opinion) debuggable.
As a general principle, avoid , and use JOIN.
In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b.
Using , notation is similar to processing the WHERE and ON conditions at the same time.
This definitely looks like the ideal scenario for a join so you can avoid returning more data then you actually need. This: http://www.w3schools.com/sql/sql_join.asp or this: http://en.wikipedia.org/wiki/Join_(SQL) should help you get started with joins. I'm also happy to help you write the statement if you can give me a brief outline of the columns / data in each table (primarily I need two matching columns to join on).
The use of the WHERE clause is a valid approach, but as #Dems noted, has been superseded by the use of the JOINS syntax.
However, I would argue that in some cases, use of the WHERE clauses to achieve joins can be more readable and understandable than using JOINs.
You should make yourself familiar with both methods of joining tables.

How does JOIN work in MySQL?

Although the question title is duplicate of many discussions, I did not find a answer to this question:
Consider a simple join for normalized tables of tags as
SELECT tags.tag
FROM tags
INNER JOIN tag_map
ON tags.tag_id=tag_map.tag_id
WHERE article_id=xx
Does JOIN work with the entire tables of tags and tag_map then filter the created (JOINed) table to find rows with WHERE clause for the article id
OR JOIN will only join rows of tag_map table in which article_id=xx ?
The latter method should be quite faster!
It will do the former, to my knowledge WHERE's are explicitly performed on the resulting JOINed table. (Disclaimer: MySQL may optimize this in some cases, I don't know).
To force the latter behaviour and execute the WHERE first, you can add an extra filter to your JOIN ON statement:
SELECT tags.tag
FROM tags
INNER JOIN tag_map
ON tags.article_id=xx
AND tags.tag_id=tag_map.tag_id
WHERE article_id=xx
The Joins work on ONLY those records qualified from the WHERE clause of the first table returning records.. That said, you are doing a join to tag_map, but your where clause does not specify which alias the "Article_ID" is associated with. Its typically better to always qualify your fields with either the table name or alias the are coming from.
So, if article_id is coming from TAGS, then it will first look at that list as the primary set of records, and optimized with index if one so exists and return a small set. From that, the join is applied to the tag_map and will grab all records that match the join "ON" condition.
Just to clarify something. If the JOIN was applied FIRST, before the WHERE clause optimization, queries would take forever. The join basically PREPARES the relationship before the record selection actually occurs. Hence, the execution plan that shows the indexes that would be used.
It depends on the engine. Earlier version of many database engines would generate the join results first and then it would filter. Newer versions of engines generate a execution plan that achieves the fastest results. Test would have to be done with the db engine reviewing execution plans for your version/database to find "what is best"
Assuming it is simple or inner join:
The answer is: in relational model, first answer is correct, it creates a table that contains every row from first crossed with every row from the second table, so if you have N rows in first and M in second, it will create a table with NxM and then eliminate those where conditions do not match.
Now, that is mathematical model, but in implementation, depending on the engine, it will use some smarter way, typically choosing one table that seems faster and jioing from there using hopefully indexed join field. But this depends between engines: there is a lot of documetnation on that(google it) and some people, poster of this answer included, are paid to optimize join queries...
In case of MYSQL (just noticed the tag) you can use following syntax:
EXPLAIN [EXTENDED] SELECT select_options
as explained here and MYSQL will tell you how it would execute such query. It is faster then reading the docuemtnation.
You can always check Execution plan to see how your query gets executed step by step. In MySQL I don't know whether it can be presented graphically using any third party tools (as you can on MS SQL out of the box with Management Studio) but you can still check it using explain language constructs. Check documentation.
Not knowing your table schema
If article_id is of table tags then tag_map table isn't scanned at all unless join column in FK table is nullable.
If article_id is indexed (ie. primary key) then index is being scanned...
etc...
What I'd like to say is that we'd need your table schema definition to tell you some details. We can't know how your schema works.