I'm new to SQL and am currently working through a "teach yourself SQL" book.
It was mentioned in the book that sometimes you NEED to specify table name with column name (immediately after SELECT line) to get your desired result. It was also mentioned that it is often good practice to do this regardless. Here is a specific example:
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
SELECT Vendors.vend_name, Products.prod_name, Products.prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
Both code blocks achieve the same result. My question is whether there is a performance difference, and if the full names are better practice.
Thanks in advance.
First, learn proper join syntax. Simple rule: never use commas in the FROM clause.
Second, learn to use table aliases. These should be abbreviations for the table. Table aliases make queries easier to write and to read.
Third, always use qualified column names. Qualifying a column name has no effect on performance. Oh, perhaps you'll make an exception if you have only one table or something like that. But including the table alias is a very good idea, a best practice. Why?
You or someone else may look at the query in the future and not want to figure out which names come from which tables.
You or someone else may add a new column to one of the tables that matches a column in the other. And, the query mysteriously stops working.
You or someone else may say "what a great query, but I need to add another table". The other table has naming conflicts, just introducing more work.
So, I would write the query as:
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v JOIN
Products p
ON v.vend_id = p.vend_id;
Or, if you like:
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v JOIN
Products p
USING (vend_id)
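The second reason above (a newly added column breaking an unqualified query) can be sketched as follows. This is a minimal illustration, assuming SQLite through Python's sqlite3 module; the sample rows and the extra vend_name column on Products are made up for the demo, and other databases word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendors  (vend_id INTEGER PRIMARY KEY, vend_name TEXT);
    CREATE TABLE Products (prod_id INTEGER PRIMARY KEY, vend_id INTEGER,
                           prod_name TEXT, prod_price REAL);
    INSERT INTO Vendors  VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO Products VALUES (10, 1, 'Anvil', 9.99), (11, 2, 'Nut', 0.25);
""")

unqualified = """
    SELECT vend_name, prod_name, prod_price
    FROM Vendors v JOIN Products p ON v.vend_id = p.vend_id
    ORDER BY prod_name"""
qualified = """
    SELECT v.vend_name, p.prod_name, p.prod_price
    FROM Vendors v JOIN Products p ON v.vend_id = p.vend_id
    ORDER BY p.prod_name"""

# While every column name is unique, both forms return the same rows.
same_before = (conn.execute(unqualified).fetchall()
               == conn.execute(qualified).fetchall())

# Hypothetical schema change: someone adds vend_name to Products.
conn.execute("ALTER TABLE Products ADD COLUMN vend_name TEXT")

# The unqualified query now fails because vend_name is ambiguous.
try:
    conn.execute(unqualified).fetchall()
    unqualified_error = None
except sqlite3.OperationalError as exc:
    unqualified_error = str(exc)

# The qualified query keeps working unchanged.
rows_after = conn.execute(qualified).fetchall()
```

The qualified version survives the schema change untouched, which is the maintenance benefit being argued for; neither form is expected to run faster than the other.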
The format below is helpful when several tables have columns with the same name, because it removes the ambiguity:
Vendors.vend_name
Let me keep it short and simple:
There won't be any performance difference, but it is a good practice
to follow.
Firstly, I doubt there would be any performance difference. However, even if there were, it would not be worth it in the long run.
The table.column form is good practice because it makes your SQL easier to read. This may not seem like a big deal while you're learning, but you will most likely come across SQL statements that are huge, and when you do, you will be glad of this practice.
I would recommend you look at the AS keyword. This lets you assign an alias to a table, which again makes the statement easier to understand.
SELECT p.id, v.name
FROM products AS p
JOIN vendors AS v
ON p.id = v.ProductId
This allows you to keep the SELECT section of your SQL Statement as short as possible and avoid repeating product. product. etc for every field you want to show.
I develop against Oracle databases. When I need to write SQL manually (rather than use an ORM like Hibernate), I use WHERE conditions instead of JOINs.
for example (this is simplistic just to illustrate the style):
Select *
from customers c, invoices i, shipment_info si
where c.customer_id = i.customer_id
and i.amount > 999.99
and i.invoice_id = si.invoice_id(+) -- added to show a replacement for a join
order by i.amount, c.name
I learned this style from an OLD Oracle DBA. I have since learned that this is not standard SQL syntax. Other than being non-standard and much less portable across databases, are there any other repercussions to using this format?
I don't like the style because it makes it harder to determine which WHERE clauses are for simulating JOINs and which ones are for actual filters, and I don't like code that makes it unnecessarily difficult to determine the original intent of the programmer.
The biggest issue that I have run into with this format is the tendency to forget some join's WHERE clause, thereby resulting in a cartesian product. This is particularly common (for me, at least) when adding a new table to the query. For example, suppose an ADDRESSES table is thrown into the mix and your mind is a bit forgetful:
SELECT *
FROM customers c, invoices i, addresses a
WHERE c.customer_id = i.customer_id
AND i.amount > 999.99
ORDER BY i.amount, c.name
Boom! Cartesian product! :)
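To see how quickly the row count grows, here is a small sketch of that mistake. It assumes SQLite through Python's sqlite3 module, and the three tiny tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE invoices  (invoice_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE addresses (address_id INTEGER, customer_id INTEGER, city TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO invoices  VALUES (100, 1, 1500.0), (101, 2, 2000.0);
    INSERT INTO addresses VALUES (7, 1, 'Oslo'), (8, 2, 'Rome'), (9, 2, 'Lima');
""")

# Comma style with the join condition for addresses forgotten:
# every matching customer/invoice pair is combined with every address.
forgot = conn.execute("""
    SELECT COUNT(*)
    FROM customers c, invoices i, addresses a
    WHERE c.customer_id = i.customer_id
      AND i.amount > 999.99
""").fetchone()[0]

# JOIN style: each table is introduced together with its own ON clause.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM customers c
    JOIN invoices  i ON i.customer_id = c.customer_id
    JOIN addresses a ON a.customer_id = c.customer_id
    WHERE i.amount > 999.99
""").fetchone()[0]

print(forgot, joined)  # prints "6 3"
```

With the JOIN form, the condition sits right next to the table it belongs to, so a missing one is easier to spot; several databases also reject an inner JOIN that has no ON clause at all.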
The old-style join is flat-out wrong in some cases (outer joins are the culprit). Although the two styles are more or less equivalent for inner joins, the old style can generate incorrect results with outer joins, especially if columns on the outer side can be null. This is because, with the older syntax, the join conditions are not logically evaluated until the entire result set has been constructed. It is then simply not possible to express a condition on a column from the outer side of a join that filters records, because that column can also be null when there is no matching record.
As an example:
Select all Customers, and the sum of the sales of Widgets on all their Invoices in the month of August, where the Invoice has been processed (Invoice.ProcessDate is Not Null),
using the newer ANSI-92 join syntax:
Select c.name, Sum(d.Amount)
From customer c
Left Join Invoice I
On i.custId = c.custId
And i.SalesDate Between '8/1/2009'
and '8/31/2009 23:59:59'
And i.ProcessDate Is Not Null
Left Join InvoiceDetails d
On d.InvoiceId = i.InvoiceId
And d.Product = 'widget'
Group By c.Name
Try doing this with the old syntax. With the old style, all the conditions in the WHERE clause are evaluated before the 'outer' rows are added back in, so all the unprocessed Invoice rows get added back into the final result set. Anything that then attempts to filter out the invoices with null process dates will eliminate customers as well. So this is not possible with the old syntax; the only alternative is to use a correlated subquery.
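The difference between filtering inside the ON clause and filtering afterwards can be sketched with a cut-down version of the example. This assumes SQLite through Python's sqlite3 module and a single simplified invoice table with invented rows (no InvoiceDetails, no date range), so it shows only the ProcessDate part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (custId INTEGER, name TEXT);
    CREATE TABLE invoice  (invoiceId INTEGER, custId INTEGER,
                           amount REAL, processDate TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann's invoice is processed, Bob's is not, Cy has none.
    INSERT INTO invoice VALUES (10, 1, 50.0, '2009-08-05'),
                               (11, 2, 70.0, NULL);
""")

# Filter inside ON: applied while matching, so every customer survives.
in_on = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM customer c
    LEFT JOIN invoice i
           ON i.custId = c.custId
          AND i.processDate IS NOT NULL
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Same filter in WHERE: applied after the outer join, so customers
# without a processed invoice are removed from the result.
in_where = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM customer c
    LEFT JOIN invoice i ON i.custId = c.custId
    WHERE i.processDate IS NOT NULL
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Here in_on keeps all three customers (Bob and Cy with a NULL sum), while in_where returns only Ann, which is the loss of customers described above.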
Some people will say that this style is less readable, but that's a matter of habit. From a performance point of view, it doesn't matter, since the query optimizer takes care of that.
"I have since learned that this is not standard SQL syntax."
That's not quite true. The "a, b WHERE" syntax is from the ANSI-89 standard; the "a JOIN b ON" syntax is ANSI-92. The 89 syntax is still valid SQL, but it is considered outdated, which means you should not use it for new queries.
Also, there are some situations where the older style lacks expressive power, especially with regard to outer joins or complex queries.
It can be a pain going through the where clause trying to pick out join conditions. For anything more than one join the old style is absolute evil. And once you know the new style, you may as well just keep using it.
This is a standard SQL syntax, just an older standard than JOIN. There's a reason that the syntax has evolved and you should use the newer JOIN syntax because:
It's more expressive, clearly indicating which tables are JOINed, the JOIN order, which conditions apply to which JOIN, and separating out the filtering WHERE conditions from the JOIN conditions.
It supports LEFT, RIGHT, and FULL OUTER JOINs, which the standard WHERE syntax does not (vendor extensions such as Oracle's (+) aside).
I don't think you'll find the WHERE-type JOIN substantially less portable than the JOIN syntax.
As long as you don't use the ANSI natural join feature I'm OK with it.
I found this quote by ScottCher, and I totally agree:
"I find the WHERE syntax easier to read than INNER JOIN - I guess it's like Vegemite. Most people in the world probably find it disgusting but kids brought up eating it love it."
It really depends on habits, but I have always found Oracle's comma-separated syntax more natural. The first reason is that I think using (INNER) JOIN diminishes readability. The second is about flexibility. In the end, a join is a cartesian product by definition. You do not necessarily have to restrict the results based on IDs of both tables. Although it is very seldom needed, one might well want the cartesian product of two tables. Restricting them based on IDs is just a very reasonable practice, but NOT A RULE. However, if you use the JOIN keyword in, e.g., SQL Server, it won't let you omit the ON clause. Suppose you want to create a combination list. With a plain JOIN you have to write something like this (or use CROSS JOIN instead):
SELECT *
FROM numbers
JOIN letters
ON 1=1
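For comparison, a small sketch of the combination list written both ways: with the ON 1=1 workaround and with the CROSS JOIN keyword from the same ANSI join syntax. It assumes SQLite through Python's sqlite3 module; the column names n and l and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE numbers (n INTEGER);
    CREATE TABLE letters (l TEXT);
    INSERT INTO numbers VALUES (1), (2);
    INSERT INTO letters VALUES ('a'), ('b'), ('c');
""")

# Always-true join condition, as in the query above.
on_true = conn.execute("""
    SELECT n, l FROM numbers JOIN letters ON 1=1 ORDER BY n, l
""").fetchall()

# Explicit CROSS JOIN states the intent directly and needs no ON clause.
cross = conn.execute("""
    SELECT n, l FROM numbers CROSS JOIN letters ORDER BY n, l
""").fetchall()
```

Both queries return the same six (number, letter) pairs.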
Apart from that, I find the (+) syntax of Oracle also very reasonable. It is a nice way to say, "Add this record to the resultset too, even if it is null." It is way better than the RIGHT/LEFT JOIN syntax, because in fact there is no left or right! When you want to join 10 tables with several different types of outer joins, it gets confusing which table is on the "left hand side" and which one on the right.
By the way, as a more general comment, I don't think SQL portability exists in the practical world any more. Standard SQL is so limited, and the expressiveness of DBMS-specific syntax is so often needed, that I don't think 100% portable SQL code is an achievable goal. The most obvious evidence for this is the good old row number problem. Just search any forum for "sql row number", including SO, and you will see hundreds of posts asking how it can be achieved in a specific DBMS. The same goes for a related problem: limiting the number of returned rows.
This is Transact SQL syntax, and I'm not quite sure how "unportable" it is - it is the main syntax used in Sybase, for example (Sybase supports ANSI syntax as well) as well as many other databases (if not all).
The main benefit of ANSI syntax is that it allows you to write some fairly tricky chained joins that T-SQL prohibits.
Speaking as someone who writes automated sql query transformers (inline view expansions, grafted joins, union factoring) and thinks of SQL as a data structure to manipulate: the non-JOIN syntax is far less pain to manipulate.
I can't speak to "harder to read" complaints; JOIN looks like a lunge toward relational algebra operators. Don't go there :-)
Actually, this syntax is more portable than a JOIN, because it will work with pretty much any database, whereas not everybody supports the JOIN syntax (Oracle Lite doesn't, for example [unless this has changed recently]).
So there's a question about MySQL aliases for table names, and it has prompted me to ask this question here:
Why is the use of aliases in naming tables in MySQL queries treated as a pseudo-standard behaviour rather than as behaviour needed only in certain situations?
Take the following example from the question linked above:
SELECT st.StudentName, cl.ClassName
FROM StudentClass sc
INNER JOIN Classes cl ON cl.ClassID = sc.ClassID
INNER JOIN Students st ON st.StudentID = sc.StudentID;
In my experience the aliases for the tables are usually unneeded (some might say pseudo-random, space-filling) letters, and the query can just as easily, and more readably, be:
SELECT Students.StudentName, Classes.ClassName
FROM StudentClass
INNER JOIN Classes ON Classes.ClassID = StudentClass.ClassID
INNER JOIN Students ON Students.StudentID = StudentClass.StudentID;
Obviously it may, in some situations, be better to use a shortened naming convention, perhaps for a large query with many tables that each have long names, but that's no reason (as far as I can see) for the over-the-top prevalence of this habit of creating an alias for each table, regardless of need.
I have googled this, but the majority of useful results state that it makes "the SQL more readable". That may be so for many or long-named tables in a query, as I've already stated, but as an apparent standard??
Also, without the alias, it's clear to see the source table of each of the columns in the exampled SQL above.
I just want to see if there's some key methodology I'm completely missing here?
To qualify why I feel the need to ask if I'm missing something:
I see it an awful lot on StackOverflow (as referenced here), which in itself means nothing, but then I see that there are no responses from knowledgeable (high-scoring) answerers saying that aliases are not needed (such as in the referenced post above), whereas in other topics on SO those who deem themselves to know better (high scorers) are all over telling people how things should be done.
It leaves me unsure If I'm missing something, hence this question.
A comment by a MySQL authorized instructor and DBA, circa 2010:
I think you may be mistaken that giving aliases to tables is somehow the standard.
There are two situations where it's required.
When the same table is joined more than once in a query. Aliases are needed to distinguish the two usages of the table.
A derived table (a subquery) needs an alias.
Other than that you can omit aliases to your heart's content.
A few months ago I was programming a simple application in PHP with another guy. We needed to perform a SELECT from multiple tables based on a userid and another value that you had to get from the row selected by userid.
My first idea was to create multiple SELECTs and parse all the output in the PHP script (with all that mysql_num_rows() and similar functions for checking), but then the guy told me he'd do that. "Okay, no problem!" I thought, just that much less for me to write. Well, what a surprise when I found out he did it with just one SQL statement:
SELECT
d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo ...
FROM
table_values AS d, sectors AS p
WHERE
d.userid='$userid' and p.pasmo_id=d.pasmo
ORDER BY
datum DESC, p.pasmo_id DESC
(shortened piece of the statement (...))
Mostly I need to know the differences between this method (is it the right way to do this?) and JOIN - when should I use which one?
Also any references to explanations and examples of these two would come in pretty handy (not from the MySQL ref though - I'm really a novice in this kind of stuff and it's written pretty roughly there.)
The comma (,) notation was superseded by the JOIN syntax in the ANSI-92 standard, so in one sense it is now 20 years out of date.
Also, when doing OUTER JOINs and other more complex queries, the JOIN notation is much more explicit, readable, and (in my opinion) debuggable.
As a general principle, avoid the comma notation and use JOIN.
In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b.
Using the comma notation is similar to processing the WHERE and ON conditions at the same time.
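The LEFT JOIN ... WHERE b.id IS NULL pattern mentioned above can be tried out with a minimal sketch. It assumes SQLite through Python's sqlite3 module; tables a and b contain invented ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2);
""")

# ON is applied first: unmatched rows of a get NULLs for b's columns.
# WHERE then keeps only those unmatched rows.
unmatched = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# Comma notation: both conditions apply to the same row at once,
# and no row can satisfy a.id = b.id while b.id is NULL.
comma = conn.execute("""
    SELECT a.id
    FROM a, b
    WHERE a.id = b.id AND b.id IS NULL
""").fetchall()

print(unmatched, comma)  # prints "[(1,), (3,)] []"
```

The first query returns the ids of a that have no partner in b; the comma version can never return anything, which is what "processing the conditions at the same time" means in practice.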
This definitely looks like the ideal scenario for a join, so you can avoid returning more data than you actually need. This: http://www.w3schools.com/sql/sql_join.asp or this: http://en.wikipedia.org/wiki/Join_(SQL) should help you get started with joins. I'm also happy to help you write the statement if you can give me a brief outline of the columns / data in each table (primarily I need two matching columns to join on).
The use of the WHERE clause is a valid approach but, as @Dems noted, it has been superseded by the JOIN syntax.
However, I would argue that in some cases, use of the WHERE clauses to achieve joins can be more readable and understandable than using JOINs.
You should make yourself familiar with both methods of joining tables.
Consider this a theoretical question as much as practical.
One has a table with, say, 1.000.000+ records of users and needs to pull data for, say, 50.000 of them from that table, using user_id only. How would you expect IN to behave? If not well, is it the only option, or is there anything else one could try?
You could insert your search values into a single column temporary table and join on that. I have seen other databases do Bad Things when presented with very large in clauses.
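A minimal sketch of the temporary-table approach, assuming SQLite through Python's sqlite3 module. The users table, the wanted temp table, and the scaled-down row counts (1,000 users, 500 wanted ids) are invented for illustration; whether this beats a large IN list depends on the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 ((i, f"user{i}") for i in range(1, 1001)))

# The ids to look up; here simply every even id (500 values).
wanted_ids = list(range(2, 1001, 2))

# Load the search values into a single-column temporary table...
conn.execute("CREATE TEMP TABLE wanted (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)", ((i,) for i in wanted_ids))

# ...and join on it instead of building a huge IN (...) list.
rows = conn.execute("""
    SELECT u.user_id, u.name
    FROM users u
    JOIN wanted w ON w.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
```

Besides any performance effect, this avoids limits some databases place on statement length or on the number of bound parameters.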
IN can perform quite poorly in some databases, so this is something I would avoid. Most of the time you can get by with a joined query, so depending on your database structure you should definitely favor a join over an IN statement.
If IN starts to prove troublesome (as other answerers have suggested it might), you could try rewriting your query using EXISTS instead.
SELECT *
FROM MYTAB
WHERE MYKEY IN (SELECT KEYVAL
FROM MYOTHERTAB
WHERE some condition)
could become
SELECT *
FROM MYTAB
WHERE EXISTS (SELECT *
FROM MYOTHERTAB
WHERE some condition AND
MYTAB.MYKEY = MYOTHERTAB.KEYVAL)
I have often found this speeds things up quite a bit.
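The rewrite can be checked for equivalence on a small data set. This is a sketch only, assuming SQLite through Python's sqlite3 module; the label and flag columns, the sample rows, and the flag = 1 filter standing in for "some condition" are all invented. It shows that both forms return the same rows, not which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MYTAB      (MYKEY INTEGER, label TEXT);
    CREATE TABLE MYOTHERTAB (KEYVAL INTEGER, flag INTEGER);
    INSERT INTO MYTAB      VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO MYOTHERTAB VALUES (1, 1), (1, 1), (2, 0), (3, 1);
""")

with_in = conn.execute("""
    SELECT *
    FROM MYTAB
    WHERE MYKEY IN (SELECT KEYVAL
                    FROM MYOTHERTAB
                    WHERE flag = 1)
    ORDER BY MYKEY
""").fetchall()

with_exists = conn.execute("""
    SELECT *
    FROM MYTAB
    WHERE EXISTS (SELECT *
                  FROM MYOTHERTAB
                  WHERE flag = 1 AND
                        MYTAB.MYKEY = MYOTHERTAB.KEYVAL)
    ORDER BY MYKEY
""").fetchall()
```

Neither form duplicates MYTAB rows when MYOTHERTAB holds the same key twice, unlike a plain join would; any speed difference is best measured on the actual database.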
Use a JOIN to select the data you need.
As a more general case of this question because I think it may be of interest to more people...What's the best way to perform a fulltext search on two tables? Assume there are three tables, one for programs (with submitter_id) and one each for tags and descriptions with object_id: foreign keys referring to records in programs. We want the submitter_id of programs with certain text in their tags OR descriptions. We have to use MATCH AGAINST for reasons that I won't go into here. Don't get hung up on that aspect.
programs
    id
    submitter_id
tags_programs
    object_id
    text
descriptions_programs
    object_id
    text
The following works and executes in 20 ms or so:
SELECT p.submitter_id
FROM programs p
WHERE p.id IN
(SELECT t.object_id
FROM tags_programs t
WHERE MATCH (t.text) AGAINST ('china')
UNION ALL
SELECT d.object_id
FROM descriptions_programs d
WHERE MATCH (d.text) AGAINST ('china'))
but I tried to rewrite this as a JOIN as follows and it runs for a very long time. I have to kill it after 60 seconds.
SELECT p.id
FROM descriptions_programs d, tags_programs t, programs p
WHERE (d.object_id=p.id AND MATCH (d.text) AGAINST ('china'))
OR (t.object_id=p.id AND MATCH (t.text) AGAINST ('china'))
Just out of curiosity I replaced the OR with AND. That also runs in a few milliseconds, but it's not what I need. What's wrong with the second query above? I can live with the UNION and subselects, but I'd like to understand.
Join after the filters (e.g. join the results), don't try to join and then filter.
The reason is that you lose use of your fulltext index.
Clarification in response to the comment: I'm using the word join generically here, not as JOIN but as a synonym for merge or combine.
I'm essentially saying you should use the first (faster) query, or something like it. The reason it's faster is that each of the subqueries is sufficiently uncluttered that the db can use that table's full text index to do the select very quickly. Joining the two (presumably much smaller) result sets (with UNION) is also fast. This means the whole thing is fast.
The slow version winds up walking through lots of data testing it to see if it's what you want, rather than quickly winnowing the data down and only searching through rows you are likely to actually want.
Just in case you don't know: MySQL has a built-in statement called EXPLAIN that can be used to see what's going on under the surface. There are a lot of articles about this, so I won't go into any detail, but for each table it provides an estimate of the number of rows it will need to process. If you look at the "rows" column in the EXPLAIN result for the second query, you'll probably see that the number of rows is quite large, and certainly a lot larger than for the first one.
The net is full of warnings about using subqueries in MySQL, but it turns out that many times the developer is smarter than the MySQL optimizer. Filtering results in some manner before joining can cause major performance boosts in many cases.
If you join both tables you end up having lots of records to inspect. Just as an example, if both tables have 100,000 records, fully joining them gives you 100,000 x 100,000 = 10,000,000,000 records (10 billion!).
If you replace the OR with AND, you allow the engine to filter out all records from descriptions_programs that don't match 'china', and only then join with tags_programs.
Anyway, that's not what you need, so I'd recommend sticking to the UNION way.
The union is the proper way to go. The join will pull in both full-text indexes at once and can multiply the number of checks actually performed.