Difference between these two joining table approaches? - mysql

Consider we have two tables, Users and Posts. user_id is the foreign key in Posts table and is primary key in Users table.
What's the difference between these two SQL queries?
select user.name, post.title
from users as user, posts as post
where post.user_id = user.user_id;
vs.
select user.name, post.title
from users as user join posts as post
using (user_id);

Other than syntax, the two queries in this small snippet work exactly the same. But if at all possible, always write new queries using ANSI JOINs.
Semantically, the comma notation produces a Cartesian product of the two tables: every record from table A paired with every record from table B, so two tables with 4 and 6 records respectively produce 24 records. The WHERE clause then picks the rows you actually want from this Cartesian product. MySQL doesn't actually build this huge matrix, but semantically this is what the query means.
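To make the 4 x 6 = 24 arithmetic concrete, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the snippet is self-contained), and the tables a and b and their columns are hypothetical.

```python
import sqlite3

# Hypothetical tables: a has 4 rows, b has 6 rows. SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER)")
conn.executemany("INSERT INTO a (id) VALUES (?)", [(i,) for i in range(1, 5)])
conn.executemany(
    "INSERT INTO b (id, a_id) VALUES (?, ?)",
    [(i, (i % 4) + 1) for i in range(1, 7)],  # every b row points at one a row
)

# Comma notation with no WHERE clause: every row of a paired with every row of b.
cartesian_count = conn.execute("SELECT COUNT(*) FROM a, b").fetchone()[0]

# The WHERE clause keeps only the pairs that actually belong together.
comma_rows = conn.execute(
    "SELECT a.id, b.id FROM a, b WHERE b.a_id = a.id ORDER BY a.id, b.id"
).fetchall()

# The explicit JOIN states the same thing in the FROM clause.
join_rows = conn.execute(
    "SELECT a.id, b.id FROM a JOIN b ON b.a_id = a.id ORDER BY a.id, b.id"
).fetchall()

print(cartesian_count)          # 24
print(comma_rows == join_rows)  # True
```

The unfiltered comma join returns all 24 combinations, while the filtered comma join and the explicit JOIN return the same 6 matching pairs.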
The JOIN syntax is the ANSI standard and defines more clearly how tables interact. Putting the ON (or USING) clause next to the JOIN makes it clear what links the two tables together.
Functionally, they will perform the same for your two queries. The difference comes in when you start using other [OUTER] JOIN types.
For MySQL specifically, comma notation does have one difference:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
However, it would not be wise to bank on this difference.

where post.user_id = user.user_id
Here you are filtering with a condition in the WHERE clause.
from users as user join posts as post using (user_id)
Here you are joining the two tables on the foreign key.
In the end the result is the same, but JOIN is the better choice for more advanced queries.

In MySQL JOIN syntax, CROSS JOIN, INNER JOIN, and JOIN are all the same. A comma-separated table list is a JOIN.

The MySQL manual on page https://dev.mysql.com/doc/refman/5.5/en/join.html makes this point about the difference between the two approaches:
However, the precedence of the comma operator is less than that of
INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins
with the other join types when there is a join condition, an error of
the form Unknown column 'col_name' in 'on clause' may occur.

Related

SQL Join Results

Select * from a join b on a.id=b.id and a.vol<5
Select * from a join b on a.id=b.id where a.vol<5
Do they produce the same results?
If they don't produce the same results, and a has 1000 rows and b has 100 rows, how many rows will each produce?
I would say yes, they do.
A plain "JOIN" implies an "INNER JOIN", so it doesn't matter whether the condition goes in an "AND" in the join or in a "WHERE" after the join.
It would be different with an "OUTER JOIN": a "WHERE" condition on a column of the outer-joined table effectively turns the join back into an "INNER JOIN", because the NULL-extended rows fail the condition.
Hope that made sense.
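Both claims can be checked on a small scale. The sketch below again uses SQLite via Python's sqlite3 as a stand-in for MySQL; the columns vol and flag and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, vol INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, flag INTEGER);
    INSERT INTO a VALUES (1, 3), (2, 7), (3, 4);
    INSERT INTO b VALUES (1, 1), (2, 1);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Inner join: the extra condition gives the same rows in ON and in WHERE.
inner_on = ids("SELECT a.id FROM a JOIN b ON a.id = b.id AND a.vol < 5 ORDER BY a.id")
inner_where = ids("SELECT a.id FROM a JOIN b ON a.id = b.id WHERE a.vol < 5 ORDER BY a.id")

# Left join: all rows of a survive, including id 3, which has no match in b.
left_all = ids("SELECT a.id FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id")

# A WHERE condition on the outer-joined table rejects the NULL-extended row,
# so the result is the same as for an inner join.
left_where_b = ids("SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.flag = 1 ORDER BY a.id")
inner_where_b = ids("SELECT a.id FROM a JOIN b ON a.id = b.id WHERE b.flag = 1 ORDER BY a.id")

print(inner_on, inner_where)        # [1] [1]
print(left_all)                     # [1, 2, 3]
print(left_where_b, inner_where_b)  # [1, 2] [1, 2]
```

As for the row counts in the question: they depend on the data, not just on the table sizes, so 1000 and 100 alone don't determine the answer.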
For an INNER JOIN, like the simple query you have here, they are the same.
For an OUTER JOIN, they might not be the same.
For example, take these two queries:
select * from orders o left join orderlines ol on ol.order_id = o.id where o.id=12345
and
select * from orders o left join orderlines ol on ol.order_id = o.id and o.id=12345
The first query will give you data on order #12345 and its lines, if any. The second query will give you data from all orders, but only order #12345 will have any item data.
This also illustrates how the two options have different semantic meanings. Even if they produce the same results, the two queries from your question have different semantic meanings, which might be important as an application grows over time.
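Here is a small runnable version of that orders example. It is only a sketch: SQLite (through Python's sqlite3) stands in for the real database, and the item column and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE orderlines (id INTEGER PRIMARY KEY, order_id INTEGER, item TEXT);
    INSERT INTO orders VALUES (12345), (12346), (12347);
    INSERT INTO orderlines VALUES
        (1, 12345, 'widget'), (2, 12345, 'gadget'), (3, 12346, 'gizmo');
""")

# Filter in WHERE: only order 12345 and its lines come back.
where_rows = conn.execute("""
    SELECT o.id, ol.item
    FROM orders o LEFT JOIN orderlines ol ON ol.order_id = o.id
    WHERE o.id = 12345
    ORDER BY o.id, ol.id
""").fetchall()

# Filter in ON: every order comes back, but only 12345 gets line data.
on_rows = conn.execute("""
    SELECT o.id, ol.item
    FROM orders o LEFT JOIN orderlines ol ON ol.order_id = o.id AND o.id = 12345
    ORDER BY o.id, ol.id
""").fetchall()

print(where_rows)  # [(12345, 'widget'), (12345, 'gadget')]
print(on_rows)     # the same two rows, then (12346, None) and (12347, None)
```

Note that order 12346 does have a line in the table, yet it shows up with NULL item data in the second query, because its line fails the ON condition.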
I think you are satisfied with the answers already, but I want to mention another side of this usage.
In the simple case these two methods generate the same result, but the database may use different techniques to get there.
And different techniques can, of course, generate different results. But when? The situation is hard to illustrate, but I will try to explain.
Suppose we have two tables, and the first table has an IsDeleted column. The application never deletes rows; it just sets the IsDeleted column, and such records are supposed to be ignored.
If you do not filter these records in the ON clause and filter them only in the WHERE clause, they take part in the other joins first, and you can calculate a wrong result. Suppose you join this table to an Amounts table: the result is wrong because the deleted records were included in the join and only filtered out afterwards in the WHERE clause.
This difference can lead to very big mistakes, especially in queries that have many joins.
I hope the explanation worked. I'm not good at this. :)

SQL: INNER JOIN or WHERE? [duplicate]

For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1, table2
WHERE
table1.foreignkey = table2.primarykey
AND (some other conditions)
Or else:
SELECT
table1.this, table2.that, table2.somethingelse
FROM
table1 INNER JOIN table2
ON table1.foreignkey = table2.primarykey
WHERE
(some other conditions)
Do these two work the same way in MySQL?
INNER JOIN is ANSI syntax that you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also, note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.
Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.
A basic SELECT query is this:
SELECT stuff
FROM tables
WHERE conditions
The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.
JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).
Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from, that's why the JOIN syntax is preferred.
Applying conditional statements in ON / WHERE
Here I have explained the logical query processing steps.
Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640
(8) SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.
Brief Description of Logical Query Processing Phases
Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
(1) FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
(2) ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
(3) OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
(4) WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
(5) GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
(6) CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
(7) HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
(8) SELECT: The SELECT list is processed, generating VT8.
(9) DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
(10) ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
(11) TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
Therefore, in this logical model, (INNER JOIN) ON filters the data (the row count of the virtual table is reduced at that step) before the WHERE clause is applied. Subsequent join conditions are then evaluated against the already filtered data, which can improve performance. Only after that does the WHERE condition apply its filters.
(Whether a condition goes in ON or in WHERE will not make much difference in a few cases. It depends on how many tables you have joined and the number of rows in each joined table.)
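The first four logical phases can be mimicked in a few lines of plain Python. This is only a sketch of the logical model described above, not of how any real engine physically executes a query; the function name left_join and the sample rows are hypothetical.

```python
from itertools import product

# Hypothetical sample data: orders is the preserved (left) table.
orders = [{"id": 1}, {"id": 2}, {"id": 3}]
orderlines = [
    {"order_id": 1, "item": "widget"},
    {"order_id": 1, "item": "gadget"},
    {"order_id": 2, "item": "gizmo"},
]

def left_join(left, right, on, where):
    # Step 1 (FROM): Cartesian product of the two tables -> VT1.
    vt1 = list(product(left, right))
    # Step 2 (ON): keep only pairs for which the join condition is true -> VT2.
    vt2 = [(l, r) for l, r in vt1 if on(l, r)]
    # Step 3 (OUTER): add preserved-table rows that found no match -> VT3.
    matched = {id(l) for l, _ in vt2}
    vt3 = vt2 + [(l, None) for l in left if id(l) not in matched]
    # Step 4 (WHERE): filter VT3 -> VT4.
    return [(l, r) for l, r in vt3 if where(l, r)]

# Condition on the left table placed in WHERE: unmatched orders are removed.
filter_in_where = left_join(
    orders, orderlines,
    on=lambda l, r: r["order_id"] == l["id"],
    where=lambda l, r: l["id"] == 1,
)

# Same condition placed in ON: step 3 adds the other orders back as outer rows.
filter_in_on = left_join(
    orders, orderlines,
    on=lambda l, r: r["order_id"] == l["id"] and l["id"] == 1,
    where=lambda l, r: True,
)

print(len(filter_in_where))  # 2
print(len(filter_in_on))     # 4
```

Because the outer rows are added after the ON filter but before the WHERE filter, the placement of a condition only matters for outer joins; with an inner join step 3 is skipped and both placements give the same rows.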
The implicit join ANSI syntax is older, less obvious, and not recommended.
In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
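For inner joins the two layouts return the same rows, which a quick check confirms. The sketch below uses SQLite via Python's sqlite3 as a stand-in engine, and the sample rows are invented; only the table and column names come from the queries above.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, State TEXT);
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Status INTEGER);
    CREATE TABLE CustomerAccounts (CustomerID INTEGER, AccountID INTEGER);
    INSERT INTO Customers VALUES (1, 'NY'), (2, 'CA'), (3, 'NY');
    INSERT INTO Accounts VALUES (10, 1), (11, 0), (12, 1);
    INSERT INTO CustomerAccounts VALUES (1, 10), (1, 11), (2, 12), (3, 12);
""")

# All criteria folded into the ON clauses.
criteria_in_on = conn.execute("""
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca
        ON ca.CustomerID = c.CustomerID AND c.State = 'NY'
    INNER JOIN Accounts a
        ON ca.AccountID = a.AccountID AND a.Status = 1
    ORDER BY c.CustomerID, a.AccountID
""").fetchall()

# Join keys in ON, filtering criteria in WHERE.
criteria_in_where = conn.execute("""
    SELECT c.CustomerID, a.AccountID
    FROM Customers c
    INNER JOIN CustomerAccounts ca ON ca.CustomerID = c.CustomerID
    INNER JOIN Accounts a ON ca.AccountID = a.AccountID
    WHERE c.State = 'NY' AND a.Status = 1
    ORDER BY c.CustomerID, a.AccountID
""").fetchall()

print(criteria_in_on)     # [(1, 10), (3, 12)]
print(criteria_in_where)  # [(1, 10), (3, 12)]
```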
Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.
I'll also point out that the older syntax is more prone to error. In standard SQL, and in most databases, an inner join without an ON clause is a syntax error (MySQL is an exception: it accepts INNER JOIN without ON and treats it as a cross join). If you use the older syntax and forget one of the join conditions in the WHERE clause, you will get a cross join. Developers often "fix" this by adding the DISTINCT keyword (rather than fixing the join, because they still don't realize the join itself is broken), which may appear to cure the problem but will slow down the query considerably.
Additionally for maintenance if you have a cross join in the old syntax, how will the maintainer know if you meant to have one (there are situations where cross joins are needed) or if it was an accident that should be fixed?
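A small sketch of the forgotten-condition problem, again with SQLite via Python's sqlite3 standing in for the real database and with hypothetical persons and orders tables. It also shows that the DISTINCT patch can hide more than a slowdown: the patched query still returns a person who has no orders at all.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO persons VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2), (4, 2);
""")

# Old syntax with the join condition forgotten: an accidental cross join.
broken = conn.execute("SELECT p.name FROM persons p, orders o").fetchall()

# The DISTINCT patch hides the duplicates but not the underlying mistake.
patched = conn.execute(
    "SELECT DISTINCT p.name FROM persons p, orders o ORDER BY p.name"
).fetchall()

# The intended query: people who actually have orders.
correct = conn.execute(
    "SELECT DISTINCT p.name FROM persons p "
    "JOIN orders o ON o.person_id = p.id ORDER BY p.name"
).fetchall()

print(len(broken))  # 12, i.e. 3 persons x 4 orders
print(patched)      # [('Ann',), ('Bob',), ('Cid',)]
print(correct)      # [('Ann',), ('Bob',)]
```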
Let me point you to this question to see why the implicit syntax is bad if you use left joins.
Sybase *= to Ansi Standard with 2 different outer tables for same inner table
Plus (personal rant here), the standard using the explicit joins is over 20 years old, which means implicit join syntax has been outdated for those 20 years. Would you write application code using a syntax that has been outdated for 20 years? Why do you want to write database code that is?
The SQL:2003 standard changed some precedence rules so that a JOIN takes precedence over a "comma" join. This can actually change the results of your query depending on how it is set up. This caused problems for some people when MySQL 5.0.12 switched to adhering to the standard.
So in your example, your queries would work the same. But if you added a third table:
SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...
Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.
I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway: the JOIN conditions are with the JOINs, not split off into a separate section of the query.
They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. If you are matching on weakly related data, use the WHERE clause.
I know you're talking about MySQL, but anyway:
In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.
If you are often programming dynamic stored procedures, you will fall in love with your second example (using where). If you have various input parameters and lots of morph mess, then that is the only way. Otherwise, they both will run the same query plan so there is definitely no obvious difference in classic queries.
ANSI join syntax is definitely more portable.
I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins is not supported (without compatibility mode) in SQL Server 2005 and later.
I have two points in favor of the implicit join (the second example):
1. Tell the database what you want, not what it should do.
2. You can write all tables in a clear list that is not cluttered by join conditions, which makes it much easier to see which tables are involved. The conditions all go in the WHERE part, where they are also lined up one below the other. Using the JOIN keyword mixes up tables and conditions.

How to optimize this complex query?

How can I optimize this query? For now it executes in 0.0100 seconds.
SELECT comments.comment_content, comments.comment_votes, comments.comment_date,
users.user_login, users.user_level, users.user_avatar_source,
groups.group_safename
FROM comments
LEFT JOIN links ON comment_link_id=link_id
LEFT JOIN users ON comment_user_id=user_id
LEFT JOIN groups ON comment_group_id=link_group_id
WHERE comment_status='published' AND link_status='published'
ORDER BY comment_id DESC
[EXPLAIN output and index listings for the comments, users, and groups tables were attached here; not preserved.]
Sub-twenty-millisecond query times aren't usually considered to be slow. As some folks have mentioned in the comments, it will be necessary for you to redo your optimization when your tables get larger, because MySQL's optimizer (and optimizers for other RDBMSs) makes decisions based on index size.
I recommend you always qualify your column names in JOIN clauses with table names or aliases. For example, you will gain clarity and maintainability by using a style like this:
FROM comments AS c
LEFT JOIN links AS L ON c.comment_link_id=L.link_id
LEFT JOIN users AS u ON c.comment_user_id=u.user_id
LEFT JOIN groups AS g ON c.comment_group_id=g.link_group_id
This query selects a fairly broad subset of your tables, so it will run slower the larger your tables are. That's inevitable unless you can narrow the subset somehow.
Are the columns you're using for JOIN ... ON operations all declared NOT NULL? They should be.
Looking at how you are using the groups table: You're joining on link_group_id and retrieving group_safename. So, try a compound covering index on (link_group_id,group_safename). At a minimum, index link_group_id.
The users table: You've already got an index on user_id. When your tables get bigger a compound covering index on (user_id, user_login, user_level, user_avatar_source) may help. But that's a low-priority thing to try.
The links table: You're using link_status and link_id. Your LEFT JOIN for this table should be a plain inner JOIN because one of its columns shows up in your WHERE clause. If link_status can be NOT NULL in your application make sure it is declared that way. Then try a compound index on (link_status, link_id).
The comments table: You have no index on comment_status as far as I can see. Try adding one.
Then put a bunch of data in your tables, run OPTIMIZE LOCAL TABLE for each table, then try your query with EXPLAIN again.
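To illustrate what a compound covering index buys you, here is a rough sketch using SQLite through Python's sqlite3. This is an assumption-laden stand-in: SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's (in MySQL you would look for "Using index" in the Extra column of EXPLAIN), and the table name grp, the index name, and the extra column are hypothetical.

```python
import sqlite3

# Hypothetical table modeled on the groups table; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grp ("
    "link_group_id INTEGER, group_safename TEXT, group_description TEXT)"
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT group_safename FROM grp WHERE link_group_id = 42"

plan_before = plan(query)  # no usable index yet: a full table scan

# Compound index holding both the lookup column and the retrieved column.
conn.execute("CREATE INDEX idx_grp_cover ON grp (link_group_id, group_safename)")

plan_after = plan(query)   # answered from the index alone

print(plan_before)
print(plan_after)
```

After the index is created, the plan reports a covering index, meaning the query is satisfied from the index without touching the table rows.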

MySQL using select with 2 queries, subquery or join?

Related to my last question (MySQLi performance, multiple (separate) queries vs subqueries) I came across another question.
Sometimes I'm using a subquery to select the value from another table (eg. the username connected to an ID), but I'm not sure about the select-in-select, because it doesn't seem to be very clean and I'm not sure about the performance.
The subquery could look like this:
SELECT
(SELECT `user_name` FROM `users`
WHERE `user_id` = table2.user_id) AS `user_name`
, `value1`
, `value2`
FROM
`table2`
....
Would it be "better" to use a separate query for the result from table1 and another for table2 (doubles the connections, but no need to cross tables), or should I even use a JOIN to get the results in a single query?
I don't have much experience with JOINs and subqueries yet, so I'm not sure if a JOIN would be "too much" in this case, because I really just need one name connected to an ID (or maybe a count of the rows from a table), or if it doesn't matter because the select-in-select is treated like some kind of JOIN, too.
Solution with JOIN could look like this:
SELECT
users.user_name , table2.value1, table2.value2
FROM
`table2`
INNER JOIN
`users`
ON
users.user_id = table2.user_id
....
And if I should prefer JOIN, which one would be best in this case: left join, inner join or something else?
The very fact that you are asking whether to use inner join or left join indeed shows that you haven't done much work with them.
The purposes of these two are entirely different. An inner join returns columns from two or more tables for the rows where the join columns have matching values. A left join is used when you want the rows from the table on the left of the join clause to be returned even when there is no matching row in the other table. It depends on your application. If one table has names of players, and another table contains details of penalties paid by them, then you will most certainly want to use a left join, to account for players without a penalty, and thus without a record in the second table.
Regarding whether to use a subquery or a join: joins can be much faster when properly used. By properly I mean that there are indices on the join columns, the tables are specified in increasing order of row count (generally; there might be exceptions), the join columns have similar data types, and so on. If all these conditions hold, a join would be the better option.
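As a rough guide to which join matches the select-in-select from the question, the sketch below compares the three forms on made-up data. SQLite via Python's sqlite3 stands in for MySQL, and the sample rows (including a table2 row whose user_id has no matching user) are hypothetical. It says nothing about speed, only about which rows come back.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, user_id INTEGER, value1 TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO table2 VALUES (1, 1, 'x'), (2, 2, 'y'), (3, 99, 'z');
""")

# Select-in-select: a missing user yields NULL, but the row is kept.
subquery_rows = conn.execute("""
    SELECT (SELECT user_name FROM users
            WHERE users.user_id = table2.user_id) AS user_name,
           value1
    FROM table2
    ORDER BY table2.id
""").fetchall()

# LEFT JOIN: same rows as the subquery version.
left_rows = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2 LEFT JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.id
""").fetchall()

# INNER JOIN: the row without a matching user disappears.
inner_rows = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2 INNER JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.id
""").fetchall()

print(subquery_rows)  # [('alice', 'x'), ('bob', 'y'), (None, 'z')]
print(left_rows)      # [('alice', 'x'), ('bob', 'y'), (None, 'z')]
print(inner_rows)     # [('alice', 'x'), ('bob', 'y')]
```

So if every table2 row is guaranteed to have a user, INNER JOIN and LEFT JOIN return the same thing; if not, LEFT JOIN is the one that reproduces the subquery's behavior.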

Difference between SQL JOIN and querying from two tables

What is the difference between the query
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
and this one
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons, Orders
WHERE Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
There is a small difference in syntax, but both queries are doing a join on the P_Id fields of the respective tables.
In your second example, this is an implicit join, which you are constraining in your WHERE clause to the P_Id fields of both tables.
The join is explicit in your first example and the join clause contains the constraint instead of in an additional WHERE clause.
They are basically equivalent. In general, the JOIN keyword enables you to be more explicit about the direction (LEFT, RIGHT) and type (INNER, OUTER, CROSS) of your join.
This SO posting has a good explanation of the differences in ANSI SQL compliance, and bears similarities to the question asked here.
While (as it has been stated) both queries will produce the same result, I find that it is always a good idea to explicitly state your JOINs. It's much easier to understand, especially when there are non-JOIN-related evaluations in the WHERE clause.
Explicitly stating your JOIN also prevents you from inadvertently querying a Cartesian product. In your 2nd query above, if you (for whatever reason) forgot to include your WHERE clause, your query would run without JOIN conditions and return a result set of every row in Persons matched with every row in Orders...probably not something that you want.
The difference is in syntax, but not in the semantics.
The explicit JOIN syntax:
is considered more readable and
lets you specify cleanly, and in a standard way, whether you want an INNER, LEFT/RIGHT OUTER, or CROSS join. This is in contrast to DBMS-specific syntax such as old Oracle's Persons.P_Id = Orders.P_Id(+) for a left outer join.