Cascaded LEFT JOIN v.s. sub-select - mysql

I have 2 queries:
1)
SELECT person.idx, person.name, groups.idx
FROM people
LEFT JOIN membership
ON membership.person=person.idx
LEFT JOIN groups
ON groups.name='supervisors'
AND membership.group=groups.idx
2)
SELECT person.idx, person.name, a.idx
FROM people
LEFT JOIN
(SELECT group.idx, membership.person
FROM groups, membership
WHERE membership.group=group.idx
AND group.name='supervisors') a
ON a.person=person.idx
These queries have been simplified but the core logic is the same. They seem to be equivalent. The 1st seems "cleaner" syntactically. I'm not an SQL expert, and am pretty new to LEFT JOIN in particular, but it seems to be the way to answer this kind of membership question, where one table contains a subset of information about another table. Is this the right approach?

The two queries are not the same. The first returns rows for all groups a person is a member of, with the idx of the "supervisors" where appropriate.
The second returns one row for each member, with the idx of the "supervisors" group where appropriate. You should choose the version that does what you want.
Once you have the logic that you want, then in MySQL, it is usually best to avoid subqueries in the FROM clause if possible. MySQL has a tendency to materialize them, which makes optimizations more difficult (I think this has gotten better in more recent versions).
Also, you should eschew commas in the FROM clause and always use proper, explicit, standard JOIN syntax.

Poor form to answer my own question?
This seems to work:
SELECT person.idx, person.name, groups.idx
FROM people
LEFT JOIN (groups, membership)
ON groups.name='supervisors'
AND groups.person=person.idx
AND membership.group=groups.idx
Maybe a MySQL only extension?

Related

Can the usage of inner join be avoided in MySQL?

After reading the question title you may find it silly but I'm seriously asking this question with curiosity in my mind.
I'm using MySQL database system.
Consider below the two tables :
Customers(CustomerID(Primary Key), CustomerName, ContactName, Address, City, PostalCode, Country)
Orders(OrderID(Primary Key), CustomerID(Foreign Key), EmployeeID, OrderDate, ShipperID)
Now I want to get the details of all orders that is which order is placed by which customer?
So, I did it in two ways :
First way:
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Customers AS c, Orders AS o
WHERE c.CustomerID=o.CustomerID;
Second way:
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
In both the cases I'm getting exactly the same correct result. My question is why there is a necessary of additional and confusing concept of Inner Join in MySQL as we can achieve the same results even without using Inner Join?
Is the Inner Join more effective in any manner?
What you are looking at is ANSI-89 syntax (A,B WHERE) vs ANSI-92 syntax (A JOIN B ON).
For very simple queries, there is no difference. However, there are a number of things you can do with ANSI-92 that you cannot do or that become very difficult to implement and maintain in ANSI-89. Anything more than two tables involved, more than one condition in the same join, or separating LEFT JOIN conditions from WHERE conditions are all much harder to read and work with in the older syntax.
The old A,B WHERE syntax is generally considered obsolete and avoided, even for the simple queries where it still works.
The trade-offs of hardware optimization are second to none to users being able to maintain their queries.
Having explicit clean code is better than having esoteric implicit code. In actual production relational databases, most of the queries that take too long come from the ones where the tables are in a concatenated list. These queries show that:
User did not put the effort on expressing the order these tables are joined.
All the relationship joins are cluttered in one place instead organized on its own space for each join.
If all queries are in such format for said user, user does not take
advantage of Outer Joins. There are many cases where a relationship between tables can be: (1) TO (0-many) OR (many) TO (many) instead of (1) TO (1-many).
As in most use cases, these queries become to start to be a problem when the number of joins increase. Beginner users choose to query the tables by placing them as a list delimited with a comma because it takes less to type. At first, it does not seem to be a problem because they are joined against two to three tables. This in turn become a habit to the beginner user. As they start to write more complicated queries by increasing their number of joins, those type of queries are harder to maintain as described from the above bullet points.
Conclusion: As the number of joins within a query scales, improper indentation and categorization make the query harder to maintain.
You should use INNER JOIN and ident your query as below so it is easy for others to read:
SELECT
Orders.OrderID,
Orders.OrderDate,
Customers.CustomerName
FROM Orders
INNER JOIN Customers
ON Customers.CustomerID = Orders.CustomerID;

MySQL Left Join and Right Join Optimization

I've been looking up some documentation about this topic here: https://dev.mysql.com/doc/refman/5.7/en/left-join-optimization.html
But I don't understand the following example:
The join optimizer calculates the order in which to join tables. The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check. This means that if you execute a query of the following type, MySQL does a full scan on b because the LEFT JOIN forces it to be read before d:
SELECT *
FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key)
WHERE b.key=d.key;
The fix in this case is reverse the order in which a and b are listed in the FROM clause:
SELECT *
FROM b JOIN a LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key)
WHERE b.key=d.key;
Why does the order make an optimization? Do JOIN and LEFT_JOIN execute in some order?
I suspect the first quote is not quite correct. I have seen LEFT JOIN turned into JOIN and then the tables touched in the 'wrong' order.
Anyway, don't worry about the work the optimizer needs to do. In thousands of slow JOINs, I have identified only one case where the cost of picking the order was important. And it was a case of multiple joins to a single table; yet another drawback of EAV schema. Anyway, there is a simple setting to avoid that problem.
LEFT/RIGHT/plain JOINs are semantically done left-to-right (regardless of the order the optimizer chooses to touch the tables).
If you are concerned about the ordering, you can add parentheses. For example:
FROM (a JOIN b ON ...) JOIN (c JOIN d ON ...) ON ...
If you are using "commajoin" (FROM a,b...), don't. However, its precedence changed long ago. The workaround was to add parens so that the same SQL would work in versions before and after the change.
Don't use LEFT unless you need it to get NULLs for missing 'right' rows. It just confuses readers into thinking that you expect NULLs.
This example is wrong in many ways, and it is not clear to me what it is trying to convey. Apologies for that. I will file a bug with the documentation team.
Some clarifications:
For the given query, the last LEFT JOIN will be converted to an inner join. This is because the WHERE clause, WHERE b.key=d.key, implies that d.key can not be NULL. Hence, any extra rows produced by LEFT JOIN compared to INNER JOIN would be filtered out by the WHERE clause. (The principles of this transformation is described in the paragraph following the given example.)
The ON clause of the first LEFT JOIN, ON (c.key=a.key), makes table c dependent on table a, but not table b. Hence, the only the requirement wrt join order is that table a is processed before table c. The order in which tables a and b are listed in the query, will not change that.
Tables b and d may be processed in any order, both wrt each other and wrt the other tables of the query.
This paragraph seems to recommend LEFT JOIN as a mechanism to reduce number of "table permutations to check". This is not meaningful since changing from INNER JOIN to LEFT JOIN may change the semantics of the query. For this purpose STRAIGHT_JOIN should be used instead.
For most join queries, execution time by far exceeds optimization time. Reducing the number of "table permutations to check" may cause potentially more efficient permutations to not be explored. Hence, LEFT JOIN should not be used unless it is required to get the wanted semantics.

Conditionals in WHEREs or JOINs?

Lets say I have the following query:
SELECT occurs.*, events.*
FROM occurs
INNER JOIN events ON (events.event_id = occurs.event_id)
WHERE event.event_state = 'visible'
Another way to do the same query and get the same results would be:
SELECT occurs.*, events.*
FROM occurs
INNER JOIN events ON (events.event_id = occurs.event_id
AND event.event_state = 'visible')
My question. Is there a real difference? Is one way faster than the other? Why would I choose one way over the other?
For an INNER JOIN, there's no conceptual difference between putting a condition in ON and in WHERE. It's a common practice to use ON for conditions that connect a key in one table to a foreign key in another table, such as your event_id, so that other people maintaining your code can see how the tables relate.
If you suspect that your database engine is mis-optimizing a query plan, you can try it both ways. Make sure to time the query several times to isolate the effect of caching, and make sure to run ANALYZE TABLE occurs and ANALYZE TABLE events to provide more info to the optimizer about the distribution of keys. If you do find a difference, have the database engine EXPLAIN the query plans it generates. If there's a gross mis-optimization, you can create an Oracle account and file a feature request against MySQL to optimize a particular query better.
But for a LEFT JOIN, there's a big difference. A LEFT JOIN is often used to add details from a separate table if the details exist or return the rows without details if they do not. This query will return result rows with NULL values for b.* if no row of b matches both conditions:
SELECT a.*, b.*
FROM a
LEFT JOIN b
ON (condition_one
AND condition_two)
WHERE condition_three
Whereas this one will completely omit results that do not match condition_two:
SELECT a.*, b.*
FROM a
LEFT JOIN b ON some_condition
WHERE condition_two
AND condition_three
Code in this answer is dual licensed: CC BY-SA 3.0 or the MIT License as published by OSI.

Difference between FROM and JOIN tables

I'm working through the JOIN tutorial on SQL zoo.
Let's say I'm about to execute the code below:
SELECT a.stadium, COUNT(g.matchid)
FROM game a
JOIN goal g
ON g.matchid = a.id
GROUP BY a.stadium
As it happens, it produces the same output as the code below:
SELECT a.stadium, COUNT(g.matchid)
FROM goal g
JOIN game a
ON g.matchid = a.id
GROUP BY a.stadium
So then, when does it matter which table you assign at FROM and which one you assign at JOIN?
When you are using an INNER JOIN like you are here, the order doesn't matter. That is because you are connecting two tables on a common index, so the order in which you use them is up to you. You should pick an order that is most logical to you, and easiest to read. A habit of mine is to put the table I'm selecting from first. In your case, you're selecting information about a stadium, which comes from the game table, so my preference would be to put that first.
In other joins, however, such as LEFT OUTER JOIN and RIGHT OUTER JOIN the order will matter. That is because these joins will select all rows from one table. Consider for example I have a table for Students and a table for Projects. They can exist independently, some students may have an associated project, but not all will.
If I want to get all students and project information while still seeing students without projects, I need a LEFT JOIN:
SELECT s.name, p.project
FROM student s
LEFT JOIN project p ON p.student_id = s.id;
Note here, that the LEFT JOIN refers to the table in the FROM clause, so that means ALL of students were being selected. This also means that p.project will be null for some rows. Order matters here.
If I took the same concept with a RIGHT JOIN, it will select all rows from the table in the join clause. So if I changed the query to this:
SELECT s.name, p.project
FROM student s
RIGHT JOIN project p ON p.student_id = s.id;
This will return all rows from the project table, regardless of whether or not it has a match for students. This means that in some rows, s.name will be null. Similar to the first example, because I've made project the outer joined table, p.project will never be null (assuming it isn't in the original table). In the first example, s.name should never be null.
In the case of outer joins, order will matter. Thankfully, you can think intuitively with LEFT and RIGHT joins. A left join will return all rows in the table to the left of that statement, while a right join returns all rows from the right of that statement. Take this as a rule of thumb, but be careful. You might want to develop a pattern to be consistent with yourself, as I mentioned earlier, so these queries are easier for you to understand later on.
When you only JOIN 2 tables, usually the order does not matter: MySQL scans the tables in the optimal order.
When you scan more than 2 tables, the order could matter:
SELECT ...
FROM a
JOIN b ON ...
JOIN c ON ...
Also, MySQL tries to scan the tables in the fastest way (large tables first). But if a join is slow, it is possible that MySQL is scanning them in a non-optimal order. You can verify this with EXPLAIN. In this case, you can force the join order by adding the STRAIGHT_JOIN keyword.
The order doesn't always matter, I usually just order it in a way that makes sense to someone reading your query.
Sometime order does matter. Try it with LEFT JOIN and RIGHT JOIN.
In this instance you are using an INNER JOIN, if you're expecting a match on a common ID or foreign key, it probably doesn't matter too much.
You would however need to specify the tables the correct way round if you were performing an OUTER JOIN, as not all records in this type of join are guaranteed to match via the same field.
yes, it will matter when you will user another join LEFT JOIN, RIGHT JOIN
currently You are using NATURAL JOIN that is return all tables related data, if JOIN table row not match then it will exclude row from result
If you use LEFT / RIGHT {OUTER} join then result will be different, follow this link for more detail

Join of tables returning incorrect results

There are 4 sql tables:
Listings(Amount, GroupKey, Key, MemberKey),
Loans(Amount, GroupKey, Key, ListingKey),
Members(City, GroupKey, Key)
Groups(GroupRank, Key, MemberKey)
Now, if one wants to find out the loans which are also listings and find the members city and GroupRank for the members in the loan table. Here, the group table contains information about grous of which members are a part of.
and also perform a select operation as given below:
select Listings.Amount, Members.City, Groups.GroupRank
from listings, loans, members, groups
where Listings.Key=Loans.ListingKey and
Members.Key=Listings.MemberKey and
Listings.GroupKey=Groups.Key
The above join is giving an incorrect result, please point out where I am going wrong.
Also I am new to SQL so please excuse the novice question.
Note: The following is just a guess what your problem is. Like others said, clearify your question.
You want to JOIN
( http://dev.mysql.com/doc/refman/5.1/de/join.html )
those tables. What you write is just another form of a join, meaning it has the same effect. But you "joined" a bit too much. To make things clearer a syntax has been invented to make things clearer and avoid such mistakes. Read more about it in the link given above.
What you want to achieve can be done like this:
SELECT
Listings.Amount, Members.City, Groups.GroupRank
FROM
Listings
INNER JOIN Groups ON Listings.GroupKey=Groups.Key
INNER JOIN Members ON Members.Key=Listings.MemberKey
You don't do a SELECT on the Loans table, you don't need it in this query.
This is the INNER JOIN which will give you a result where every row in table A has an according entry in table B. When this is not the case, you have to use the LEFT or RIGHT JOIN.
Maybe the problem is related to the join type (INNER). Try LEFT JOIN for example but Mark has right: you should clearify your question.
I would firstly change your query to use the more modern join syntax, which allows outer joins. Tr this:
select Listings.Amount, Members.City, Groups.GroupRank
from listings
left join loans on Listings.Key=Loans.ListingKey
left join members on Members.Key=Listings.MemberKey
left join groups on Listings.GroupKey=Groups.Key
and/or Loans.GroupKey=Groups.Key
and/or Members.Key=Groups.MemberKey
You may need to play with the criteria on the last join (maybe they should be "or" not "and" etc).