SQL Join Results - MySQL

Select * from a join b on a.id=b.id and a.vol<5
Select * from a join b on a.id=b.id where a.vol<5
Do they produce the same results?
If they don't, and a has 1000 rows and b has 100 rows, how many rows will each produce?

I would say yes, they do.
A plain "JOIN" is an "INNER JOIN", so it doesn't matter whether the condition goes after an "AND" in the join or in a "WHERE" after the join.
It would be different for an "OUTER JOIN": a "WHERE" condition on a column of the outer-joined table effectively turns the join into an "INNER JOIN" (or simply "JOIN").
Hope that made sense.
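To check the inner join claim on a small scale, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration; only the two queries come from the question.

```python
import sqlite3

# Hypothetical sample data: a and b share an id; a also has a vol column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, vol INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 2), (2, 7), (3, 4), (4, 9);
    INSERT INTO b VALUES (1), (2), (3);
""")

# Filter placed in the ON clause.
on_rows = conn.execute(
    "SELECT * FROM a JOIN b ON a.id = b.id AND a.vol < 5 ORDER BY a.id"
).fetchall()

# Same filter placed in the WHERE clause.
where_rows = conn.execute(
    "SELECT * FROM a JOIN b ON a.id = b.id WHERE a.vol < 5 ORDER BY a.id"
).fetchall()

print(on_rows)
print(where_rows)
```

With this data both queries return the same two rows, the ones where a.vol is below 5 and the id exists in b.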

For an INNER JOIN, like the simple query you have here, they are the same.
For an OUTER JOIN, they might not be the same.
For example, take these two queries:
select * from orders o left join orderlines ol on ol.order_id = o.id where o.id=12345
and
select * from orders o left join orderlines ol on ol.order_id = o.id and o.id=12345
The first query will give you data on order #12345 and its lines, if any. The second query will give you data from all orders, but only order #12345 will have any item data.
This also illustrates how the two options have different semantic meanings. Even if they produce the same results, the two queries from your question have different semantic meanings, which might be important as an application grows over time.
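The difference between the two LEFT JOIN queries above can be reproduced with a small sketch. It uses Python's sqlite3 module rather than MySQL, and the two orders and their lines are invented sample data.

```python
import sqlite3

# Hypothetical sample data: two orders, each with one order line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER);
    CREATE TABLE orderlines (order_id INTEGER, item TEXT);
    INSERT INTO orders VALUES (12345), (12346);
    INSERT INTO orderlines VALUES (12345, 'widget'), (12346, 'gadget');
""")

# Filter in WHERE: only order 12345 survives.
where_rows = conn.execute("""
    SELECT o.id, ol.item
    FROM orders o
    LEFT JOIN orderlines ol ON ol.order_id = o.id
    WHERE o.id = 12345
    ORDER BY o.id
""").fetchall()

# Filter in ON: every order is kept, but only 12345 gets line data.
on_rows = conn.execute("""
    SELECT o.id, ol.item
    FROM orders o
    LEFT JOIN orderlines ol ON ol.order_id = o.id AND o.id = 12345
    ORDER BY o.id
""").fetchall()

print(where_rows)
print(on_rows)
```

Order 12346 has a line in the table, yet the ON version shows it with a NULL item because the join condition was never satisfied for it.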

I think you are satisfied with the answers, but I want to mention another side of this usage.
Here the two methods generate the same result, but the engine can use different techniques to get there.
In other situations, different placement produces different results. But when? It is hard to illustrate the situation, but I will try to explain.
Suppose we have two tables, and the first table has an IsDeleted column. The application never deletes rows; it just sets the IsDeleted column, and those records are then ignored.
If you do not filter these records in the ON clause and filter them in the WHERE clause instead, the records are still included in the other joins, and you can calculate the wrong result. Suppose you joined this table to an Amounts table: the result is wrong because the deleted records were included in the join and only filtered afterwards in the WHERE clause.
This difference can lead to very big mistakes, especially in queries that have many joins.
I hope I managed to explain it. I'm not good at this. :)
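One situation where the placement of a soft-delete filter does change the numbers is when the flagged table sits on the optional side of a LEFT JOIN. The sketch below assumes that layout; the customers and orders tables and their rows are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Hypothetical schema: orders are soft-deleted via an IsDeleted flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER,
                         amount INTEGER, IsDeleted INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (1, 1, 50, 0), (2, 1, 30, 1), (3, 2, 20, 1);
""")

# Flag checked in ON: every customer is reported, deleted orders ignored.
on_totals = conn.execute("""
    SELECT c.id, COALESCE(SUM(o.amount), 0)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.IsDeleted = 0
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Flag checked in WHERE: customer 2, who has only deleted orders, disappears.
where_totals = conn.execute("""
    SELECT c.id, COALESCE(SUM(o.amount), 0)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.IsDeleted = 0
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(on_totals)
print(where_totals)
```

Neither result is wrong in itself; which one you want depends on whether customers without live orders should appear in the report.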

Related

INNER JOIN condition in WHERE clause or ON clause?

I mistyped a query today, but it still worked and gave the intended result. I meant to run this query:
SELECT e.id FROM employees e JOIN users u ON u.email=e.email WHERE u.id='139840'
but I accidentally ran this query
SELECT e.id FROM employees e JOIN users u ON u.email=e.email AND u.id='139840'
(note the AND instead of WHERE in the last clause)
and both returned the correct employee id from the user id.
What is the difference between these 2 queries? Does the second form only join members of the 2 tables meeting the criteria, whereas the first one would join the entire table, and then run the query? Is one more or less efficient than the other? Is it something else I am missing?
Thanks!
For inner joins like this they are logically equivalent. However, you can run into situations where a condition in the join clause means something different from a condition in the where clause.
As a simple illustration, imagine you do a left join like so;
select x.id
from x
left join y
on x.id = y.id
;
Here we're taking all the rows from x, regardless of whether there is a matching id in y. Now let's say our join condition grows - we're not just looking for matches in y based on the id but also on id_type.
select x.id
from x
left join y
on x.id = y.id
and y.id_type = 'some type'
;
Again this gives all the rows in x regardless of whether there is a matching (id, id_type) in y.
This is very different, though:
select x.id
from x
left join y
on x.id = y.id
where y.id_type = 'some type'
;
In this situation, we're picking all the rows of x and trying to match them to rows from y. For rows of x that have no match in y, y.id_type will be null. Because of that, y.id_type = 'some type' isn't satisfied, so the rows with no match are discarded, which effectively turns this into an inner join.
Long story short: for inner joins it doesn't matter where the conditions go but for outer joins it can.
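The step that makes the WHERE version behave like an inner join is the comparison against NULL. A minimal sketch of that step, using Python's sqlite3 module and made-up rows for x and y:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER);
    CREATE TABLE y (id INTEGER, id_type TEXT);
    INSERT INTO x VALUES (1), (2);
    INSERT INTO y VALUES (1, 'some type');
""")

# Unmatched rows of x come back with NULL in every y column.
joined = conn.execute(
    "SELECT x.id, y.id_type FROM x LEFT JOIN y ON x.id = y.id ORDER BY x.id"
).fetchall()

# Comparing NULL with a value yields NULL (unknown), not true.
null_cmp = conn.execute("SELECT NULL = 'some type'").fetchone()[0]

# WHERE keeps only rows whose condition is true, so nothing is returned.
kept = conn.execute("SELECT 1 WHERE NULL = 'some type'").fetchall()

print(joined, null_cmp, kept)
```

Python's None is how sqlite3 reports SQL NULL, so the unmatched row of x carries a None id_type and would be dropped by the WHERE filter.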
In the case of an INNER JOIN, the two queries are semantically the same, meaning they are guaranteed to have the same results. If you were using an OUTER join, the meaning of the two queries could be very different, with different results.
Performance-wise, I would expect that these two queries would result in the same execution plan. However, the query engine might surprise you. The only way to know is to view the execution plans for the two queries.
The optimizer will treat them the same. You can do an EXPLAIN to prove it to yourself.
Therefore, write the one that is clearer.
SELECT e.id
FROM employees e JOIN users u ON u.email=e.email
WHERE u.id='139840'
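To look at the plans yourself, something along these lines can help. This is only a sketch: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, whereas in MySQL you would put EXPLAIN in front of each SELECT. The table definitions are assumptions, since the question does not show the schema.

```python
import sqlite3

# Assumed schema; the real employees/users tables are not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
""")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

where_plan = plan(
    "SELECT e.id FROM employees e JOIN users u "
    "ON u.email = e.email WHERE u.id = '139840'"
)
on_plan = plan(
    "SELECT e.id FROM employees e JOIN users u "
    "ON u.email = e.email AND u.id = '139840'"
)

print(where_plan)
print(on_plan)
```

Comparing the two printed plans side by side is the quickest way to see whether the engine you are running treats the two forms alike.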
If it were an outer join instead of inner, you'd get unintended results, but when using an inner join it makes no real difference whether you use additional join criteria instead of a WHERE clause.
Performance-wise they are most likely identical, but I can't be certain.
I brought this up with my colleagues on our team at work. This response is a bit SQL Server centered rather than MySQL. However, the optimizers in SQL Server and MySQL should operate similarly.
Some thoughts:
Essentially, if you have to add a WHERE, additional table scans are done to verify equality for each condition. (This goes up by orders of magnitude with an AND over a dataset; with an OR, the decision is made at the first true condition.) If you have a single id lookup, as in the example given, it is extremely quick. Conversely, if you have to find all of the records that belong to a company or department, it becomes less clear-cut, as you may have multiple records. If you can apply the equals condition, it is far more effective when working with an AuditLog or EventLog table that has zillions of rows. One would not really see the large benefits of this on small tables (around 200,000 rows or so).
From: Allesandro Alpi
http://suxstellino.wordpress.com/2013/01/07/sql-server-logical-query-processing-summary/
From: Itzik Ben-Gan
http://tsql.solidq.com/books/insidetsql2008/Logical%20Query%20Processing%20Poster.pdf

Order of Joins in MySQL

Is the order of joins important if there are
multiple joins, and
the 3rd join depends on the 2nd join (let's assume so, as it is the case in this question)?
I am unable to come to a conclusion on this. I had multiple queries matching the above criteria. Some of them seem to work, some do not produce the proper result (not sure if it's because of the joins), and some actually throw an error.
Does anyone have any specific idea on this?
Order of joins is important if you are using OUTER joins in your query (LEFT OUTER JOIN, RIGHT OUTER JOIN, LEFT JOIN, or RIGHT JOIN notation).
If you're using only INNER joins, it should not matter as long as they all relate to each other via some chain of ON conditions. This holds true whether you have 3 or 30 inner joins linked together.
The query optimizer will juggle them around anyhow based on the optimal execution plan based on indexes and such.
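A small sketch of both claims, run through Python's sqlite3 module with three made-up single-column tables: reordering inner joins leaves the result unchanged, while swapping the sides of a LEFT JOIN does not.

```python
import sqlite3

# Hypothetical tables with partially overlapping ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
    INSERT INTO c VALUES (3), (4), (5);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Inner joins: written order a-b-c versus c-b-a.
inner_abc = run("SELECT a.id FROM a JOIN b ON b.id = a.id "
                "JOIN c ON c.id = b.id ORDER BY a.id")
inner_cba = run("SELECT a.id FROM c JOIN b ON b.id = c.id "
                "JOIN a ON a.id = b.id ORDER BY a.id")

# Outer joins: which table is on the left decides whose rows are preserved.
a_left_b = run("SELECT a.id, b.id FROM a LEFT JOIN b ON b.id = a.id ORDER BY a.id")
b_left_a = run("SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id ORDER BY b.id")

print(inner_abc, inner_cba)
print(a_left_b)
print(b_left_a)
```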

MySQL using select with 2 queries, subquery or join?

Related to my last question (MySQLi performance, multiple (separate) queries vs subqueries) I came across another question.
Sometimes I'm using a subquery to select the value from another table (eg. the username connected to an ID), but I'm not sure about the select-in-select, because it doesn't seem to be very clean and I'm not sure about the performance.
The subquery could look like this:
SELECT
(SELECT `user_name` FROM `users`
WHERE `user_id` = table2.user_id) AS `user_name`
, `value1`
, `value2`
FROM
`table2`
....
Would it be "better" to use a separate query for the result from table1 and another for table2 (doubles the connections, but no need to cross tables), or should I even use a JOIN to get the results in a single query?
I don't have much experience with JOINs and subqueries yet, so I'm not sure if a JOIN would be "too much" in this case, because I really just need one name connected to an ID (or maybe count the number of rows from a table), or if it doesn't matter, because the select-in-select is treated like some kind of JOIN, too.
Solution with JOIN could look like this:
SELECT
users.user_name , table2.value1, table2.value2
FROM
`table2`
INNER JOIN
`users`
ON
users.user_id = table2.user_id
....
And if I should prefer JOIN, which one would be best in this case: left join, inner join or something else?
The very fact that you are asking whether to use inner join or left join indeed shows that you haven't done much work with them.
The purposes of these two are entirely different. An inner join returns columns from two or more tables for rows where the join columns have matching values. A left join is used when you want the rows from the table on the left of the join clause to be returned even when there is no matching row in the other table. Which one to use depends on your application. If one table has the names of players and another table contains details of penalties paid by them, then you will most certainly want a left join, to account for players without a penalty and thus without a record in the 2nd table.
Regarding whether to use a subquery or a join, joins can be much faster when properly used. By properly I mean that there are indices on the join columns, the tables are specified in increasing order of the number of rows they contain (generally; there might be exceptions), the join columns have similar data types, and so on. If all these conditions hold, a join would be the better option.
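One thing worth seeing side by side: the select-in-select from the question behaves like a LEFT JOIN, not an INNER JOIN, when a row in table2 has no matching user. A minimal sketch with Python's sqlite3 module and invented rows (the table and column names follow the question):

```python
import sqlite3

# Hypothetical data: user_id 2 in table2 has no row in users.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, user_name TEXT);
    CREATE TABLE table2 (user_id INTEGER, value1 INTEGER);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO table2 VALUES (1, 10), (2, 20);
""")

# Correlated scalar subquery: yields NULL when no user matches.
sub_rows = conn.execute("""
    SELECT (SELECT user_name FROM users
            WHERE user_id = table2.user_id) AS user_name, value1
    FROM table2 ORDER BY value1
""").fetchall()

# LEFT JOIN keeps the unmatched table2 row as well.
left_rows = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2 LEFT JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.value1
""").fetchall()

# INNER JOIN drops it.
inner_rows = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2 INNER JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.value1
""").fetchall()

print(sub_rows, left_rows, inner_rows)
```

So if every table2 row is guaranteed to have a user, all three return the same thing; otherwise the choice between INNER and LEFT decides whether orphaned rows show up.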

Querying to find if some columns are in array

I have a complex nested query inside a join. Is it possible to match several columns against that query instead of repeating the query in the join? i.e.:
select * from
A left join B on a.xid=b.xid and
(a.userid or b.userid) in (select userid from A where..)
^^^ don't want to duplicate the nested-query...
There is a nested query that should match several columns from the parent query (as seen in the example above). The simple way is to duplicate the nested query several times, i.e.:
select * from A
left join B
on a.xid=b.xid
and a.userid in (select userid from ...)
and b.userid in (Select userid from ....)
BUT, since the subquery is a bit complicated, I don't want MySQL to run it twice, but rather only once and then match it against several of the parent query's columns.
If your subquery is working properly and you have the query cache turned on, you won't have to worry about performance. If it's a question of it being overly complex, then maybe you could use a proc for this query: put the results of the subquery into a temp table and join to it.
There are lots of ways to approach this.
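The temp table idea could look roughly like this. It is a sketch run through Python's sqlite3 module; the filter `userid < 3` is a hypothetical stand-in for the elided subquery condition, and the rows are invented. One caveat to verify for your version: as far as I know, MySQL restricts referring to the same TEMPORARY table more than once in a single query, so there an ordinary scratch table may be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (xid INTEGER, userid INTEGER);
    CREATE TABLE B (xid INTEGER, userid INTEGER);
    INSERT INTO A VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO B VALUES (10, 2), (11, 5), (12, 1);

    -- Run the expensive subquery once and keep its result.
    -- "userid < 3" is a placeholder for the real condition.
    CREATE TEMP TABLE wanted AS
        SELECT DISTINCT userid FROM A WHERE userid < 3;
""")

# Both columns are matched against the stored result, not the subquery.
rows = conn.execute("""
    SELECT A.xid, A.userid, B.userid
    FROM A
    LEFT JOIN B
      ON A.xid = B.xid
     AND A.userid IN (SELECT userid FROM wanted)
     AND B.userid IN (SELECT userid FROM wanted)
    ORDER BY A.xid
""").fetchall()

print(rows)
```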

Difference between SQL JOIN and querying from two tables

What is the difference between the query
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
and this one
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons, Orders
WHERE Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
There is a small difference in syntax, but both queries are doing a join on the P_Id fields of the respective tables.
In your second example, this is an implicit join, which you are constraining in your WHERE clause to the P_Id fields of both tables.
The join is explicit in your first example and the join clause contains the constraint instead of in an additional WHERE clause.
They are basically equivalent. In general, the JOIN keyword enables you to be more explicit about the direction (LEFT, RIGHT) and type (INNER, OUTER, CROSS) of your join.
This SO posting has a good explanation of the differences in ANSI SQL compliance, and bears similarities to the question asked here.
While (as it has been stated) both queries will produce the same result, I find that it is always a good idea to explicitly state your JOINs. It's much easier to understand, especially when there are non-JOIN-related evaluations in the WHERE clause.
Explicitly stating your JOIN also prevents you from inadvertently querying a Cartesian product. In your 2nd query above, if you (for whatever reason) forgot to include your WHERE clause, your query would run without JOIN conditions and return a result set of every row in Persons matched with every row in Orders...probably not something that you want.
The difference is in syntax, but not in the semantics.
The explicit JOIN syntax:
is considered more readable and
allows you to specify cleanly, and in a standard way, whether you want an INNER, LEFT/RIGHT OUTER or CROSS join. This is in contrast to DBMS-specific syntax, such as Oracle's old Persons.P_Id = Orders.P_Id(+) syntax for a left outer join.
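The points above can be checked with a short sketch: the explicit and implicit forms return the same rows, and dropping the WHERE clause from the implicit form silently produces a Cartesian product. It runs on Python's sqlite3 module with made-up Persons and Orders rows; an OrderNo tiebreaker is added to the ORDER BY only to make the comparison deterministic.

```python
import sqlite3

# Hypothetical sample data: 2 persons, 3 orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT);
    CREATE TABLE Orders (OrderNo INTEGER, P_Id INTEGER);
    INSERT INTO Persons VALUES (1, 'Hansen', 'Ola'), (2, 'Svendson', 'Tove');
    INSERT INTO Orders VALUES (77895, 1), (44678, 1), (22456, 2);
""")

explicit = conn.execute("""
    SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
    FROM Persons
    INNER JOIN Orders ON Persons.P_Id = Orders.P_Id
    ORDER BY Persons.LastName, Orders.OrderNo
""").fetchall()

implicit = conn.execute("""
    SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
    FROM Persons, Orders
    WHERE Persons.P_Id = Orders.P_Id
    ORDER BY Persons.LastName, Orders.OrderNo
""").fetchall()

# Implicit form with the WHERE clause forgotten: every pairing is returned.
cartesian = conn.execute("""
    SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
    FROM Persons, Orders
""").fetchall()

print(explicit == implicit, len(cartesian))
```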