SQL "IN" combined with "=" in WHERE clause - mysql

I'm struggling with someone else's code. What might the WHERE clause do in the following (MySQL) statement?
SELECT * FROM t1, t2 WHERE t1.id = t2.id IN (1,2,3)
It's not providing the desired result in my case, but I'm trying to figure out what the original author intended.
Can anyone provide an example of the use of a WHERE clause like this?

MySQL parses this condition as t1.id = (t2.id IN (1,2,3)): it first evaluates t2.id IN (1,2,3), gets a result of 0 or 1, and then compares that result with t1.id for the join. All rows of t2 whose id is in the IN list are joined to the row in t1 whose id is 1; all other rows of t2 are joined to the row in t1 whose id is 0. Here is a small demo on sqlfiddle.com: link.
It is hard to imagine that this was the author's intent, however. I think the more likely intent was to check that both ids are in the list and also equal to each other. The equality matters because it looks like the author wanted to join the two tables.
A more modern way of doing joins is with ANSI SQL syntax. Here is the equivalent of your query in ANSI SQL:
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id IN (1,2,3)
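To make the parsing explicit, here is a small sketch. The first query adds parentheses that match how MySQL reads the original condition. The second is only a guess at what the author may have meant; nothing in the original code confirms it.

```sql
-- How MySQL reads the original condition: IN is evaluated first and
-- yields 0 or 1, and that value is then compared with t1.id.
SELECT * FROM t1, t2 WHERE t1.id = (t2.id IN (1,2,3));

-- Assumed intent (a guess): join on equal ids, restricted to the listed values.
SELECT *
FROM t1
JOIN t2 ON t1.id = t2.id
WHERE t1.id IN (1,2,3);
```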

Full Outer Join get repeated with Union?

I'm trying to accomplish a Full Outer Join with my SQL.
Reference Link
FULL (OUTER) JOIN: Return all records when there is a match in either left or right table
Although apparently this is not supported. I've looked around and have come across this accepted answer: https://stackoverflow.com/a/4796911/3859456
SELECT * FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION
SELECT * FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
But won't this repeat the matched records, since both halves of the UNION return them? If not, does UNION automatically collapse the matched records from the two tables?
E.g.
LEFT (OUTER) JOIN: Return all records from the left table, and the matched records from the right table
RIGHT (OUTER) JOIN: Return all records from the right table, and the matched records from the left table
Union Left-Outer-Table + (left-matched = right-matched)x2 + Right-Outer-Table
I'm sure the answer works, as the community trusts it. But I'm still confused as to how it works and hope that someone can help me understand better.
To reiterate from the accepted answer to which you refer, I will quote both the UNION and UNION ALL versions:
SELECT * FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION
SELECT * FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
and
SELECT * FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT * FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
WHERE t1.id IS NULL
If the join generated no duplicates, these two queries would return the same result set. Here is why:
The first half of both queries returns all records in common between the two tables (no duplicates, by our assumption), and it also returns the records unique to the first table t1.
The second half of the UNION query returns all records in common and all records unique to the second table t2. The common records were already returned by the first half, and UNION filters out those repeats; since we assumed there are no other duplicates, nothing else is removed.
The second half of the UNION ALL query instead excludes the common records itself, using WHERE t1.id IS NULL. This ensures that only the records unique to the second table are added by the second half of the UNION ALL.
Now, if the first table itself happened to contain duplicates, this is what would happen:
In the UNION query, duplicate records would be filtered out. This is subtle, because duplicates can arise from two sources here. First, there could be duplicates within the first table itself. Second, there could be duplicates which arise from the join. UNION removes all of them.
However, in the UNION ALL query, no duplicates would be removed. The duplicate records which might happen to appear in the first table would survive intact in the final result set, as would any duplicates which resulted from the join.
This is a long-winded answer, but hopefully it convinces you that when duplicates are present, the UNION and UNION ALL versions of the accepted answer may not generate the same result set.
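The difference can be seen with a small experiment. The sample rows below are made up for illustration (t1 deliberately contains a duplicate), and the columns are listed explicitly instead of using SELECT * so the output is easier to read.

```sql
-- Hypothetical sample data: t1 contains the value 1 twice.
CREATE TABLE t1 (id INT);
CREATE TABLE t2 (id INT);
INSERT INTO t1 VALUES (1), (1), (2);
INSERT INTO t2 VALUES (1), (3);

-- UNION version: every duplicate row is collapsed.
SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION
SELECT t1.id, t2.id FROM t1
RIGHT JOIN t2 ON t1.id = t2.id;
-- 3 rows (in no particular order): (1, 1), (2, NULL), (NULL, 3)

-- UNION ALL version: the duplicate coming from t1 survives.
SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT t1.id, t2.id FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
WHERE t1.id IS NULL;
-- 4 rows (in no particular order): (1, 1), (1, 1), (2, NULL), (NULL, 3)
```

Which result is the "right" full outer join depends on whether the duplicate rows in t1 are meaningful; a true FULL OUTER JOIN would keep them, as the UNION ALL version does.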

How does a SQL engine handle a join with a non-equality condition?

A SQL engine would use a hash join for a query like this:
select * from table1 t1 left join table2 t2 on t1.id = t2.id;
That's fine. But what if the query is like this:
select * from table1 t1 left join table2 t2 on t1.id > t2.id;
How is this handled?
A nested loop join would work, but is there a better way?
For distributed SQL, a straight-up non-equi join (t1.id > t2.id) is pretty expensive to execute. If one side is small, you broadcast it and then use a sorted index on every node. If both sides are large, you can range-partition one side and build a sorted index, and then replicate the rows of the other side to every range that might match.
Normally, you have a combination of equality and non-equality conditions, like t1.id = t2.id and t1.cost < t2.cost. In that case, you can do a normal distributed hash join, and then keep a sorted list of the secondary items to perform the non-equality part. This is what Presto does.
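For reference, the mixed form described above would look roughly like this. The table names come from the question; the cost columns are assumed to exist on both tables, as in the condition quoted in the answer.

```sql
-- The equality (t1.id = t2.id) gives the engine a key it can hash on;
-- the inequality (t1.cost < t2.cost) then only has to be checked
-- among rows that already share the same id.
SELECT t1.id, t1.cost AS t1_cost, t2.cost AS t2_cost
FROM table1 t1
JOIN table2 t2
  ON t1.id = t2.id
 AND t1.cost < t2.cost;
```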

How to do a join on 2 tables, but only return the data for one table?

I am not sure if this is possible. But is it possible to do a join on 2 tables, but return the data for only one of the tables. I want to join the two tables based on a condition, but I only want the data for one of the tables. Is this possible with SQL, if so how? After reading the docs, it seems that when you do a join you get the data for both tables. Thanks for any help!
You get data from both tables because join is based on "Cartesian Product" + "Selection". But after the join, you can do a "Projection" with desired columns.
SQL has an easy syntax for this:
Select t1.* -- taking data just from one table
from one_table t1
inner join other_table t2
on t1.pk = t2.fk
You can choose the table through the alias: t1.* or t2.*. The symbol * means "all fields".
You can also include a where clause, an order by, or other join types like outer join or cross join.
A typical SQL query has multiple clauses.
The SELECT clause mentions the columns you want in your result set.
The FROM clause, which includes JOIN operations, mentions the tables from which you want to retrieve those columns.
The WHERE clause filters the result set.
The ORDER BY clause specifies the order in which the rows in your result set are presented.
There are a few other clauses like GROUP BY and LIMIT. You can read about those.
To do what you ask, select the columns you want, then mention the tables you want. Something like this.
SELECT t1.id, t1.name, t1.address
FROM t1
JOIN t2 ON t2.t1_id = t1.id
This gives you data from t1 from rows that match t2.
Pro tip: Avoid the use of SELECT *. Instead, mention the columns you want.
This would typically be done using exists (or in) if you prefer:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.x = t1.y);
Although you can use join, it runs the risk of multiplying the number of rows in the result set -- if there are duplicate matches in table2. There is no danger of such duplicates using exists (or in). I also find the logic to be more natural.
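A quick way to see the row multiplication is with a tiny data set. The tables below reuse the names from the query above; the rows are invented for illustration, with table2 holding two matches for the same value.

```sql
-- Hypothetical sample data: table2 has two rows matching y = 1.
CREATE TABLE table1 (y INT);
CREATE TABLE table2 (x INT);
INSERT INTO table1 VALUES (1), (2);
INSERT INTO table2 VALUES (1), (1);

-- JOIN: the table1 row is repeated once per match.
select t1.*
from table1 t1
join table2 t2 on t2.x = t1.y;
-- 2 rows: y = 1 appears twice

-- EXISTS: each table1 row appears at most once.
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.x = t1.y);
-- 1 row: y = 1
```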
When you join two tables, you can use SELECT to pick the data you want.
If you want data from only one table, select only that table's columns:
SELECT b.title
FROM blog b
JOIN type t ON b.type_id=t.id;
If you want data from both tables, select columns from both:
SELECT b.title,t.type_name
FROM blog b
JOIN type t ON b.type_id=t.id;

Which of the two SELECT statements is faster?

It seems that the second statement applies the where condition first before joining and the first one does join before applying the where condition, so the second one would be faster because it would do less joining. But is that really the case? Is there a reference which says definitely that in the first statement the where condition is executed after all the other joining operations finish?
SELECT * FROM class t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id
WHERE t1.id = 1;
or
SELECT * FROM (SELECT * FROM class WHERE id = 1) t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id;
Your second option has a "derived table" (a subquery in FROM or JOIN). Subqueries usually take extra effort. So, usually it is better to avoid them.
In your particular example, the Optimizer will probably start with t1 because the WHERE clause mentions it. That is, the execution will filter based on t1.id = 1, just as you suggest the second version would do.
Note my hedged wording ("usually", "probably"). There are exceptions to my statements; if you find a case where the second version runs faster, present it, and I may be able to explain why it runs faster. (A likely example is where the subquery has GROUP BY and/or LIMIT. This is different enough from WHERE to make a difference.)
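Rather than guessing, you can ask MySQL for the plan of each version and compare them. This is only a sketch of how to check; the actual output depends on your MySQL version, your indexes, and your data, so no expected output is shown here.

```sql
-- Plan for the version with the filter in the outer WHERE clause.
EXPLAIN
SELECT * FROM class t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id
WHERE t1.id = 1;

-- Plan for the version with the filter inside a derived table.
EXPLAIN
SELECT * FROM (SELECT * FROM class WHERE id = 1) t1
LEFT JOIN class_students t2 ON t1.id = t2.class_id
LEFT JOIN student t3 ON t2.student_id = t3.id;
```

If both plans list the same tables in the same order with the same access types, the two queries are being executed the same way.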

Which Query is faster if we put the "Where" inside the Join Table or put it at the end?

Ok, I am using Mysql DB. I have 2 simple tables.
Table1
ID | Text
12 | txt1
13 | txt2
42 | txt3
.....
Table2
ID | Type | Text
13 | 1 | MuTxt1
42 | 1 | MuTxt2
12 | 2 | Xnnn
Now I want to join these 2 tables to get all data for Type=1 in table 2
SQL1:
Select * from
Table1 t1
Join
(select * from Table2 where Type=1) t2
on t1.ID=t2.ID
SQL2:
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.Type=1
These 2 queries give the same result, but which one is faster?
I don't know how MySQL performs the join, which is why I'm asking.
Extra info: now suppose I don't want Type=1 but t2.text='MuTxt1', so SQL2 becomes
Select * from
Table1 t1
Join
Table2 t2
on t1.ID=t2.ID
where t2.text='MuTxt1'
I feel like this query is slower??
Sometimes the MySQL query optimizer does a pretty decent job and sometimes it sucks. That said, there are exceptions to my answer where the optimizer handles something better than described here.
Subqueries are generally expensive, as MySQL will need to execute them and store the results separately. Normally, if you could use either a subquery or a join, the join is faster, especially when the subquery is part of your where clause and has no limit on it.
Select *
from Table1 t1
Join Table2 t2 on t1.ID=t2.ID
where t2.Type=1
and
Select *
from Table1 t1
Join Table2 t2
where t1.ID =t2.ID AND t2.Type=1
should perform equally well, while
Select *
from Table1 t1
Join (select *
from Table2
where Type=1) t2
on t1.ID=t2.ID
is most likely a lot slower, as MySQL stores the result of select * from Table2 where Type=1 in a temporary table (newer versions, from MySQL 5.7 on, can often merge such a simple derived table into the outer query instead).
Conceptually, a join builds a table of all combinations of rows from both tables and afterwards removes the rows which do not match the conditions. In practice, MySQL will try to use indexes containing the columns compared in the on clause and specified in the where clause.
If you are interested in which indexes are used, write EXPLAIN in front of your query and execute.
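As a sketch of that workflow, assuming Table2 has no suitable index yet: the index name below is made up, and whether MySQL actually picks the index depends on your data and version.

```sql
-- Hypothetical index covering the WHERE column (Type) and the join column (ID).
CREATE INDEX idx_table2_type_id ON Table2 (Type, ID);

-- Ask MySQL how it would execute the join.
EXPLAIN
Select *
from Table1 t1
Join Table2 t2 on t1.ID=t2.ID
where t2.Type=1;
```

In the EXPLAIN output, the key column shows which index (if any) was chosen for each table.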
In my view, the second query is better than the first in terms of code readability and performance. You can also put the filter condition in the join clause, like this:
Select * from
Table1 t1
Join
Table2 t2 on t1.ID=t2.ID and t2.Type=1
You can compare the execution times of all the queries in SQL Fiddle here: Query 1, Query 2, My Query.
I think this question is hard to answer, since we don't know exactly how the database's query planner works internally. Usually these kinds of constructions are evaluated by the database in a similar way (it may or may not recognize that the first and second queries are equivalent and plan them identically).
I would write the second one, since it is clearer what is happening.