What is the difference between these two methods of selecting data from multiple tables? The first one does not use JOIN while the second does. Which one is the preferred method?
Method 1:
SELECT t1.a, t1.b, t2.c, t2.d, t3.e, t3.f
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id
AND t2.id = t3.id
AND t3.id = x
Method 2:
SELECT t1.a, t1.b, t2.c, t2.d, t3.e, t3.f
FROM `table1` t1
JOIN `table2` t2 ON t1.id = t2.id
JOIN `table3` t3 ON t1.id = t3.id
WHERE t1.id = x
For your simple case, they're equivalent. Even though the 'JOIN' keyword is not present in Method #1, it's still doing joins.
However, method #2 lets you put extra conditions in the JOIN condition itself, which for outer joins cannot be reproduced by moving them to the WHERE clause. One example is joining the same table several times under different aliases:
select a.id, b.id, c.id
from sometable A
left join othertable as b on a.id=b.a_id and some_condition_in_othertable
left join othertable as c on a.id=c.a_id and other_condition_in_othertable
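Putting the two extra conditions in the WHERE clause instead would change the result: any row of A without a matching b row or c row gets NULLs for that alias, those NULLs fail the WHERE test, and the row is dropped. In the ON clause the conditions only decide which rows of othertable are attached, so every row of A is still returned.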
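To make the ON versus WHERE difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The `kind` column and the sample rows are assumptions invented for the illustration; they stand in for some_condition_in_othertable and other_condition_in_othertable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (id INTEGER PRIMARY KEY);
    -- 'kind' is a hypothetical column standing in for the extra conditions
    CREATE TABLE othertable (id INTEGER PRIMARY KEY, a_id INTEGER, kind TEXT);
    INSERT INTO sometable VALUES (1), (2);
    -- row 1 has an 'x' child and a 'y' child; row 2 has only an 'x' child
    INSERT INTO othertable VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'x');
""")

# Extra conditions inside ON: every row of sometable survives.
on_rows = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM sometable a
    LEFT JOIN othertable b ON a.id = b.a_id AND b.kind = 'x'
    LEFT JOIN othertable c ON a.id = c.a_id AND c.kind = 'y'
    ORDER BY a.id
""").fetchall()

# Same conditions moved to WHERE: rows without a match are filtered out.
where_rows = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM sometable a
    LEFT JOIN othertable b ON a.id = b.a_id
    LEFT JOIN othertable c ON a.id = c.a_id
    WHERE b.kind = 'x' AND c.kind = 'y'
    ORDER BY a.id
""").fetchall()

print(on_rows)     # row 2 is kept, with None for the missing 'y' child
print(where_rows)  # row 2 is gone
```

With this sample data the ON version keeps sometable row 2 and the WHERE version loses it, which is the behaviour that usually surprises people.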
The two methods appear to be identical in performance; it is just newer versus older syntax.
I don't think there is much of a difference. You could use the EXPLAIN statement to check if MySQL does anything differently. For this trivial example I doubt it matters.
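If you want to check this yourself, the idea can be sketched as follows. This uses Python's sqlite3 module rather than MySQL, so it runs `EXPLAIN QUERY PLAN` (SQLite's counterpart to MySQL's `EXPLAIN`); the column types and sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, a TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, c TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, e TEXT);
    INSERT INTO table1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO table2 VALUES (1, 'c1'), (2, 'c2');
    INSERT INTO table3 VALUES (1, 'e1'), (2, 'e2');
""")

# Method 1: comma-separated tables, join conditions in WHERE.
implicit_sql = """
    SELECT t1.a, t2.c, t3.e
    FROM table1 t1, table2 t2, table3 t3
    WHERE t1.id = t2.id AND t2.id = t3.id AND t3.id = ?
"""
# Method 2: explicit JOIN ... ON.
explicit_sql = """
    SELECT t1.a, t2.c, t3.e
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id
    JOIN table3 t3 ON t1.id = t3.id
    WHERE t1.id = ?
"""

implicit_rows = conn.execute(implicit_sql, (1,)).fetchall()
explicit_rows = conn.execute(explicit_sql, (1,)).fetchall()

# Print the plan the engine picked for each form (last column is the detail text).
for label, sql in (("implicit", implicit_sql), ("explicit", explicit_sql)):
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)):
        print("   ", row[3])
```

Both forms return the same rows here. The printed plans are engine-specific, so treat them as a way to compare the two forms on your own database rather than as a statement about MySQL.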
I'm trying to create a Snowflake view from multiple tables.
I understand that FROM ... JOIN statements can combine multiple tables.
When I want to join from a table that is itself joined from another table, what is the best way to write the query?
In this case, Table 4 and Table 5 are joined from Table 3, and Table 3 is joined from Table 1.
Your question is not clear, but normally all the tables are joined as follows:
select *
from table1 t1
join table2 t2 on t1.id = t2.table1_ID
join table3 t3 on t1.id = t3.table1_ID
join table4 t4 on t3.id = t4.table3_ID
join table5 t5 on t3.id = t5.table3_ID
It is not clear from your question what data you need; the right query depends on which information you need from which combination of tables. For example, using CTEs:
with cte1 as (
select *
from table1 t1
join table2 t2 on t1.id = t2.table1_ID
join table3 t3 on t1.id = t3.table1_ID
),
cte2 as (
select *
from table3 t3
join table4 t4 on t3.id = t4.table3_ID
join table5 t5 on t3.id = t5.table3_ID)
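-- t123 and t345 must be declared as aliases for the two CTEs;
-- with select * each CTE exposes several id columns, so list and alias the ones you need
select t123.column1, t345.column2
from cte1 t123 join cte2 t345 on t123.id = t345.id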
You should be able to join exactly along the relation hierarchy you have listed, such as:
select
t1.*,
t2.whatever,
t3.whatever3,
t4.whatever4,
t5.whatever5
from
table1 t1
join table2 t2
on t1.t2id = t2.id
join table3 t3
on t1.t3id = t3.id
join table4 t4
on t3.t4id = t4.id
join table5 t5
on t3.t5id = t5.id
So, what is the confusion?
Meysam's answer is perfectly valid, but I see there are more questions at hand.
[Edit] This answer is mostly general, but also focused on the Snowflake-cloud-data-platform tag perspective.
Normally you can have a single SELECT block with all the tables in the FROM ... JOIN section and whatever WHERE conditions you like. In the modern form, the conditions that relate the joined tables, as opposed to filters, are put in the ON clauses, hence Meysam's answer.
SELECT
t1.thing,
t2.other_thing,
t4.extra_detail,
t5.one_last_thing
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.table1_ID
JOIN table3 t3
ON t1.id = t3.table1_ID
JOIN table4 t4
ON t3.id = t4.table3_ID
JOIN table5 t5
ON t3.id = t5.table3_ID
You mention a CTE; that could be applied to the table3 chain, if there were merit in it, like so:
WITH table_3_sub_chain_cte_of_merit AS (
SELECT
t3.table1_ID,
t4.extra_detail,
t5.one_last_thing
FROM table3 t3
JOIN table4 t4
ON t3.id = t4.table3_ID
JOIN table5 t5
ON t3.id = t5.table3_ID
)
SELECT
t1.thing,
t2.other_thing,
cte3.extra_detail,
cte3.one_last_thing
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.table1_ID
JOIN table_3_sub_chain_cte_of_merit cte3
ON t1.id = cte3.table1_ID
Or the CTE sub-expression can be moved into a sub-select, if that had merit, like so:
SELECT
t1.thing,
t2.other_thing,
cte3.extra_detail,
cte3.one_last_thing
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.table1_ID
JOIN (
SELECT
t3.table1_ID,
t4.extra_detail,
t5.one_last_thing
FROM table3 t3
JOIN table4 t4
ON t3.id = t4.table3_ID
JOIN table5 t5
ON t3.id = t5.table3_ID
) cte3
ON t1.id = cte3.table1_ID
Now the interesting part: merit. Why would we be doing these things?
The first version should meet your needs just fine if you just want to get some values off each table and move on. But if you are applying complex filters to table3 and below, or doing an expensive aggregation on table3 and below whose results are matched many times to table1, then doing that work once in a CTE or sub-query makes sense.
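As a rough sketch of the "do the expensive part once" idea, the snippet below aggregates the table3/table4 chain per table1 row, first in a CTE and then in an equivalent sub-select. It runs on Python's built-in sqlite3 rather than Snowflake, and the `cost` column, the CTE name `table3_costs`, and the sample rows are all assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, thing TEXT);
    CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_ID INTEGER);
    -- 'cost' is a hypothetical column to have something to aggregate
    CREATE TABLE table4 (id INTEGER PRIMARY KEY, table3_ID INTEGER, cost INTEGER);
    INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO table3 VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO table4 VALUES (100, 10, 5), (101, 10, 7), (102, 11, 1), (103, 12, 4);
""")

# Aggregation done in a named CTE, then joined to table1.
cte_rows = conn.execute("""
    WITH table3_costs AS (
        SELECT t3.table1_ID, SUM(t4.cost) AS total_cost
        FROM table3 t3
        JOIN table4 t4 ON t3.id = t4.table3_ID
        GROUP BY t3.table1_ID
    )
    SELECT t1.thing, c.total_cost
    FROM table1 t1
    JOIN table3_costs c ON t1.id = c.table1_ID
    ORDER BY t1.id
""").fetchall()

# The same aggregation written inline as a sub-select.
subquery_rows = conn.execute("""
    SELECT t1.thing, c.total_cost
    FROM table1 t1
    JOIN (
        SELECT t3.table1_ID, SUM(t4.cost) AS total_cost
        FROM table3 t3
        JOIN table4 t4 ON t3.id = t4.table3_ID
        GROUP BY t3.table1_ID
    ) c ON t1.id = c.table1_ID
    ORDER BY t1.id
""").fetchall()

print(cte_rows)
print(subquery_rows)
```

Both forms return the same rows; which one runs faster on a given engine and data set is something to measure rather than assume.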
Now, why might you use a CTE over a sub-query? The simple answer is that, for the code given, they are the same. But if you joined table1 and table3 multiple times, because you are calculating daily, weekly, and monthly costs, then building the costs once (in a CTE) and joining those results can save a lot of time. At the same time, CTEs can sometimes slow things down, because what seems to be the "expensive code" is mostly free once the other work is taken into account. I have seen code run faster on Snowflake doing a large aggregation three times in sub-selects, as that removed the synchronization cost between the data paths, and the remote data read was the same bottleneck in both versions.
On the other hand, CTEs sometimes make the code cleaner to read: you get to name the expression something meaningful and then refer to it by a short alias, so the SQL is more readable while the intent is still captured. The Snowflake optimizer rewrites the SQL anyway, so the two forms can be, and often are, executed the same way. Helping the human reader is therefore the greater value.
On other databases the optimizer can be helped by the order of the joins and by nesting them (or so I have been told). I have not read about or witnessed that on Snowflake, but I have spent days rewriting SQL only to end up with the "same execution plan" in the other form.
Where CTEs can shine is in really large queries (hundreds of lines of SQL), for pushing filters to where you want them, so as to avoid very large data reads and full-table processing whose output is only pruned later. This sort of thing can be spotted in the query profiler: 10 billion rows flowing between blocks for many steps, only to hit a filter late in the pipeline with 5 thousand rows coming out.
Hello guys, I have a specific question about the AND clause in SQL.
The two following SQL statements provide the same output:
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.id = t2.id AND t2.id = 0
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.id = t2.id WHERE t2.id = 0
Notice the difference at the end of the query. In the first one, I use the AND clause (without using the WHERE clause before). In the second one, I use a WHERE to specify my id.
Is the first syntax correct?
If yes, is the first one better in terms of performance (not using WHERE clause for filtering after)?
Should I expect different outputs with different queries?
Thanks for your help.
Yes, no, and no.
To be specific:
Yes, the syntax is correct. Conceptually, the first query creates an inner join between t1 and t2 with the join condition t1.id = t2.id AND t2.id = 0, while the second creates an inner join on t1.id = t2.id and then filters the result using the condition t2.id = 0.
However, no SQL engine I know of would actually execute either query like that. Rather, in both cases, the engine will optimize both of them to something like t1.id = 0 AND t2.id = 0 and then do two single-row lookups.
No, pretty much any reasonable SQL engine should treat these two queries as effectively identical.
No, see above.
By the way, the following ways of writing the same query are also valid (the first one omits the ON clause, which MySQL accepts but standard SQL does not):
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 WHERE t1.id = t2.id AND t2.id = 0
SELECT * FROM Table1 t1, Table2 t2 WHERE t1.id = t2.id AND t2.id = 0
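A quick way to convince yourself that these forms agree for an inner join is to run them against the same data. The sketch below does that with Python's built-in sqlite3; the `name` columns and sample rows are assumptions added for the example, and the variant without an ON clause is left out because not every engine accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Table1 VALUES (0, 'zero'), (1, 'one'), (2, 'two');
    INSERT INTO Table2 VALUES (0, 'nil'), (1, 'uno');
""")

variants = {
    # filter written as part of the join condition
    "on_and": """SELECT t1.id, t2.id FROM Table1 t1
                 INNER JOIN Table2 t2 ON t1.id = t2.id AND t2.id = 0""",
    # filter written in WHERE after the join
    "on_where": """SELECT t1.id, t2.id FROM Table1 t1
                   INNER JOIN Table2 t2 ON t1.id = t2.id WHERE t2.id = 0""",
    # old comma syntax, everything in WHERE
    "comma": """SELECT t1.id, t2.id FROM Table1 t1, Table2 t2
                WHERE t1.id = t2.id AND t2.id = 0""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in variants.items()}
for name, rows in results.items():
    print(name, rows)
```

Note that this equivalence is specific to INNER JOIN; with an outer join, moving a condition between ON and WHERE can change the result.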
I have a query with 3 joins:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t1
left join t2 on t1.email = t2.email
Inner join t3 on t2.entity_id = t3.order_id
Inner join t4 on t3.product_id = t4.entity_id
WHERE t4.attribute_id = 126
I think my server just can't handle it :) The query runs out of time and an error occurs!
Thanks a lot
Table structure:
T1:
email (which is the same as in t2)
T2:
email firstname lastname orderid (which is called entity id in t3)
T3:
entityid product id (which is called entity id in t4)
T4:
entityid attributeid value
Unless t2 links straight to t4 there is no way.
Also, do you need a left join between t1 and t2?
As @Sachin already stated, you can't "shorten" this query unless t2 links straight to t4 without requiring a comparison with t3. However, in order to speed up your query, you should have indexes on some or all of the columns referenced in your join conditions (i.e. t1.email, t2.email, t2.entity_id, etc.).
Having an index on each of these columns will give you much faster SELECT queries, but it will slow down your INSERT and UPDATE queries. So if you SELECT more often than you INSERT or UPDATE, then you should definitely be using indexes. If not, try to make indexes in wise places (tables that have INSERT or UPDATE statements run less often but still have a lot of rows, for instance).
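To see what an index changes, you can compare the query plan before and after creating one. The following is a minimal sketch on Python's built-in sqlite3 (so it uses `EXPLAIN QUERY PLAN`, not MySQL's `EXPLAIN`); the index name `idx_t4_attribute_id` and the generated rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t4 (entity_id INTEGER, attribute_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t4 VALUES (?, ?, ?)",
    [(i, i % 200, f"v{i}") for i in range(1000)],  # made-up sample rows
)

query = "SELECT value FROM t4 WHERE attribute_id = 126"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_t4_attribute_id ON t4 (attribute_id)")
plan_after = plan(query)   # the plan should now mention the index

matching_rows = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
```

The same before/after comparison works for the join columns; the exact plan wording differs between engines and versions, so look for whether the index is named in the plan rather than for a specific string.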
Try your query this way:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t4
INNER JOIN t3 ON t3.product_id = t4.entity_id
INNER JOIN t2 ON t2.entity_id = t3.order_id
INNER JOIN t1 ON t1.email = t2.email
WHERE t4.attribute_id = 126
It's basically your query but "backwards". Written your original way, the DBMS may end up joining t2 for ALL records in t1, then joining t3 for ALL records found in t2, before it can even attempt to apply your WHERE clause. Whether it actually does so depends on the optimizer.
My way, you're finding all the records in t4 where attribute_id = 126 first, THEN attempting to join other tables. It should be a lot quicker. You should then be able to speed things up even more by making sure the proper indexes exist on the tables involved. You can prepend the keyword EXPLAIN to your query to see how the DBMS attempts to seek data in your query.
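Before swapping one form for the other, it is worth checking that both return the same rows. Here is a small sketch of that check using Python's built-in sqlite3; the sample rows are invented, and the column names follow the queries in this thread rather than the table listing in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (email TEXT);
    CREATE TABLE t2 (email TEXT, firstname TEXT, lastname TEXT, entity_id INTEGER);
    CREATE TABLE t3 (order_id INTEGER, product_id INTEGER);
    CREATE TABLE t4 (entity_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO t1 VALUES ('a@x'), ('b@x'), ('c@x');
    INSERT INTO t2 VALUES ('a@x', 'Ann', 'Lee', 1), ('b@x', 'Bob', 'Ray', 2);
    INSERT INTO t3 VALUES (1, 50), (2, 51);
    INSERT INTO t4 VALUES (50, 126, 'red'), (51, 99, 'blue');
""")

# Original order: starts from t1 with a LEFT JOIN.
original_rows = conn.execute("""
    SELECT t1.email, t2.firstname, t2.lastname, t4.value
    FROM t1
    LEFT JOIN t2 ON t1.email = t2.email
    INNER JOIN t3 ON t2.entity_id = t3.order_id
    INNER JOIN t4 ON t3.product_id = t4.entity_id
    WHERE t4.attribute_id = 126
""").fetchall()

# "Backwards" order: starts from the filtered table t4.
reordered_rows = conn.execute("""
    SELECT t1.email, t2.firstname, t2.lastname, t4.value
    FROM t4
    INNER JOIN t3 ON t3.product_id = t4.entity_id
    INNER JOIN t2 ON t2.entity_id = t3.order_id
    INNER JOIN t1 ON t1.email = t2.email
    WHERE t4.attribute_id = 126
""").fetchall()

print(original_rows)
print(reordered_rows)
```

The results match because the inner joins on t2's columns discard the NULL-extended rows the LEFT JOIN produced, so the LEFT JOIN in the original behaves like an inner join anyway.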
First a quick explanation: I am actually dealing with four tables and mining data from different places but my problem comes down to this seemingly simple concept and yes I am very new to this...
I have two tables (one and two) that both have ID columns in them. I want to select only the IDs that are in table two but not in table one. As in:
Select ID
From dbo.one, dbo.two
Where dbo.two != dbo.one
I actually thought this would work but I'm getting odd results. Can anyone help?
SELECT t2.ID
FROM dbo.two t2
WHERE NOT EXISTS(SELECT NULL
FROM dbo.one t1
WHERE t2.ID = t1.ID)
This could also be done with a LEFT JOIN:
SELECT t2.ID
FROM dbo.two t2
LEFT JOIN dbo.one t1
ON t2.ID = t1.ID
WHERE t1.ID IS NULL
Completing the other two options after Joe's answer (first EXCEPT, then NOT IN):
SELECT id
FROM dbo.two
EXCEPT
SELECT id
FROM dbo.one
SELECT t2.ID
FROM dbo.two t2
WHERE t2.ID NOT IN (SELECT t1.ID FROM dbo.one t1)
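Note: LEFT JOIN may be slower than the other three, which should all give the same plan.
That's because LEFT JOIN is evaluated as a join followed by a filter, whereas the other three are anti-semi-joins. Also be aware that NOT IN only matches the others when dbo.one.ID cannot be NULL.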
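The four variants can be compared side by side. This is a minimal sketch using Python's built-in sqlite3 instead of SQL Server, so the `dbo.` schema prefix is dropped and the sample IDs are made up. It also shows what happens to each form once table one contains a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one (ID INTEGER);
    CREATE TABLE two (ID INTEGER);
    INSERT INTO one VALUES (1), (2);
    INSERT INTO two VALUES (2), (3), (4);
""")

queries = {
    "not_exists": """SELECT t2.ID FROM two t2
                     WHERE NOT EXISTS (SELECT NULL FROM one t1 WHERE t2.ID = t1.ID)""",
    "left_join": """SELECT t2.ID FROM two t2
                    LEFT JOIN one t1 ON t2.ID = t1.ID
                    WHERE t1.ID IS NULL""",
    "except": "SELECT ID FROM two EXCEPT SELECT ID FROM one",
    "not_in": "SELECT t2.ID FROM two t2 WHERE t2.ID NOT IN (SELECT t1.ID FROM one t1)",
}

def ids(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# With no NULLs in table one, all four forms agree.
results = {name: ids(sql) for name, sql in queries.items()}

# Add a NULL to table one and run the same queries again.
conn.execute("INSERT INTO one VALUES (NULL)")
results_with_null = {name: ids(sql) for name, sql in queries.items()}

print(results)
print(results_with_null)
```

Once a NULL is present, NOT IN returns no rows at all, because comparing against NULL yields unknown rather than true. That is a common reason to prefer NOT EXISTS when the column is nullable.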
If I want to perform joins on 3 or more tables, what is the best syntax?
This is my attempt:
Select *
from table1
inner join table2 using id1, table2
inner join table3 using id2, table3
inner join table4 using id4
where table2.column1="something"
and table3.column4="something_else";
Does that look right? The things I'm not sure about are:
1) Do I need to separate the joins with a comma?
2) Am I right to make all my joins first and then put my conditions after that?
3) Would I be better off using sub-queries, and if so, what is the correct syntax?
Thanks for any advice!
Try to avoid using * where possible.
Specify exactly the data you want returned.
Format your queries using a standard style.
Pick a style you like and keep to it.
You will thank yourself later when your queries get more complex.
Most optimizers will recognize when a condition in a WHERE clause implies an INNER JOIN, but there's no reason not to code that explicitly; if nothing else it keeps your WHERE clause manageable.
Be explicit about what columns you join on. Be explicit about the type of join you're using. USING seems like a shortcut that could get you into trouble.
MySQL has traditionally not handled subqueries as well as could be hoped. That may be changing in newer versions, but there are other ways to get your data without relying on them.
Welcome to the wonderful world of relational databases!
select t1.*
, t2.*
, t3.*
, t4.*
from table1 t1
inner join table2 t2
on t1.id = t2.t1_id
and
t2.column1 = "something"
inner join table3 t3
on t2.id = t3.t2_id
and
t3.column4 = "something_else"
inner join table4 t4
on t3.id = t4.t3_id;
1) Do I need to separate the joins with a comma?
No
2) Am I right to make all my joins first and then put my conditions after that?
Yes
3) Would I be better off using sub-queries, and if so, what is the correct syntax?
No. Joining tables is the preferred and correct way.
Joins are not separated by a comma
ANSI syntax puts the joins first, then the WHERE condition,
e.g. SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id WHERE table2.column1='Something'
I'm not 100% sure what you are trying to achieve. But it looks like you do not need to use subqueries.
A correlated subquery may be executed once for every row, so it sounds as though you could run a more efficient query using just inner joins.
Hope that helps. If you can elaborate a little I will provide more explanation.
Given your requirement that table2 gets joined on the id1 columns in table1 and table2, table3 gets joined on the id2 columns in table2 and table3, and table4 gets joined on the id3 columns in table3 and table4, you'll have to do:
SELECT *
FROM table1
INNER JOIN table2 ON table2.id1 = table1.id1
INNER JOIN table3 ON table3.id2 = table2.id2
INNER JOIN table4 ON table4.id3 = table3.id3
WHERE table2.column1 = "something"
AND table3.column4 = "something_else"
I think this statement is much clearer about what exactly is joined in which way, compared to the USING form.
Remove the commas and the duplicate table names, and put the USING columns in parentheses, like:
Select *
from table1
inner join table2 using (id1)
inner join table3 using (id2)
inner join table4 using (id4)
where table2.column1="something"
and table3.column4="something_else"
If id4 has a different name in table1, explicitly name the join condition, for example:
inner join table4 on table4.id = table1.table4i
You may be able to use a natural join, which joins on the column names common to the tables you want to join, as follows:
SELECT *
FROM table1
NATURAL JOIN table2
NATURAL JOIN table3
NATURAL JOIN table4
WHERE table2.column1 = "something"
AND table3.column4 = "something_else"
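To compare the three spellings of the same four-table join (explicit ON, USING, and NATURAL JOIN), here is a small sketch on Python's built-in sqlite3. The schema is an assumption: the tables share only the key columns id1, id2 and id4, and `name1` and `label` are hypothetical payload columns added so there is something to select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema: neighbouring tables share exactly one key column
    CREATE TABLE table1 (id1 INTEGER, name1 TEXT);
    CREATE TABLE table2 (id1 INTEGER, id2 INTEGER, column1 TEXT);
    CREATE TABLE table3 (id2 INTEGER, id4 INTEGER, column4 TEXT);
    CREATE TABLE table4 (id4 INTEGER, label TEXT);
    INSERT INTO table1 VALUES (1, 'n1'), (2, 'n2');
    INSERT INTO table2 VALUES (1, 10, 'something'), (2, 20, 'other');
    INSERT INTO table3 VALUES (10, 100, 'something_else'), (20, 200, 'something_else');
    INSERT INTO table4 VALUES (100, 'L100'), (200, 'L200');
""")

select_and_filter = """
    SELECT table1.name1, table4.label
    FROM table1
    {joins}
    WHERE table2.column1 = 'something'
      AND table3.column4 = 'something_else'
"""

join_styles = {
    "on": """INNER JOIN table2 ON table2.id1 = table1.id1
             INNER JOIN table3 ON table3.id2 = table2.id2
             INNER JOIN table4 ON table4.id4 = table3.id4""",
    "using": """INNER JOIN table2 USING (id1)
                INNER JOIN table3 USING (id2)
                INNER JOIN table4 USING (id4)""",
    "natural": """NATURAL JOIN table2
                  NATURAL JOIN table3
                  NATURAL JOIN table4""",
}

results = {
    style: conn.execute(select_and_filter.format(joins=joins)).fetchall()
    for style, joins in join_styles.items()
}
for style, rows in results.items():
    print(style, rows)
```

The three agree here only because of how the schema was set up. NATURAL JOIN joins on every column name the tables have in common, so adding a same-named column to two tables later would silently change its result, which is one reason the explicit ON form is often preferred.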