How to join from one joined table to another table in snowflake - mysql

I'm trying to create a Snowflake view from multiple tables.
I understand that FROM ... JOIN statements can combine multiple tables.
What is the best way to write the query when I need to join from a table that is itself joined from another table?
In this case, Table 4 and Table 5 are joined from Table 3, and Table 3 is joined from Table 1.

Your question is not clear, but normally all the tables are joined as follows:
select *
from table1 t1
join table2 t2 on t1.id = t2.table1_ID
join table3 t3 on t1.id = t3.table1_ID
join table4 t4 on t3.id = t4.table3_ID
join table5 t5 on t3.id = t5.table3_ID
It is not clear what kind of data you need, so the right form depends on which information you need from which combination of tables. For example, with CTEs:
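-- column1 and column2 are placeholder column names
with cte1 as (
select t1.id as table1_id, t3.id as table3_id, t1.column1
from table1 t1
join table2 t2 on t1.id = t2.table1_ID
join table3 t3 on t1.id = t3.table1_ID
),
cte2 as (
select t3.id as table3_id, t4.column2
from table3 t3
join table4 t4 on t3.id = t4.table3_ID
join table5 t5 on t3.id = t5.table3_ID
)
select t123.column1, t345.column2
from cte1 t123
join cte2 t345 on t123.table3_id = t345.table3_id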
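To try the flat join chain from this answer end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Snowflake, and the `name` columns, the sample rows and the view name `v_all` are made up for illustration. On Snowflake you would typically write CREATE OR REPLACE VIEW instead of CREATE VIEW.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Snowflake.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, name TEXT);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_id INTEGER, name TEXT);
CREATE TABLE table4 (id INTEGER PRIMARY KEY, table3_id INTEGER, name TEXT);
CREATE TABLE table5 (id INTEGER PRIMARY KEY, table3_id INTEGER, name TEXT);

INSERT INTO table1 VALUES (1, 'a1');
INSERT INTO table2 VALUES (10, 1, 'b1');
INSERT INTO table3 VALUES (20, 1, 'c1');
INSERT INTO table4 VALUES (30, 20, 'd1');
INSERT INTO table5 VALUES (40, 20, 'e1');

-- table2 and table3 hang off table1; table4 and table5 hang off table3.
CREATE VIEW v_all AS
SELECT t1.name AS t1_name, t2.name AS t2_name, t3.name AS t3_name,
       t4.name AS t4_name, t5.name AS t5_name
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.table1_id
JOIN table3 t3 ON t1.id = t3.table1_id
JOIN table4 t4 ON t3.id = t4.table3_id
JOIN table5 t5 ON t3.id = t5.table3_id;
""")

rows = conn.execute("SELECT * FROM v_all").fetchall()
print(rows)
```

The join to table4 and table5 simply uses t3 on the left side of the ON condition; nothing special is needed to join "from" an already joined table.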

You should be able to join exactly in the relation hierarchy you have listed, such as:
select
t1.*,
t2.whatever,
t3.whatever3,
t4.whatever4,
t5.whatever5
from
table1 t1
join table2 t2
on t1.t2id = t2.id
join table3 t3
on t1.t3id = t3.id
join table4 t4
on t3.t4id = t4.id
join table5 t5
on t3.t5id = t5.id
So, what is the confusion?

Meysam's answer is perfectly valid, but I see there are more questions at hand.
[Edit] This answer is mostly general, but also written from the perspective of the snowflake-cloud-data-platform tag.
Normally you have a single SELECT block with all the tables in the FROM ... JOIN section and whatever WHERE conditions you like. In modern form, the conditions that belong to the joins, rather than acting as filters, are put in the ON clauses, hence Meysam's answer.
SELECT
t1.thing,
t2.other_thing,
t4.extra_detail,
t5.one_last_thing
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.table1_ID
JOIN table3 t3
ON t1.id = t3.table1_ID
JOIN table4 t4
ON t3.id = t4.table3_ID
JOIN table5 t5
ON t3.id = t5.table3_ID
Now you mention a CTE, which could be used for the table3 chain if there was merit, like so:
WITH table_3_sub_chain_cte_of_merit AS (
SELECT
t3.table1_ID,
t4.extra_detail,
t5.one_last_thing
FROM table3 t3
JOIN table4 t4
ON t3.id = t4.table3_ID
JOIN table5 t5
ON t3.id = t5.table3_ID
)
SELECT
t1.thing,
t2.other_thing,
cte3.extra_detail,
cte3.one_last_thing
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.table1_ID
JOIN table_3_sub_chain_cte_of_merit cte3
ON t1.id = cte3.table1_ID
Or the CTE can be moved into a sub-select, if that had merit, like so:
SELECT
t1.thing,
t2.other_thing,
cte3.extra_detail,
cte3.one_last_thing
FROM table1 t1
JOIN table2 t2
ON t1.id = t2.table1_ID
JOIN (
SELECT
t3.table1_ID,
t4.extra_detail,
t5.one_last_thing
FROM table3 t3
JOIN table4 t4
ON t3.id = t4.table3_ID
JOIN table5 t5
ON t3.id = t5.table3_ID
) cte3
ON t1.id = cte3.table1_ID
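A quick way to convince yourself that the CTE form and the sub-select form are interchangeable is to run both against the same data. This is only a sketch: it uses SQLite through Python's sqlite3 module rather than Snowflake, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, thing TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_ID INTEGER, other_thing TEXT);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_ID INTEGER);
CREATE TABLE table4 (id INTEGER PRIMARY KEY, table3_ID INTEGER, extra_detail TEXT);
CREATE TABLE table5 (id INTEGER PRIMARY KEY, table3_ID INTEGER, one_last_thing TEXT);
INSERT INTO table1 VALUES (1, 'thing-1'), (2, 'thing-2');
INSERT INTO table2 VALUES (10, 1, 'other-1'), (11, 2, 'other-2');
INSERT INTO table3 VALUES (20, 1), (21, 2);
INSERT INTO table4 VALUES (30, 20, 'detail-1'), (31, 21, 'detail-2');
INSERT INTO table5 VALUES (40, 20, 'last-1'), (41, 21, 'last-2');
""")

# The table3 -> table4/table5 chain, written once and reused in both forms.
sub_chain = """
    SELECT t3.table1_ID, t4.extra_detail, t5.one_last_thing
    FROM table3 t3
    JOIN table4 t4 ON t3.id = t4.table3_ID
    JOIN table5 t5 ON t3.id = t5.table3_ID
"""

outer_select = """
SELECT t1.thing, t2.other_thing, cte3.extra_detail, cte3.one_last_thing
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.table1_ID
JOIN {source} ON t1.id = cte3.table1_ID
ORDER BY t1.id
"""

# Form 1: the chain as a named CTE.
cte_sql = "WITH cte3 AS (" + sub_chain + ")" + outer_select.format(source="cte3")
# Form 2: the same chain inlined as a sub-select with the alias cte3.
subquery_sql = outer_select.format(source="(" + sub_chain + ") cte3")

cte_rows = conn.execute(cte_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(cte_rows == subquery_rows, cte_rows)
```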
Now the interesting part: merit. Why would we be doing these things?
The first version should meet your needs just fine if you just want to get some values off each table and move on. But if you are doing complex filters on table3 and below, or some expensive aggregation on table3 and below, and those results are matched many times to table1, then doing the work in a CTE or sub-query makes sense.
Now why might you use the CTE over the sub-query? The simple answer is that, for the code given, they are the same. But if you joined table1 and table3 multiple times, because you are calculating daily costs, weekly costs, and monthly costs, then building the costs once (in a CTE) and joining those results can save a lot of time. At the same time, CTEs can sometimes slow things down, because what seems to be the "expensive code" is mostly free once the other work is taken into account. I have seen code run faster on Snowflake doing a large aggregation three times in sub-selects, as it removes the synchronization cost between the data paths, and the remote data read was the same bottleneck under both.
On the other hand, sometimes CTEs make the code cleaner to read: you get to name the expression something meaningful and then use a short alias, so the SQL is more readable and the intent is still captured. The Snowflake optimizer rewrites the SQL anyway, so the two forms can be, and often are, executed the same way. Helping humans is therefore the greater value.
On other databases the optimizer can be helped by the order of the joins and by nesting them (or so I have been told). I have not read about or witnessed that on Snowflake, but I have spent days rewriting SQL only to have it end up with the same execution plan in the other form.
Where CTEs can really shine is in very large queries (hundreds of lines of SQL), by pushing filters to where you want them, to avoid really large data reads and full-table processing whose output is only pruned later. This sort of thing can be spotted in the query profile: 10 billion rows flowing between blocks for many steps, only to hit a filter later in the pipeline with 5 thousand rows coming out.
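As a rough illustration of doing the filtering and aggregation once inside a CTE before joining, here is a small sketch. It runs on SQLite through Python's sqlite3 module, not on Snowflake, and the `day` and `cost` columns, the CTE name `costs_2024` and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, thing TEXT);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, table1_ID INTEGER, day TEXT, cost INTEGER);
INSERT INTO table1 VALUES (1, 'thing-1'), (2, 'thing-2');
INSERT INTO table3 VALUES
  (1, 1, '2024-01-01', 10),
  (2, 1, '2024-01-02', 5),
  (3, 2, '2024-01-01', 7),
  (4, 2, '2023-12-31', 100);
""")

sql = """
WITH costs_2024 AS (
    -- Filter first, then aggregate, so only one row per table1_ID
    -- reaches the join below.
    SELECT table1_ID, SUM(cost) AS total_cost
    FROM table3
    WHERE day >= '2024-01-01'
    GROUP BY table1_ID
)
SELECT t1.thing, c.total_cost
FROM table1 t1
JOIN costs_2024 c ON t1.id = c.table1_ID
ORDER BY t1.id
"""

rows = conn.execute(sql).fetchall()
print(rows)
```

Whether this is actually faster than the flat form depends on the engine and the data, so treat it as a way to express intent and check the query profile before assuming a gain.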

Related

simplifying mysql query with joins

I am working on another developer's code and need to simplify this query, which is in the format below.
SELECT
table1.id, table2.userid, table4.groupid
FROM
table1, table2, table3, table4
WHERE
userid = table4_userid
and
table2.id = table3.table2_id
and
table1.table3_id = table3.id
and
table3.statement_id = 264803
order by table4_groupid
But I am used to writing join queries that explicitly mention the join type, i.e. LEFT, RIGHT or OUTER join. I also use TableName.TableField so that I know which field is from which table. However, as you can see above, the query is a mix of tablename.tablefield and just tablefield. The above query works fine, but I need to make table4 a LEFT JOIN so that it still shows some data when there are no matching rows in table4.
My questions are:
1) What types of joins are above?
2) How do I change the above query to make table4 as a LEFT JOIN?
I know you may want the original query, but I just need a few pointers in the right direction and I will do the rest myself.
Since you are listing the tables directly and joining them in the WHERE clause, they are treated as ordinary INNER JOINs. Using the WHERE clause to join tables is the old way of joining and is much harder to read. If you would like to LEFT JOIN table4, I suggest that you rewrite the query like this:
SELECT
table1.id,
table2.userid,
table4.groupid
FROM
table1
JOIN table3
ON table1.table3_id = table3.id
JOIN table2
ON table2.id = table3.table2_id
LEFT JOIN table4
ON table2.userid = table4_userid
WHERE
table3.statement_id = 264803
order by
table4_groupid
Looking at the conditions after the WHERE, all of the above are inner joins.
Try this one. I am not sure of the ON conditions, as we don't really know the tables' structures:
SELECT
t1.id, t2.userid, COALESCE(t4.groupid, 0)
FROM
table1 t1 INNER JOIN table3 t3 ON t1.table3_id=t3.id
INNER JOIN table2 t2 ON t2.id=t3.table2_id
LEFT JOIN table4 t4 ON t1.userid=t4.table4_userid
WHERE
t3.statement_id = 264803
order by t4.groupid
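To see the difference between the comma-style (implicit inner) join and a LEFT JOIN, here is a minimal sketch on a cut-down version of the schema. It only uses table2 and table4, the column names `userid` and `groupid` are simplified guesses at the real structure, and it runs on SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (id INTEGER PRIMARY KEY, userid INTEGER);
CREATE TABLE table4 (userid INTEGER, groupid INTEGER);
INSERT INTO table2 VALUES (1, 100), (2, 200);
-- Only user 100 has a group; user 200 has no row in table4.
INSERT INTO table4 VALUES (100, 7);
""")

# Old comma syntax: the join condition lives in WHERE, which is an inner join.
implicit_rows = conn.execute("""
    SELECT table2.userid, table4.groupid
    FROM table2, table4
    WHERE table2.userid = table4.userid
    ORDER BY table2.userid
""").fetchall()

# Same inner join written explicitly.
inner_rows = conn.execute("""
    SELECT table2.userid, table4.groupid
    FROM table2
    JOIN table4 ON table2.userid = table4.userid
    ORDER BY table2.userid
""").fetchall()

# LEFT JOIN keeps user 200 and fills groupid with NULL (None in Python).
left_rows = conn.execute("""
    SELECT table2.userid, table4.groupid
    FROM table2
    LEFT JOIN table4 ON table2.userid = table4.userid
    ORDER BY table2.userid
""").fetchall()

print(implicit_rows, inner_rows, left_rows)
```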

Why is my SELECT with a subquery and JOINs so slow?

This query takes 10 seconds to complete. But when I manually perform the subquery and change the t1.id restriction to that list, it's done in 0.00 seconds. What can I do to let MySQL execute the query quicker?
SELECT t1.col1, t2.col2, t3.col3
FROM t1, t2, t3
WHERE t1.t2id = t2.id AND t1.t3id = t3.id
AND t1.id IN ( SELECT id FROM t4 WHERE blah = 123 )
Also, why is this happening? I suppose MySQL joins all three tables in some way before filtering on t1.id.
t1, t2 and t3 contain 3000, 15 and 80 rows, respectively. The subquery returns 2-10 rows.
Try using an INNER JOIN rather than the IN operator.
That way your SQL statement should perform better.
SELECT t1.col1, t2.col2, t3.col3
FROM ((t1 INNER JOIN t2 ON t1.t2id = t2.id) INNER JOIN t3 ON t1.t3id = t3.id) INNER JOIN t4 ON t1.id = t4.id
WHERE t4.blah = 123
Rewrite the query with the subquery as a derived table instead of an IN clause:
SELECT t1.col1, t2.col2, t3.col3
FROM t1, t2, t3, (SELECT id FROM t4 WHERE blah = 123) AS t4
WHERE t1.t2id = t2.id AND t1.t3id = t3.id
AND t1.id=t4.id
Be sure you have indexes on the fields used in the WHERE clauses.
If you run EXPLAIN over your statement, you may see that MySQL has created a temporary table on disk: this often happens if the data within an inline select (your IN term) is sufficiently large.
In a nutshell:
a) use EXPLAIN to see what's going on in the database (and to see how this behaviour changes with increased data)
b) avoid inline subqueries if you can
c) remember that MySQL, at the time of writing, only has a nested loop join algorithm at its disposal (other DBs use hash join and merge join algorithms too; MySQL itself gained hash joins in 8.0.18), so you may see little overhead for smaller data sets, but a sudden drop-off in performance when you reach a tipping point.
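The advice above can be tried out in a small sketch. It uses SQLite through Python's sqlite3 module, so `EXPLAIN QUERY PLAN` stands in for MySQL's `EXPLAIN` and the plan text will look different on MySQL. The index name `idx_t4_blah` and the sample rows are made up. Note that the JOIN rewrite only matches the IN version when t4.id is unique, otherwise the join can duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, t2id INTEGER, t3id INTEGER, col1 TEXT);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, col2 TEXT);
CREATE TABLE t3 (id INTEGER PRIMARY KEY, col3 TEXT);
CREATE TABLE t4 (id INTEGER PRIMARY KEY, blah INTEGER);
-- Index on the column used in the filter.
CREATE INDEX idx_t4_blah ON t4 (blah);
INSERT INTO t2 VALUES (1, 'b');
INSERT INTO t3 VALUES (1, 'c');
INSERT INTO t1 VALUES (1, 1, 1, 'a1'), (2, 1, 1, 'a2'), (3, 1, 1, 'a3');
INSERT INTO t4 VALUES (1, 123), (3, 123), (2, 999);
""")

in_sql = """
SELECT t1.col1, t2.col2, t3.col3
FROM t1, t2, t3
WHERE t1.t2id = t2.id AND t1.t3id = t3.id
  AND t1.id IN (SELECT id FROM t4 WHERE blah = 123)
"""

join_sql = """
SELECT t1.col1, t2.col2, t3.col3
FROM t1
INNER JOIN t2 ON t1.t2id = t2.id
INNER JOIN t3 ON t1.t3id = t3.id
INNER JOIN t4 ON t1.id = t4.id
WHERE t4.blah = 123
"""

in_rows = sorted(conn.execute(in_sql).fetchall())
join_rows = sorted(conn.execute(join_sql).fetchall())

# Ask the engine how it would run the filter on t4.
# The last column of each plan row holds the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT id FROM t4 WHERE blah = 123").fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(in_rows == join_rows)
print(plan_text)
```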

Shorten a join query

I have a query with 3 joins:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t1
left join t2 on t1.email = t2.email
Inner join t3 on t2.entity_id = t3.order_id
Inner join t4 on t3.product_id = t4.entity_id
WHERE t4.attribute_id = 126
I think my server just can't handle it :) The query runs out of time and an error occurs.
Thanks a lot
Table structure:
T1: email (which is the same as in t2)
T2: email, firstname, lastname, orderid (which is called entity id in t3)
T3: entityid, product id (which is called entity id in t4)
T4: entityid, attributeid, value
Unless t2 links straight to t4 there is no way.
Also, do you need a left join between t1 and t2?
As @Sachin already stated, you can't "shorten" this query unless t2 links straight to t4 without requiring a comparison with t3. However, in order to speed up your query, you should have indexes on some or all of the columns referenced in your join conditions (i.e. t1.email, t2.email, t2.entity_id, etc).
Having an index on each of these columns will give you much faster SELECT queries, but it will slow down your INSERT and UPDATE queries. So if you SELECT more often than you INSERT or UPDATE, then you should definitely be using indexes. If not, try to make indexes in wise places (tables that have INSERT or UPDATE statements run less often but still have a lot of rows, for instance).
For further clarification, see the following links:
More information on how indexes work
Syntax for creating indexes
Try your query this way:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t4
INNER JOIN t3 ON t3.product_id = t4.entity_id
INNER JOIN t2 ON t2.entity_id = t3.order_id
INNER JOIN t1 ON t1.email = t2.email
WHERE t4.attribute_id = 126
It's basically your query but "backwards". Written the original way, your DBMS may have to try to join t2 for ALL records in t1, then join t3 for ALL records found in t2, before it can even attempt to address your WHERE clause.
My way, you're finding all the records in t4 where attribute_id = 126 first, THEN attempting to join the other tables. It should be a lot quicker. You should then be able to speed things up even more by making sure the proper indexes exist on the tables involved. You can prepend the keyword EXPLAIN to your query to see how the DBMS attempts to seek data in your query.
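Before swapping the join order, it is worth checking that the "backwards" query really returns the same rows as the original. The sketch below does that on invented sample data, with column names taken from the queries rather than from the table description, and with SQLite (via Python's sqlite3 module) standing in for MySQL. The LEFT JOIN in the original behaves like an inner join here because the later inner joins and the WHERE on t4 discard the unmatched rows anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (email TEXT);
CREATE TABLE t2 (email TEXT, firstname TEXT, lastname TEXT, entity_id INTEGER);
CREATE TABLE t3 (order_id INTEGER, product_id INTEGER);
CREATE TABLE t4 (entity_id INTEGER, attribute_id INTEGER, value TEXT);
-- c@x.com has no row in t2, so the LEFT JOIN yields NULLs for it.
INSERT INTO t1 VALUES ('a@x.com'), ('b@x.com'), ('c@x.com');
INSERT INTO t2 VALUES ('a@x.com', 'Ann', 'Lee', 1), ('b@x.com', 'Bob', 'Ray', 2);
INSERT INTO t3 VALUES (1, 500), (2, 501);
INSERT INTO t4 VALUES (500, 126, 'red'), (501, 99, 'blue');
""")

original_sql = """
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t1
LEFT JOIN t2 ON t1.email = t2.email
INNER JOIN t3 ON t2.entity_id = t3.order_id
INNER JOIN t4 ON t3.product_id = t4.entity_id
WHERE t4.attribute_id = 126
"""

reordered_sql = """
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t4
INNER JOIN t3 ON t3.product_id = t4.entity_id
INNER JOIN t2 ON t2.entity_id = t3.order_id
INNER JOIN t1 ON t1.email = t2.email
WHERE t4.attribute_id = 126
"""

original_rows = sorted(conn.execute(original_sql).fetchall())
reordered_rows = sorted(conn.execute(reordered_sql).fetchall())
print(original_rows == reordered_rows, original_rows)
```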

SQL Server 2008 column in only one table

First a quick explanation: I am actually dealing with four tables and mining data from different places but my problem comes down to this seemingly simple concept and yes I am very new to this...
I have two tables (one and two) that both have ID columns in them. I want to query only the IDs that are in table two only, not in both. As in:
Select ID
From dbo.one, dbo.two
Where dbo.two != dbo.one
I actually thought this would work but I'm getting odd results. Can anyone help?
SELECT t2.ID
FROM dbo.two t2
WHERE NOT EXISTS(SELECT NULL
FROM dbo.one t1
WHERE t2.ID = t1.ID)
This could also be done with a LEFT JOIN:
SELECT t2.ID
FROM dbo.two t2
LEFT JOIN dbo.one t1
ON t2.ID = t1.ID
WHERE t1.ID IS NULL
Completing the other two options after Joe's answer:
-- Option 1: EXCEPT
SELECT id
FROM dbo.two
EXCEPT
SELECT id
FROM dbo.one;
-- Option 2: NOT IN
SELECT t2.ID
FROM dbo.two t2
WHERE t2.ID NOT IN (SELECT t1.ID FROM dbo.one t1);
Note: LEFT JOIN will be slower than the other three, which should all give the same plan (for NOT IN, only when ID is not nullable).
That's because the LEFT JOIN is a join followed by a filter, while the other three are anti semi-joins.
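The four variants can be compared side by side in a small sketch. It runs on SQLite through Python's sqlite3 module instead of SQL Server, so the dbo schema prefix is dropped and nothing here says anything about SQL Server's plans. The sample IDs are made up. The last step shows the usual caveat with NOT IN: a single NULL in the subquery makes it return no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE one (ID INTEGER);
CREATE TABLE two (ID INTEGER);
INSERT INTO one VALUES (1), (2);
INSERT INTO two VALUES (2), (3), (4);
""")

queries = {
    "not_exists": """SELECT t2.ID FROM two t2
                     WHERE NOT EXISTS (SELECT NULL FROM one t1 WHERE t2.ID = t1.ID)""",
    "left_join": """SELECT t2.ID FROM two t2
                    LEFT JOIN one t1 ON t2.ID = t1.ID
                    WHERE t1.ID IS NULL""",
    "except": "SELECT ID FROM two EXCEPT SELECT ID FROM one",
    "not_in": """SELECT t2.ID FROM two t2
                 WHERE t2.ID NOT IN (SELECT t1.ID FROM one t1)""",
}

# With no NULLs involved, all four forms return the IDs found only in table two.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in queries.items()}
print(results)

# Add a NULL to table one and rerun two of the forms.
conn.execute("INSERT INTO one VALUES (NULL)")
not_in_with_null = conn.execute(queries["not_in"]).fetchall()
not_exists_with_null = sorted(row[0] for row in conn.execute(queries["not_exists"]))
print(not_in_with_null, not_exists_with_null)
```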

which method is better to join mysql tables?

What is the difference between these two methods of selecting data from multiple tables? The first one does not use JOIN while the second does. Which one is the preferred method?
Method 1:
SELECT t1.a, t1.b, t2.c, t2.d, t3.e, t3.f
FROM table1 t1, table2 t2, table3 t3
WHERE t1.id = t2.id
AND t2.id = t3.id
AND t3.id = x
Method 2:
SELECT t1.a, t1.b, t2.c, t2.d, t3.e, t3.f
FROM `table1` t1
JOIN `table2` t2 ON t1.id = t2.id
JOIN `table3` t3 ON t1.id = t3.id
WHERE t1.id = x
For your simple case, they're equivalent. Even though the 'JOIN' keyword is not present in Method #1, it's still doing joins.
However, method #2 offers the flexibility of allowing extra conditions in the JOIN condition that can't be accomplished via WHERE clauses, such as when you're doing aliased multi-joins against the same table.
select a.id, b.id, c.id
from sometable A
left join othertable as b on a.id=b.a_id and some_condition_in_othertable
left join othertable as c on a.id=c.a_id and other_condition_in_othertable
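Putting the two extra conditions in the WHERE clause instead would change the result: any row of A without a matching b or c row would be filtered out, because the NULL-filled columns fail the condition, so the LEFT JOINs would effectively behave like inner joins. In the join condition, the unmatched rows are kept.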
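The effect of placing a condition in ON versus WHERE with a LEFT JOIN can be checked with a small sketch. The `kind` column and the sample rows are invented to play the role of the two "conditions in othertable", and the code runs on SQLite through Python's sqlite3 module rather than MySQL. In this data set the WHERE version still returns the fully matched row but drops the row whose match is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sometable (id INTEGER PRIMARY KEY);
CREATE TABLE othertable (id INTEGER PRIMARY KEY, a_id INTEGER, kind TEXT);
INSERT INTO sometable VALUES (1), (2);
-- Row 1 has both an 'x' and a 'y' entry; row 2 only has an 'x' entry.
INSERT INTO othertable VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'x');
""")

# Extra conditions inside ON: unmatched sides become NULL, the row of a is kept.
on_rows = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM sometable a
    LEFT JOIN othertable b ON a.id = b.a_id AND b.kind = 'x'
    LEFT JOIN othertable c ON a.id = c.a_id AND c.kind = 'y'
    ORDER BY a.id
""").fetchall()

# Same conditions moved to WHERE: rows with a NULL side fail the filter.
where_rows = conn.execute("""
    SELECT a.id, b.id, c.id
    FROM sometable a
    LEFT JOIN othertable b ON a.id = b.a_id
    LEFT JOIN othertable c ON a.id = c.a_id
    WHERE b.kind = 'x' AND c.kind = 'y'
    ORDER BY a.id
""").fetchall()

print(on_rows)
print(where_rows)
```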
The methods are apparently identical in performance, it's just new vs old syntax.
I don't think there is much of a difference. You could use the EXPLAIN statement to check if MySQL does anything differently. For this trivial example I doubt it matters.