Just keeping it simple here, so this is not specific to any computer language. But suppose I'm joining 2 tables, A and B. The relationship between the 2 tables is 1:many. If I use some type of code that performs an inner join between A and B, the result, let's call this table C, will contain the rows of the Cartesian product of A and B that remain once we've filtered out the pairs that don't match on the criteria of the inner join.
In my work environment, people use the term "exploded" to describe the output shown in C, since the output table is typically much larger than A or B. I don't like this term "exploded." I was wondering if anyone had another term that more appropriately implies that table C is essentially table A enlarged with the addition of data from table B.
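To make the row counts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names (a_id, label, detail) and the sample rows are made up for illustration; only the table names A and B come from the question.

```python
import sqlite3

# Hypothetical 1:many data: A row 1 has two B rows, A row 2 has one, A row 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE B (b_id INTEGER PRIMARY KEY, a_id INTEGER, detail TEXT);
    INSERT INTO A VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO B VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")

# Size of the full Cartesian product (every A row paired with every B row).
cross_count = conn.execute("SELECT COUNT(*) FROM A CROSS JOIN B").fetchone()[0]

# Table C: only the pairs that satisfy the join condition survive.
rows_c = conn.execute("""
    SELECT A.a_id, A.label, B.detail
    FROM A
    INNER JOIN B ON B.a_id = A.a_id
    ORDER BY B.b_id
""").fetchall()

print("cross join rows:", cross_count)
print("inner join rows:", len(rows_c))
for row in rows_c:
    print(row)
```

With a 1:many relationship, C ends up with one row per matching B row, so each A row is repeated once for each of its B rows.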
This is related to "Join vs. sub-query", but it is a different type of situation, and I'm just trying to understand how this works.
I had to make a view where I get a bunch of employee codes from one table, and I have to get their names from a different table - the same two tables every time. I arranged my query like this:
SELECT
(SELECT name from emptable where empcod = code1) as emp1, code1,
(SELECT name from emptable where empcod = code2) as emp2, code2,
[repeat 6 times]
FROM codetable
It is more complicated than this, and more tables are joined, but this is the element I want to ask about. My boss says joining like so is better:
SELECT e1.name, c.code1, e2.name, c.code2, e3.name, c.code3 [etc]
FROM codetable c
INNER JOIN emptable e1 ON e1.empcod = c.code1
INNER JOIN emptable e2 ON e2.empcod = c.code2
INNER JOIN emptable e3 ON e3.empcod = c.code3
Apart from not having to search through the joins to work out which table supplies which name and why, my reasoning is based on how I understand a join to work:
Take whole table A
Take whole table B
Combine all the data from both tables according to the 'ON' section of the join
Select one single string from this complete combination of two whole tables, from which I need no other data
I think it's obvious that this seems like it would take up a lot of resources. I understand the subquery as
Get one datum from table A (the employee code)
Match this one datum to every record from table B until you find a match
As soon as you get a match, bring back this one single datum from this other table (the employee's name)
Bear in mind that in the table of employees, the employee code is a primary key and cannot be duplicated, so every subquery can only ever give me one single string back.
It seems to me that comparing ONE number from one table to ONE number from another table and retrieving ONE string related to that number would be less resource-intensive than matching ALL of the data in two whole tables together in order to get this one string. But I figure I don't know what these databases are doing behind the scenes, and a lot of people seem to prefer joins. Can anyone explain this to me, if I'm understanding it wrong? The other posts I find here of similar situations tend to want more information from more tables, I'm not immediately finding anything about matching the same two tables six or seven times to retrieve one single string for every configuration.
Thanks in advance.
So as ScaisEdge explained, a join only gets executed once, and thus only spends time and resources once, no matter how many rows you have, whereas each of the six subselects gets executed once for every row. If you have 100 rows, you're either executing six joins once or executing six subselects 100 times.
It makes sense that this would be more resource-intensive, and I did not explain clearly enough that my case involves only one row at a time - in which case I guess the difference would be negligible anyway.
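For anyone who wants to compare the two forms side by side, here is a small sketch with Python's sqlite3 module. The sample data and the id column on codetable are assumptions made for the example, and only two of the code columns are shown. It checks only that the two queries return the same rows, not how fast either one runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emptable (empcod INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE codetable (id INTEGER PRIMARY KEY, code1 INTEGER, code2 INTEGER);
    INSERT INTO emptable VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO codetable VALUES (100, 1, 2), (101, 3, 1);
""")

# Scalar subqueries, as in the question.
subquery_sql = """
    SELECT (SELECT name FROM emptable WHERE empcod = code1) AS emp1, code1,
           (SELECT name FROM emptable WHERE empcod = code2) AS emp2, code2
    FROM codetable
    ORDER BY id
"""

# One join per code column, as the boss suggested.
join_sql = """
    SELECT e1.name AS emp1, c.code1, e2.name AS emp2, c.code2
    FROM codetable c
    INNER JOIN emptable e1 ON e1.empcod = c.code1
    INNER JOIN emptable e2 ON e2.empcod = c.code2
    ORDER BY c.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows == join_rows)

# Add a row whose code1 has no matching employee and run both queries again.
conn.execute("INSERT INTO codetable VALUES (102, 9, 1)")
subquery_rows_unmatched = conn.execute(subquery_sql).fetchall()
join_rows_unmatched = conn.execute(join_sql).fetchall()
print(len(subquery_rows_unmatched), len(join_rows_unmatched))
```

One behavioural difference shows up in the second run: when a code has no match in emptable, the subquery form keeps the row with a NULL name, while the inner join drops the row entirely.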
A natural join is an inner join that only works if table1 has some intersecting attributes with table2.
Yet, when I take tables that have no column names in common, it acts as a Cartesian product.
In addition, when I take different tables that have nothing in common, it displays no results.
Why?
Well, you have learned the first important lesson, which is to avoid natural join. It is just lousy syntax, because it does not even take properly declared foreign key relationships into account and the join conditions are hidden, which makes queries hard to maintain and debug.
A natural join is an inner equijoin with the join conditions on columns with the same names. Natural joins do not even take types into account, so the query can have type conversion errors if your data is really messed up.
If the corresponding inner join on the common column names has no matches, then it returns the empty set. If there are no common column names, then it is the same as a cross join.
The way to think about it is that a natural join (inner natural join) generates the Cartesian product of two tables. When the tables have duplicated column names, then the final result set contains only those Cartesian-product rows where the common column names have the same value.
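Both cases from the question can be reproduced with a small sketch. This one uses SQLite through Python's sqlite3 module, and the tables t1, t2 and t3 and their contents are invented for the example; other databases may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER);
    CREATE TABLE t2 (c INTEGER, d INTEGER);  -- shares no column names with t1
    CREATE TABLE t3 (a INTEGER, e INTEGER);  -- shares column a, but no values of a
    INSERT INTO t1 VALUES (1, 10), (2, 20);
    INSERT INTO t2 VALUES (7, 70), (8, 80), (9, 90);
    INSERT INTO t3 VALUES (5, 50), (6, 60);
""")

# No common column names: the natural join degenerates to a cross join.
no_common_columns = conn.execute("SELECT * FROM t1 NATURAL JOIN t2").fetchall()

# Common column name a, but no equal values: the result is empty.
no_common_values = conn.execute("SELECT * FROM t1 NATURAL JOIN t3").fetchall()

print(len(no_common_columns))
print(no_common_values)
```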
I came across this question in an interview and I have been wondering if what I did was right. Let's say I have a table 'A' with the following attributes:
R S T
-----------
a1 b1 c1
a1 b2 c2
a1 b3 c3
a4 b4 c4
and let's say I need to calculate the relational algebra expression B = [π R,S (A) NATURAL JOIN π S,T (A)] NATURAL JOIN π R,T (A), where π denotes projection.
What would be the result?
This is what I tried:
-We know (A) NATURAL JOIN (A) = A
-I did the first set of join within the square bracket. Since we had attribute 'S' in common I just yielded the result to be a table of (R S T) with the same 4 rows of tuples.
-Finally, I joined (R S T) with the second set of join where attributes 'R' and 'T' are common which I assumed will yield R S T again with 4 rows of tuples.
Meaning, with the way I did it, I ended up getting B = A.
I did not consider the tuples at all, I just did a natural join based on the common attributes between two projections.
I know that's very stupid, but I am trying to execute it in MySQL, and for some reason I get an error when I run a query like this:
select A,B from dbt2.relationalalgebra as r1 NATURAL JOIN (select B, C from dbt2.relationalalgebra as r2);
The error says that every derived table must have its own alias!
Please help me understand how a natural join works on the same table.
Thanks in advance for any help.
What you did is correct. And it's correct that you obtained B = A -- given that content for A.
This is a question about the functional dependencies between the values in the data. (So you might not get B = A, if the data was different.)
For attributes S and T, there's a different value in each tuple. In other words, given a value for S (or T), you know which row it's from, so you know the value for the other two attributes in that tuple. The functional dependencies are S -> R, T; T -> R, S. (You might say that S and T are each keys for A.)
The pairs of attributes in the projections you give each include at least one key, so uniquely determine which 'missing' attribute gets joined. You're seeing a lossless join decomposition, as per Heath's Theorem. http://en.wikipedia.org/wiki/Functional_dependency
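The result can be checked mechanically. Below is a sketch using SQLite through Python's sqlite3 module; the helper name rejoin and the second data set are made up for illustration. Each projection is written as an aliased subquery with SELECT DISTINCT, since a relational projection removes duplicates.

```python
import sqlite3

# (pi R,S (A) NATURAL JOIN pi S,T (A)) NATURAL JOIN pi R,T (A)
REJOIN_SQL = """
    SELECT R, S, T
    FROM (SELECT *
          FROM (SELECT DISTINCT R, S FROM A) AS rs
          NATURAL JOIN (SELECT DISTINCT S, T FROM A) AS st) AS rs_st
    NATURAL JOIN (SELECT DISTINCT R, T FROM A) AS rt
    ORDER BY R, S, T
"""

def rejoin(rows):
    """Load rows into a fresh table A and evaluate the expression for B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (R TEXT, S TEXT, T TEXT)")
    conn.executemany("INSERT INTO A VALUES (?, ?, ?)", rows)
    return conn.execute(REJOIN_SQL).fetchall()

# The table from the question: S and T each identify a row.
original = [("a1", "b1", "c1"), ("a1", "b2", "c2"),
            ("a1", "b3", "c3"), ("a4", "b4", "c4")]
result_b = rejoin(original)
print(result_b == sorted(original))

# Hypothetical data in which no single attribute identifies a row.
other = [("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
result_other = rejoin(other)
print(len(other), len(result_other))
```

For the question's data the three-way join gives back A exactly. For the second data set it also produces ("a1", "b1", "c1"), a tuple that was never in the table, which is the kind of case meant by "you might not get B = A".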
A natural join is a shorthand for joining two tables (or subqueries) on all columns that have the same name.
A natural join of a table to itself could have several consequences. The most common would be the table itself -- if none of the values are NULL and the rows are unique. If each row had a NULL value in some column, then the natural join would return no rows at all. If rows are duplicated, then multiple rows might appear.
I do not recommend ever using natural join. A small change to an underlying table structure could break a query.
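The three self-join outcomes described above can be reproduced with a short sketch. It uses SQLite through Python's sqlite3 module; the table t, its columns, and the helper natural_self_join are invented for the example.

```python
import sqlite3

def natural_self_join(rows):
    """Create a two-column table t from rows and natural-join it to itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn.execute("SELECT * FROM t AS l NATURAL JOIN t AS r").fetchall()

# Unique rows, no NULLs: the join returns the table itself.
clean = natural_self_join([(1, "a"), (2, "b")])

# A NULL in every row: NULL = NULL is not true, so nothing matches.
with_nulls = natural_self_join([(1, None), (2, None)])

# Duplicated rows: every copy matches every copy.
with_duplicates = natural_self_join([(1, "a"), (1, "a")])

print(sorted(clean))
print(with_nulls)
print(len(with_duplicates))
```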
Here's the deal:
Table A has columns A1 and A2
Table B has columns B1,B2,B3
Now I want to select data from columns A1 and B1 (without a join), and the condition is:
B3='someword' and A2=B2
If there were no need to print B1, I would have written the query (without a join) as:
select A1 from A where A2 in (select B2 from B where B3='someword');
But I need to print both A1 and B1. So is it possible to do that using 'IN' and without using a join?
When you say you need to restrict the output to where A2=B2
YOU ARE SPECIFYING a JOIN.
Calling it something else does not change what it is. To paraphrase Willie,
"A Join by any other name is still a Join"
Seriously, a "Join" is not the name, nor the word, nor even the syntax used in a query to apply it. It is the logical predicate (or restriction, or filter) that is based on values from two different tables. If you need to restrict the output to where Table1.A2 equals Table2.B2, then you have a Join.
You cannot do this effectively without a JOIN. In other words, you could perhaps use a nested select, but that operation is much slower than a JOIN. A JOIN is the best choice, both for the programmer and for the database. Your potential future client would not be happy if you sold him a slow information system.
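As a sketch of the point both answers make, here is the query written with and without the JOIN keyword, run through Python's sqlite3 module. The sample rows are made up; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A1 TEXT, A2 INTEGER);
    CREATE TABLE B (B1 TEXT, B2 INTEGER, B3 TEXT);
    INSERT INTO A VALUES ('a-one', 1), ('a-two', 2), ('a-three', 3);
    INSERT INTO B VALUES ('b-one', 1, 'someword'),
                         ('b-two', 2, 'other'),
                         ('b-three', 3, 'someword');
""")

# Explicit JOIN syntax.
explicit_join = conn.execute("""
    SELECT A.A1, B.B1
    FROM A
    INNER JOIN B ON A.A2 = B.B2
    WHERE B.B3 = 'someword'
    ORDER BY A.A2
""").fetchall()

# No JOIN keyword, but the predicate A.A2 = B.B2 still makes it a join.
implicit_join = conn.execute("""
    SELECT A.A1, B.B1
    FROM A, B
    WHERE A.A2 = B.B2 AND B.B3 = 'someword'
    ORDER BY A.A2
""").fetchall()

print(explicit_join)
print(explicit_join == implicit_join)
```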
I have the following problem to solve:
Let's say there is a table A that contains a number of elements, and this table is copied to become table B. In table B some of the original elements are lost and some are added. A reference table AB keeps track of these changes. Then table B is copied to be table C, and again some of the existing elements get lost and some are added. Another reference table BC keeps track of these relations. ... etc.
There is an n number of such tables with an n-1 number of reference tables.
If I want to know which of the elements of my choice in table C were already present in A, I can do that by doing something like:
SELECT AB.oldID
FROM AB
JOIN BC
WHERE BC.newID IN (x, y, z)
Now since the number of reference tables can vary, the number of JOIN lines can vary.
Should I concatenate the query by looping over the steps and adding JOIN lines, or should I rather write a recursive function that selects only the members of the next step and then let the function call itself until I have the end result?
Or is there another, even better way to do something like that?
Since your table names vary, you'll need to build some kind of dynamic query.
If you do the recursive function approach, you'll need to pass the resultsets between the function calls somehow.
MySQL has no array datatype, and storing the results in a temp table takes far too long.
Conclusion: use joins.
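A minimal sketch of the "loop and add JOIN lines" approach follows. It is written in Python against SQLite so it can be run as-is, and it assumes each reference table has the columns oldID and newID, with one table's newID matching the next table's oldID. The function name build_chain_query, the table CD, and the sample IDs are all hypothetical.

```python
import sqlite3

def build_chain_query(ref_tables, id_count):
    """Build one SELECT that joins the reference tables in the given order."""
    # Table names cannot be bound as parameters, so accept plain identifiers only.
    for name in ref_tables:
        if not name.isidentifier():
            raise ValueError(f"unexpected table name: {name!r}")
    lines = [f"SELECT t0.oldID FROM {ref_tables[0]} AS t0"]
    for i, name in enumerate(ref_tables[1:], start=1):
        # Assumed link: this step's oldID is the previous step's newID.
        lines.append(f"JOIN {name} AS t{i} ON t{i}.oldID = t{i - 1}.newID")
    placeholders = ", ".join("?" * id_count)
    lines.append(f"WHERE t{len(ref_tables) - 1}.newID IN ({placeholders})")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AB (oldID INTEGER, newID INTEGER);
    CREATE TABLE BC (oldID INTEGER, newID INTEGER);
    CREATE TABLE CD (oldID INTEGER, newID INTEGER);
    INSERT INTO AB VALUES (1, 11), (2, 12);    -- both elements survive from A to B
    INSERT INTO BC VALUES (11, 21), (13, 23);  -- 12 is lost; 13 was new in B
    INSERT INTO CD VALUES (21, 31), (23, 33);
""")

wanted = [31, 33]  # elements in the last table
query = build_chain_query(["AB", "BC", "CD"], len(wanted))
print(query)
ancestors = conn.execute(query, wanted).fetchall()
print(ancestors)
```

Only the IDs go through bound parameters; the table names are checked and then formatted into the string, because SQL placeholders cannot stand in for identifiers.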
Update:
Here's a sample query which returns the entries that persisted through revision A to revision M (with one table design):
SELECT  *
FROM    entries e
WHERE   NOT EXISTS
        (
        SELECT  *
        FROM    revisions r
        JOIN    revision_changes rc
        ON      rc.revision_id = r.id
        WHERE   rc.entry_id = e.id
                AND rc.deleted
                AND r.revision_id BETWEEN 'A' AND 'M'
        )
This way, you just fill the added and deleted fields of revision_changes for the revisions where the entry was added or deleted.