Natural join of table (A) with itself - mysql

I came across this question in an interview and I have been wondering if what I did was right. Let's say I have a table 'A' with the following attributes:
R S T
-----------
a1 b1 c1
a1 b2 c2
a1 b3 c3
a4 b4 c4
and let's say I need to evaluate the relational algebra expression B = [π_{R,S}(A) NATURAL JOIN π_{S,T}(A)] NATURAL JOIN π_{R,T}(A), where π denotes projection.
what would be the result?
This is what I tried:
- We know that (A) NATURAL JOIN (A) = A.
- I did the first join, the one inside the square brackets. Since the two projections have attribute 'S' in common, I took the result to be a table (R S T) with the same 4 tuples.
- Finally, I joined (R S T) with the third projection, where attributes 'R' and 'T' are common, and assumed this would yield (R S T) again with 4 tuples.
Meaning, with the way I did it, I ended up getting B = A.
I did not consider the tuples at all, I just did a natural join based on the common attributes between two projections.
I know that's very stupid, but I am trying to execute it in MySQL and I get an error when I run a query like this:
select A,B from dbt2.relationalalgebra as r1 NATURAL JOIN (select B, C from dbt2.relationalalgebra as r2);
The error says that every derived table must have its own alias!
Please help me understand how a natural join works on the same table.
Thanks in advance for any help.

What you did is correct. And it's correct that you obtained B = A -- given that content for A.
This is a question about the functional dependencies between the values in the data. (So you might not get B = A if the data were different.)
For attributes S and T, there's a different value in each tuple. In other words, given a value for S (or T), you know which row it's from, so you know the values of the other two attributes in that tuple. The functional dependencies are S -> R, T and T -> R, S. (You might say that S and T are each keys for A.)
Each pair of attributes in the projections you give includes at least one key, so it uniquely determines which 'missing' attribute gets joined. You're seeing a lossless join decomposition, as per Heath's Theorem. http://en.wikipedia.org/wiki/Functional_dependency
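The claim can be checked by hand with a small sketch that models relations as sets of rows. The helper names (`project`, `natural_join`, `rejoin`) and the second data set `lossy_rows` are assumptions made up for illustration; only the first relation comes from the question.

```python
# Relations are modelled as sets of rows; each row is a frozenset of
# (attribute, value) pairs so that rows are hashable and order-free.
ATTRS = ("R", "S", "T")

def as_rows(tuples):
    """Turn positional (R, S, T) tuples into a set of rows."""
    return {frozenset(zip(ATTRS, t)) for t in tuples}

def project(rows, attrs):
    """Relational projection: keep the named attributes, drop duplicates."""
    return {frozenset((k, v) for k, v in row if k in attrs) for row in rows}

def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                result.add(frozenset({**ld, **rd}.items()))
    return result

def rejoin(rows):
    """Evaluate [proj R,S NATURAL JOIN proj S,T] NATURAL JOIN proj R,T."""
    first = natural_join(project(rows, {"R", "S"}), project(rows, {"S", "T"}))
    return natural_join(first, project(rows, {"R", "T"}))

# The relation A from the question.
rows_a = as_rows([("a1", "b1", "c1"), ("a1", "b2", "c2"),
                  ("a1", "b3", "c3"), ("a4", "b4", "c4")])
print(rejoin(rows_a) == rows_a)  # True: B = A for this data

# Hypothetical data where no single attribute is a key.
lossy_rows = as_rows([("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")])
print(rejoin(lossy_rows) == lossy_rows)  # False: a spurious row appears
```

With the second data set the result contains the extra tuple (a1, b1, c1), which is why the outcome depends on the data and not only on the attribute names.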

A natural join is a shorthand for joining two tables (or subqueries) on all columns that have the same name.
A natural join of a table to itself could have several consequences. The most common would be the table itself -- if none of the values are NULL and the rows are unique. If each row had a NULL value in some column, then the natural join would return no rows at all. If rows are duplicated, then multiple rows might appear.
I do not recommend ever using natural join. A small change to an underlying table structure could break a query.
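The three cases above can be tried out with a short sketch. It uses Python's built-in `sqlite3` module instead of MySQL purely so that it runs without a server; the table `t`, its columns and the helper `self_natural_join_count` are made up for the illustration.

```python
import sqlite3

def self_natural_join_count(rows):
    """Load rows into a fresh table t(r, s) and count rows of t NATURAL JOIN t."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (r TEXT, s TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM t AS x NATURAL JOIN t AS y"
    ).fetchone()
    conn.close()
    return count

# Unique rows, no NULLs: the join gives back the table itself.
print(self_natural_join_count([("a", "b"), ("c", "d")]))    # 2
# A NULL in every row: NULL never equals NULL, so nothing matches.
print(self_natural_join_count([("a", None), ("c", None)]))  # 0
# A duplicated row: each copy matches both copies.
print(self_natural_join_count([("a", "b"), ("a", "b")]))    # 4
```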

Related

Combine table data in MySQL using JOIN

I'm trying to join two tables in MySQL. In one I have a set of IDs (of the type GTEX-14BMU-1526-SM-5TDE6) and a set of types of tissue (SMTS). I have to select the IDs for the tissue type 'Blood' (which is another column of the same table), and then I have to take only the first two strings (GTEX-14BMU) from the ID name and make a list of the different ones.
Then I have to compare this to a second table in which I have a list of IDs that already are in the type (GTEX-14BMU) which have to meet the condition that the column sex of this same table is 2.
The expected result is a list with the IDs which are sex type 2 and have tissue type 'Blood', meaning the ones that are coinciding. I'm trying to solve this by using JOIN and all the needed conditions in the same statement, which is:
SELECT DISTINCT SUBSTRING_INDEX(g.SAMPID,'-',2) AS sampid, m.SUBJID, g.SMTS, m.SEX
FROM GTEX_Sample AS g
JOIN GTEX_Pheno AS m ON sampid=m.SUBJID
WHERE m.SEX=2
AND g.SMTS='Blood';
But I'm either getting too many results from the combination of all possibilities or I'm getting an empty set. Is there any other way to do this?
Here:
JOIN GTEX_Pheno AS m ON sampid=m.SUBJID
I suspect that your intent is to refer to the substring_index() expression that is defined in the select clause (which is aliased sampid as well). In SQL, you can't reuse an alias defined in the select clause in the same scope (with a few exceptions, such as the ORDER BY clause, or the GROUP BY clause in MySQL). So the database thinks you are referring to column sampid of the sample table. If you had given a different alias (say sampid_short) and tried to use it in the ON clause of the join, you would have gotten an unknown-column error.
You need to either repeat the expression, or use a subquery:
select substring_index(g.sampid, '-', 2) as sampid, m.subjid, g.smts, m.sex
from gtex_sample as g
inner join gtex_pheno as m on substring_index(g.sampid, '-', 2) = m.subjid
where m.sex = 2 and g.smts = 'blood';
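The question asks for a list of the different IDs, so a `DISTINCT` on the shortened ID is probably wanted as well. Below is a minimal sketch of that variant, run against SQLite through Python so it needs no MySQL server. SQLite has no `SUBSTRING_INDEX`, so a Python stand-in is registered under that name (positive counts only); every sample row except the first ID is invented for the demonstration.

```python
import sqlite3

def substring_index(text, delim, count):
    """Stand-in for MySQL's SUBSTRING_INDEX; handles positive counts only."""
    return delim.join(text.split(delim)[:count])

conn = sqlite3.connect(":memory:")
conn.create_function("substring_index", 3, substring_index)
conn.executescript("""
    CREATE TABLE gtex_sample (sampid TEXT, smts TEXT);
    CREATE TABLE gtex_pheno (subjid TEXT, sex INTEGER);
    INSERT INTO gtex_sample VALUES
        ('GTEX-14BMU-1526-SM-5TDE6', 'Blood'),
        ('GTEX-14BMU-0002-SM-AAAAA', 'Blood'),
        ('GTEX-AAAAA-0001-SM-BBBBB', 'Blood'),
        ('GTEX-CCCCC-0001-SM-DDDDD', 'Lung');
    INSERT INTO gtex_pheno VALUES
        ('GTEX-14BMU', 2), ('GTEX-AAAAA', 1), ('GTEX-CCCCC', 2);
""")

# The join condition repeats the expression instead of using the alias.
matches = conn.execute("""
    SELECT DISTINCT substring_index(g.sampid, '-', 2) AS short_id
    FROM gtex_sample AS g
    INNER JOIN gtex_pheno AS m
        ON substring_index(g.sampid, '-', 2) = m.subjid
    WHERE m.sex = 2 AND g.smts = 'Blood'
    ORDER BY short_id
""").fetchall()
print(matches)  # [('GTEX-14BMU',)]
```

Giving the shortened ID its own alias (`short_id` here) also avoids the name clash with the real `sampid` column.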

Proper Term to Describe Output from Inner Join

Just keeping it simple here, so this is not specific to any computer language. But suppose I'm joining 2 tables, A and B. The relationship between the 2 tables is 1:many. If I use some type of code that performs an inner join between A and B, the result, let's call this table C, will yield a number of rows equal to the Cartesian product of A and B, given that we've filtered out what doesn't match between A and B on the criteria of the inner join.
In my work environment, people use the term "exploded" to describe the output shown in C, since the output table is typically much larger than A or B. I don't like this term "exploded." I was wondering if anyone had another term that more appropriately implies that table C is essentially table A enlarged with the addition of data from table B.

Nested query between two tables

I'm trying to write a query which combines data from two tables. Both tables contain information about certain objects. An object is uniquely defined by a triple (x,y,z) (x, y and z each have a separate column in T1) as well as by its name (the primary key in T1). In T2 the primary key is name2. There's also a column name1, which indicates the object to which object name2 is linked, and a column w. So T1 contains the objects' names and their triples while T2 tells us which objects are linked ('linked' is not an equivalence: if object1 is linked to object2 then object2 is not linked to object1).
We are given x, y and t and want the triples of the object that are linked to objects with x and y on their first coordinate with w>t.
I tried to write a query for the case where we need to match an object with a specific name:
SELECT
(SELECT x FROM T1 WHERE name=name2),
(SELECT y FROM T1 where name=name2) AS k,
w
FROM T2
WHERE name1=nn3
AND w>t
ORDER BY k;
but I can't get to how to write it when we may have more than one object from T1.
I don't have access to the database, I only know the columns of the tables so I'm having a hard time solving this without trying it out on the database. I'm confused by going from T1 to T2 back to T1 again.
From your question I read the table definitions to be:
T1 -- a list of objects
    x,
    y,
    z,
    name -- primary key, hence unique
T2 -- a list of objects linked to an object in T1
    name2, -- primary key
    name1, -- foreign key to T1.name
    w
If we have three objects A, B and C all three will appear in T1. If B and C are linked to A, T2 will be (inter alia)
Name2 Name1
B A
C A
We are given x, y and t and want the triples of the object that are linked to objects with x and y on their first coordinate with w>t.
As you point out, T1 contains information about the objects and T2 about how objects are linked. To retrieve matching information from more than one table you use the JOIN syntax. Since there are two objects in a linkage you must have a separate JOIN for each. It is OK to reference a table more than once in a query. You should use aliases to clarify what role a table is fulfilling each time it is mentioned. This gives something like
from T2 as linkage
inner join T1 as linked_from
on linked_from.name = linkage.name1 -- note different columns in T2
inner join T1 as linked_to
on linked_to.name = linkage.name2 -- note different columns in T2
Personally I don't like linked_from and linked_to as they are vague and generic. You should use whatever is specific and meaningful in the context of your problem.
You don't say whether the given x and y refer to the linked_from or the linked_to object. This will be very significant to the output of the query, but it is trivial to change in the SQL - just use the other alias - so I'll assume it is the parent object. The where clause is
where linked_from.x = #x_given_value
and linked_from.y = #y_given_value
and linkage.w > #t_given_value
You do not need to put additional sub-queries in the SELECT clause since you have already referenced all the tables and table-roles you need in the FROM clause. So it becomes
SELECT
linked_to.x,
linked_to.y as k,
linkage.w
I'm confident you can work out the ordering.
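Assembled into one statement, the fragments above might look like the following sketch. It runs on SQLite through Python only so that it can be tried without the real database; the sample rows, the named parameters `:x`, `:y`, `:t` and the extra `z` column in the output are assumptions, and the given x and y are taken to describe the linked_from object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (name TEXT PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE T2 (name2 TEXT PRIMARY KEY,
                     name1 TEXT REFERENCES T1(name),
                     w INTEGER);
    -- Invented sample data: B and C are linked to A, A is linked to D.
    INSERT INTO T1 VALUES ('A', 1, 2, 3), ('B', 4, 5, 6),
                          ('C', 7, 8, 9), ('D', 1, 9, 9);
    INSERT INTO T2 VALUES ('B', 'A', 10), ('C', 'A', 3), ('A', 'D', 20);
""")

# T1 appears twice, once per role; T2 is always referenced by its alias.
QUERY = """
    SELECT linked_to.x, linked_to.y AS k, linked_to.z, linkage.w
    FROM T2 AS linkage
    INNER JOIN T1 AS linked_from ON linked_from.name = linkage.name1
    INNER JOIN T1 AS linked_to   ON linked_to.name   = linkage.name2
    WHERE linked_from.x = :x
      AND linked_from.y = :y
      AND linkage.w > :t
    ORDER BY k
"""
rows = conn.execute(QUERY, {"x": 1, "y": 2, "t": 5}).fetchall()
print(rows)  # [(4, 5, 6, 10)]
```

Only B is returned here: C is linked to A as well, but its w of 3 does not exceed t = 5.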
Be aware of the cardinalities involved in your database i.e. how many rows match each condition. You state that x, y and z together are unique. Since only x and y are supplied you may well get multiple rows returned. Similarly there may well be many rows with a value of w greater than the given value of t, even for the same name1, name2 values. There will be many rows in T2 for each row in T1.
I realise this is an example and that you have probably "simplified" the names for the question, but any time you find yourself naming columns with a numeric suffix, you're doing something wrong. Maybe you haven't normalised sufficiently, or maybe you haven't understood the relationships between the data items.

Choosing none in set

I have two tables:
Invariant (UniqueID, characteristic1, characteristic2)
Variant (VariantID, UniqueID, specification1, specification2)
Each project has its own unchanging characteristics between implementations. Each implementation also has its own individual properties.
So, I use queries like this to find projects with the given characteristics and specifications:
SELECT *
FROM `Invariants`
LEFT JOIN (`Variants`) ON (`Variants`.`UniqueID`=`Invariants`.`UniqueID`)
WHERE char2='y' and spec1='x'
GROUP BY `Invariants`.`UniqueID`;
I'm looking for a query that will return all projects that have never satisfied a given specification. So, if one of project 100's variants had spec1='bad', then I don't want project 100 to be included, regardless if it had variants where spec1='good'.
select *
from Invariants iv
where not exists (
select 1
from Variants v
where v.UniqueId = iv.UniqueId and v.spec1 = 'bad'
)
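A quick way to convince yourself that this returns the right projects is the sketch below, which runs the same `NOT EXISTS` query on SQLite through Python. The shortened column names (`char1`, `spec1`, ...) and the sample rows are assumptions; project 300 is given no variants at all to show that it is still returned.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Invariants (UniqueID INTEGER PRIMARY KEY, char1 TEXT, char2 TEXT);
    CREATE TABLE Variants (VariantID INTEGER PRIMARY KEY, UniqueID INTEGER,
                           spec1 TEXT, spec2 TEXT);
    INSERT INTO Invariants VALUES (100, 'p', 'y'), (200, 'q', 'y'), (300, 'r', 'n');
    -- Project 100 has one good and one bad variant; project 200 only a good one.
    INSERT INTO Variants VALUES (1, 100, 'good', 'a'),
                                (2, 100, 'bad',  'b'),
                                (3, 200, 'good', 'c');
""")

never_bad = [row[0] for row in db.execute("""
    SELECT iv.UniqueID
    FROM Invariants iv
    WHERE NOT EXISTS (
        SELECT 1
        FROM Variants v
        WHERE v.UniqueID = iv.UniqueID AND v.spec1 = 'bad'
    )
    ORDER BY iv.UniqueID
""")]
print(never_bad)  # [200, 300]
```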
The queries below do not address your question; I probably read too fast and thought you wanted to pick up only the invariant properties of a particular type. But I will note that you shouldn't use a left join and then filter, in the where clause, against columns from the right table (except for checking nulls). People make that mistake all the time and that's what jumped out at me at first glance.
The whole purpose of a left join is that some of the rows will not match and will thus have filler null values in the columns for the right-hand table. This join logic happens first, and only after that is the where clause applied. A condition like where spec1 = 'x' never evaluates to true against a null value (the comparison yields NULL, which the where clause treats as no match). So you end up eliminating the unmatched rows that the left join was supposed to keep.
This happens a lot with these invariant/custom values tables. You're only interested in one of the properties but if you don't filter prior to joining or inside the join condition, you end up dropping rows because the value didn't exist and you didn't have a value left to compare once it tried to apply a where-clause condition on the property name.
Hope that made sense. See below for examples:
select iv.UniqueId, ...
from
Invariants iv left outer join
Variants v
on v.UniqueId = iv.UniqueId and v.spec1 = 'x'
or
select iv.UniqueId, ...
from
Invariants iv left outer join
(
select *
from Variants
where spec1 = 'x'
) v
on v.UniqueId = iv.UniqueId
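The difference between filtering in the where clause and filtering in the join condition shows up clearly on a few rows. The sketch below uses SQLite through Python with invented, cut-down tables; only the placement of the `spec1 = 'x'` condition differs between the two queries.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Invariants (UniqueID INTEGER PRIMARY KEY);
    CREATE TABLE Variants (VariantID INTEGER PRIMARY KEY, UniqueID INTEGER, spec1 TEXT);
    INSERT INTO Invariants VALUES (100), (200), (300);
    -- Project 300 has no variants at all.
    INSERT INTO Variants VALUES (1, 100, 'x'), (2, 200, 'z');
""")

# Filter in WHERE: unmatched rows carry NULL in spec1 and are thrown away.
filtered_in_where = db.execute("""
    SELECT iv.UniqueID, v.spec1
    FROM Invariants iv
    LEFT OUTER JOIN Variants v ON v.UniqueID = iv.UniqueID
    WHERE v.spec1 = 'x'
    ORDER BY iv.UniqueID
""").fetchall()

# Filter in ON: every Invariants row survives, with NULL where nothing matched.
filtered_in_on = db.execute("""
    SELECT iv.UniqueID, v.spec1
    FROM Invariants iv
    LEFT OUTER JOIN Variants v ON v.UniqueID = iv.UniqueID AND v.spec1 = 'x'
    ORDER BY iv.UniqueID
""").fetchall()

print(filtered_in_where)  # [(100, 'x')]
print(filtered_in_on)     # [(100, 'x'), (200, None), (300, None)]
```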

Selecting data from two tables without joining

Here's the deal:
Table A has columns A1 and A2
Table B has columns B1,B2,B3
Now I want to select data from columns A1 and B1 (without a join), and the condition is:
B3='someword' and A2=B2
If there is no need of printing B1, I would have written the query(without join) as:
select A1 from A where A2 in (select B2 from B where B3='someword');
But I need to print both A1 and B1, so is it possible to do that without using a join, using 'IN' instead?
When you say you need to restrict the output to where A2=B2
YOU ARE SPECIFYING a JOIN.
Calling it something else does not change what it is... To paraphrase Willie,
"A Join by any other name is still a Join"
Seriously, a "Join" is not the name, nor the word, nor even the syntax used in a query to apply it; it is the logical predicate, or restriction, or filter, that is based on values from two different tables. If you need to restrict the output to where Table1.A2 equals Table2.B2, then you have a Join.
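To make the point concrete, here is a small sketch (SQLite through Python, with invented rows) that writes the query once with the JOIN keyword and once without it. Both forms carry the same A2 = B2 predicate and return the same rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE A (A1 TEXT, A2 INTEGER);
    CREATE TABLE B (B1 TEXT, B2 INTEGER, B3 TEXT);
    INSERT INTO A VALUES ('a-one', 1), ('a-two', 2), ('a-three', 3);
    INSERT INTO B VALUES ('b-one', 1, 'someword'),
                         ('b-two', 2, 'other'),
                         ('b-three', 3, 'someword');
""")

# Explicit JOIN syntax.
explicit = db.execute("""
    SELECT A.A1, B.B1
    FROM A INNER JOIN B ON A.A2 = B.B2
    WHERE B.B3 = 'someword'
    ORDER BY A.A1
""").fetchall()

# No JOIN keyword, but the predicate A.A2 = B.B2 is still a join.
implicit = db.execute("""
    SELECT A.A1, B.B1
    FROM A, B
    WHERE A.A2 = B.B2 AND B.B3 = 'someword'
    ORDER BY A.A1
""").fetchall()

print(explicit)              # [('a-one', 'b-one'), ('a-three', 'b-three')]
print(explicit == implicit)  # True
```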
You cannot do this effectively without a JOIN. You could perhaps use nested selects instead, but that would be much slower than a JOIN. A JOIN is the best choice, both from a programming point of view and for the database. Your potential future client would not be happy if you sold them a slow information system.