Selecting from relation based on other relation - relational-database

Given a relation A(a,b,c) and a relation B(a,d,e), and using projection to isolate 'a' in 'B' like so: 'B_=projection_{a}(B)', is there a way to exclude all tuples in 'A' that do not have an 'a' in common with 'B'?
Note that I'm only using relational algebra, not the extended version.

is there a way to exclude all tuples in 'A' that do not have an 'a' in common with 'B'?
That's a double negative: "exclude ... not ...". Let's turn it into a positive:
"Show all tuples in A that do have an a in common with B."
That is, you want a subset of the tuples. It's important here that a is the only attribute name in common between the two relations. Then here's a start with Natural Join.
A ⋈ B
That will produce a result with all attributes {a, b, c, d, e}. That's not yet what you want, so you're on the right track with a projection. I'll use Codd's original operator (π). There are two ways; these are equivalent:
A ⋈ (π{a}( B )) // take just {a} from B
π{a, b, c}( A ⋈ B ) // take {a, b, c} from the result of Join
It's a commonly needed operation, so there's also a shorthand among the "extended" set of operators that avoids the projection, called (left) SemiJoin:
A ⋉ B
Also called 'Matching'.
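To see that the three forms agree, here is a minimal sketch in Python that models relations as sets of tuples. The sample rows are made up for illustration; only the attribute layout A(a, b, c) and B(a, d, e) comes from the question.

```python
# Relations as sets of tuples: A has attributes (a, b, c), B has (a, d, e).
# The rows below are invented sample data.
A = {(1, "b1", "c1"), (2, "b2", "c2"), (3, "b3", "c3")}
B = {(1, "d1", "e1"), (1, "d2", "e2"), (3, "d3", "e3")}

# π{a}(B): project B onto its a attribute.
B_a = {(a,) for (a, d, e) in B}

# A ⋈ (π{a}(B)): join A with the projected B on the common attribute a.
join_with_projection = {(a, b, c) for (a, b, c) in A for (a2,) in B_a if a == a2}

# π{a, b, c}(A ⋈ B): natural join first, then project away d and e.
natural_join = {(a, b, c, d, e) for (a, b, c) in A for (a2, d, e) in B if a == a2}
projection_of_join = {(a, b, c) for (a, b, c, d, e) in natural_join}

# A ⋉ B: keep exactly the tuples of A whose a value appears in B.
semijoin = {t for t in A if (t[0],) in B_a}

print(sorted(semijoin))
```

Because relations are sets, the duplicate match for a = 1 in B collapses after projection, which is why all three results are the same set.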

Related

Proper Term to Describe Output from Inner Join

Just keeping it simple here, so this is not specific to any computer language. But suppose I'm joining 2 tables, A and B. The relationship between the 2 tables is 1:many. If I use some type of code that performs an inner join between A and B, the result, let's call this table C, will yield a number of rows equal to the Cartesian product of A and B, given that we've filtered out what doesn't match between A and B on the criteria of the inner join.
In my work environment, people use the term "exploded" to describe the output shown in C, since the output table is typically much larger than A or B. I don't like this term "exploded." I was wondering if anyone had another term that more appropriately implies that table C is essentially table A enlarged with the addition of data from table B.

How to structure a database schema to allow for the "1 in a million" case?

Among all the tables in my database, I have two which currently have a Many-to-Many join. However, the actual data population being captured nearly always has a One-to-Many association.
Considering that I want database look-ups (doctrine queries) to be as unencumbered as possible, should I instead:
Create two associations between the tables (where the second is only
populated in these exceptional cases)?
Change the datatype for the association (eg to a text/tinyblob) to record a mini array of the 2 (or technically even 3) associated records?
This is what I currently have (although TableB-> JoinTable is usually just one-to-one):
TableA.id --< a_id.JoinTable.b_id >-- TableB.id
So, I am looking to see if I can capture the 'exceptions'. Is the below the correct way to go about it?
TableA.id TableB.id
+----< TableB.A_id1
+----- TableB.A_id2
+----- TableB.A_id3
You seem to be interested in:
-- a and b are related by the association of interest
Foo(a, b)
-- foo(a, b) but not foo(a2, b) for some a2 <> a
Boring(a, b)
unique(b)
FK (a, b) references Foo
-- foo(a, b) and foo(a2, b) for some a2 <> a
Rare(a, b)
FK (a, b) references Foo
If you want queries to be unencumbered, just define Foo. You can query it for Rare.
Rare = select distinct f.* from Foo f join Foo f2
where f.a <> f2.a and f.b = f2.b
Any other design suffers from update complexity in keeping the database consistent.
You have some fuzzy concern about Rare being much smaller than Foo. But what is your requirement re only n in a million Foo records being many:many by which you would choose some other design?
The next level of complexity is to have Foo and Rare. Updates have to keep the above equation true.
It seems extremely unlikely that there is a benefit in reducing the 2-or-3-in-a-million redundancy of Foo + Rare by only having Boring + Rare and reconstructing Foo from them. But it may be of benefit to define a unique index (b) for Boring which will maintain that a b in it has only one a. When you need Foo:
Foo = select * from Boring union select * from Rare
But your updates must maintain that
not exists (select * from Boring b join Rare r where b.b = r.b)
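The relationships between Foo, Boring and Rare can be checked on a toy data set. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, the four sample rows are invented, and Boring is computed here with EXCEPT purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
# Invented sample data: b = 30 is the rare case with two different a values.
conn.executemany("INSERT INTO Foo VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 30)])

# Rare: pairs whose b is also associated with some other a.
rare_sql = """
    SELECT DISTINCT f.a, f.b
    FROM Foo f JOIN Foo f2 ON f.b = f2.b AND f.a <> f2.a
"""
rare = set(conn.execute(rare_sql).fetchall())

# Boring: everything in Foo that is not Rare.
boring = set(conn.execute(f"SELECT a, b FROM Foo EXCEPT {rare_sql}").fetchall())

foo = set(conn.execute("SELECT a, b FROM Foo").fetchall())

# Foo = Boring union Rare, and no b value appears in both Boring and Rare.
shared_b = {b for (_, b) in boring} & {b for (_, b) in rare}
print(sorted(rare), sorted(boring), shared_b)
```

If Boring and Rare were stored as real tables instead of being derived, every insert, update and delete would have to keep these two conditions true, which is the update complexity mentioned above.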
Change the datatype for the association (eg to a text/tinyblob) ?
Please don't do that. If you do, the people maintaining your database will curse your name unto the thousandth generation. No joke.
Your best bet here is to rig a one-to-many association. Let's say your table a has an integer primary key a_id.
Then, put that a_id as a foreign key column in your second table b.
You can retrieve your information as follows. This will give you one row in your result set for each row in a, provided this and that together identify the row (otherwise add a.a_id to the GROUP BY).
SELECT a.this, a.that, GROUP_CONCAT(b.value) value
FROM a
LEFT JOIN b ON a.a_id = b.a_id
GROUP BY a.this, a.that
If you don't mind the extra row for your one-in-a-million case it's even easier.
SELECT a.this, a.that, b.value
FROM a
LEFT JOIN b ON a.a_id = b.a_id
The LEFT JOIN operation allows for the case where your a row has no corresponding b row.
Put an index on b.a_id.
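Here is a small runnable sketch of both queries. It assumes SQLite (via Python's sqlite3 module) in place of MySQL, and the column names this, that and value plus all sample rows are placeholders. Note that the order of values inside GROUP_CONCAT is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, this TEXT, that TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(a_id),
                    value TEXT);
    CREATE INDEX b_a_id ON b (a_id);
    -- Invented data: a row 1 has one b row, row 2 has two, row 3 has none.
    INSERT INTO a VALUES (1, 'x1', 'y1'), (2, 'x2', 'y2'), (3, 'x3', 'y3');
    INSERT INTO b VALUES (1, 1, 'p'), (2, 2, 'q'), (3, 2, 'r');
""")

# One result row per row of a; the b values are collapsed into one string.
grouped = conn.execute("""
    SELECT a.this, a.that, GROUP_CONCAT(b.value) AS value
    FROM a LEFT JOIN b ON a.a_id = b.a_id
    GROUP BY a.this, a.that
    ORDER BY a.this
""").fetchall()

# One result row per (a, b) pair, plus one row for each a without any b.
flat = conn.execute("""
    SELECT a.this, a.that, b.value
    FROM a LEFT JOIN b ON a.a_id = b.a_id
    ORDER BY a.a_id, b.b_id
""").fetchall()

for row in grouped:
    print(row)
```

The row for x3 shows up in both results with a NULL (None in Python) value, which is the effect of the LEFT JOIN.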

Natural join of table (A) with itself

I came across this question in an interview and I have been wondering if what I did was right. Let's say I have a table 'A' with the following attributes:
R S T
-----------
a1 b1 c1
a1 b2 c2
a1 b3 c3
a4 b4 c4
and let's say I need to calculate the Relational Algebra for given B = {[(projection)R,S (A) NATURAL JOIN (projection) S,T (A) ] NATURAL JOIN (projection)R,T (A)}
what would be the result?
This is what I tried:
-We know (A) NATURAL JOIN (A) = A
-I did the first set of join within the square bracket. Since we had attribute 'S' in common I just yielded the result to be a table of (R S T) with the same 4 rows of tuples.
-Finally, I joined (R S T) with the second set of join where attributes 'R' and 'T' are common which I assumed will yield R S T again with 4 rows of tuples.
Meaning, with the way I did it, I ended up getting B = A.
I did not consider the tuples at all, I just did a natural join based on the common attributes between two projections.
I know that's very stupid, but I am trying to execute it in MySQL and for some reason I am getting errors when I try to execute such a query:
select A,B from dbt2.relationalalgebra as r1 NATURAL JOIN (select B, C from dbt2.relationalalgebra as r2);
and I am getting an error saying "Every derived table must have its own alias"!
Please help me clarify on how Natural join works on same table.
Thanks in advance for any help.
What you did is correct. And it's correct that you obtained B = A -- given that content for A.
This is a question about the functional dependencies between the values in the data. (So you might not get B = A, if the data was different.)
For attributes S and T, there's a different value in each tuple. In other words, given a value for S (or T), you know which row it's from, so you know the values of the other two attributes in that tuple. The functional dependencies are S -> R, T; T -> R, S. (You might say that S and T are each keys for A.)
The pairs of attributes in the projections you give each include at least one key, so uniquely determine which 'missing' attribute gets joined. You're seeing a lossless join decomposition, as per Heath's Theorem. http://en.wikipedia.org/wiki/Functional_dependency
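The expression can be evaluated by hand in a few lines of Python, modelling relations as sets of tuples. The first relation is the table from the question; the second one is a made-up counterexample in which no single attribute is a key, to show that B = A depends on the data.

```python
def project_join(A):
    """Evaluate (π{R,S}(A) ⋈ π{S,T}(A)) ⋈ π{R,T}(A) for A with attributes (R, S, T)."""
    rs = {(r, s) for (r, s, t) in A}
    st = {(s, t) for (r, s, t) in A}
    rt = {(r, t) for (r, s, t) in A}
    # First natural join: the common attribute is S.
    inner = {(r, s, t) for (r, s) in rs for (s2, t) in st if s == s2}
    # Second natural join: the common attributes are R and T.
    return {(r, s, t) for (r, s, t) in inner for (r2, t2) in rt if r == r2 and t == t2}

# The table from the question: S and T each identify a row.
A = {("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a1", "b3", "c3"), ("a4", "b4", "c4")}
B = project_join(A)

# Invented counterexample: every attribute value repeats, so nothing is a key.
A2 = {(1, 1, 2), (1, 2, 1), (2, 1, 1)}
B2 = project_join(A2)

print(B == A)            # the decomposition is lossless for this data
print(sorted(B2 - A2))   # spurious tuples produced for the counterexample
```

For the counterexample the joins produce a tuple that was never in the original relation, so the result is a strict superset of A2.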
A natural join is a shorthand for joining two tables (or subqueries) on all columns that have the same name.
A natural join of a table to itself could have several consequences. The most common would be the table itself -- if none of the values are NULL and the rows are unique. If each row had a NULL value in some column, then the natural join would return no rows at all. If rows are duplicated, then multiple rows might appear.
I do not recommend ever using natural join. A small change to an underlying table structure could break a query.
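The three outcomes described above can be reproduced with a short sketch. It uses SQLite through Python's sqlite3 module rather than MySQL, and the table names and rows are invented; the point is only how NULLs and duplicate rows change a natural self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clean (r TEXT, s TEXT);
    INSERT INTO clean VALUES ('a1', 'b1'), ('a4', 'b4');

    CREATE TABLE with_null (r TEXT, s TEXT);
    INSERT INTO with_null VALUES ('a1', NULL), ('a4', NULL);

    CREATE TABLE with_dupes (r TEXT, s TEXT);
    INSERT INTO with_dupes VALUES ('a1', 'b1'), ('a1', 'b1');
""")

def self_join_count(table):
    # Both sides need their own alias when a table is joined to itself.
    sql = f"SELECT COUNT(*) FROM {table} AS x NATURAL JOIN {table} AS y"
    return conn.execute(sql).fetchone()[0]

for name in ("clean", "with_null", "with_dupes"):
    print(name, self_join_count(name))
```

A NULL never compares equal to anything, including another NULL, so rows containing one drop out of the join; two identical rows each match both copies, giving four result rows.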

Will a key in sql still stay a key in a view

Let's say I have a mysql table called FISH with fields A, B and C.
I run SELECT * FROM FISH. This gets me a view with all fields. So, if A was a key in the original table, is it also a key in the view? Meaning, if I have a table FISH2, and I ran
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
Will the relevant fields still be keys?
Now, let's take this 1 step further. If I run
SELECT * FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D, (SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E WHERE D.DUCK = E.DUCK2
If A and B were keys in the original tables, will their concatenation also be a key?
Thanks :)
If A is a key in fish, any projection of fish alone that keeps A will produce a result set where A is still unique.
A join between table fish and any table with 1:1 relation (such as fish_type) will produce a result set where A is unique.
A join with another table that has a 1:M or M:M relation from fish (such as fish_baits) will NOT produce a result where A is unique, unless you provide a filter predicate on the "other" side (such as bait='Dynamite').
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
...is logically equivalent to the following statement, and most databases (including MySQL) will perform the transformation:
select *
from fish
join fish2 on(fish.a = fish2.a)
Whether A is still unique in the resultset depends on the key of fish2 and their relation (see above).
Concatenation does not preserve uniqueness. Consider the following case:
concat("10", "10") => "1010"
concat("101", "0") => "1010"
Therefore, your final query...
SELECT *
FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D
,(SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E
WHERE D.DUCK = E.DUCK2
...won't (necessarily) produce the same result as
select *
from fish
join fish2 on(
fish.a = fish2.a
and fish.b = fish2.b
)
I wrote "necessarily" because the collisions depend on the actual values. I hunted down a bug some time ago where the root cause was exactly this. The code had worked for several years before the bug manifested itself.
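The collision is easy to reproduce outside the database. A minimal sketch in Python, using the two value pairs from the example above; comparing the pair as a tuple plays the role of joining on both columns.

```python
def concat_key(a, b):
    # Mimics CONCAT(A, B): the boundary between the two values is lost.
    return a + b

rows = [("10", "10"), ("101", "0")]

concatenated = [concat_key(a, b) for a, b in rows]
composite = [(a, b) for a, b in rows]

print(concatenated)                      # both rows map to the same string
print(concatenated[0] == concatenated[1])
print(composite[0] == composite[1])      # the (A, B) pairs themselves still differ
```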
If by "key" you mean "unique", yes, tuples of a cartesian product over unique values will be unique.
(One can prove it by reductio ad absurdum.)
For step 1, think of a view as a subquery containing everything in the AS clause when CREATE VIEW was executed.
For example, if view v is created as SELECT a, b, c FROM t, then when you execute...
SELECT * FROM v WHERE a = some_value
...it's conceptually treated as...
SELECT * FROM (SELECT a, b, c FROM t) WHERE a = some_value
Any database with a decent optimizer will notice that column a is passed straight into the results and that it can take advantage of the indexing in t (if there is any) by moving the condition into the subquery:
SELECT * FROM (SELECT a, b, c FROM t WHERE a = some_value)
This all happens behind the scenes and is not an optimization you need to do yourself. Obviously, it can't do that for every condition in the WHERE clause, but understanding where you can is part of the art of writing a good optimizer.
For step 2, the concatenated keys will be part of intermediate results, and whether or not the database decides they need indexing is an implementation detail. Also note fche's comment about duplication.
If your database has a query plan explainer, running it and learning to interpret the results will give you a lot of insight about what makes your queries run fast and what slows them down.
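As an illustration of reading a query plan, here is a sketch using SQLite's EXPLAIN QUERY PLAN from Python. This is SQLite, not MySQL (where the corresponding statement is EXPLAIN), and the table t, the index name idx_t_a and the view v are made up for the example. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b TEXT, c TEXT);
    CREATE INDEX idx_t_a ON t (a);
    CREATE VIEW v AS SELECT a, b, c FROM t;
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

view_plan = plan("SELECT * FROM v WHERE a = 42")
table_plan = plan("SELECT * FROM t WHERE a = 42")

print(view_plan)
print(table_plan)
```

If the plan for the view mentions the index on t, the optimizer has pushed the condition on a down into the view's underlying query, as described above.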

MySQL Certification Guide Practice Qn - Views' Column Names

the question is
Which of the following methods for providing explicit names for the columns in a view work?
a. Include a column list
b. Provide column aliases in the view SELECT statement
c. Rename the columns when you select from the view
answer
a. Works: Include a column list
b. Works: Provide column aliases in the view SELECT statement
c. Does not work: Rename the columns when you select from the view
regarding (c) what do they mean by "Rename the columns when you select from the view"?
I think the question in the certification guide is worded poorly. You can give explicit names to columns when you select from a view, and this works:
CREATE VIEW MyView AS SELECT a, b, c FROM MyTable;
SELECT a AS d, b AS e, c AS f FROM MyView;
The problem is not with giving aliases to columns explicitly. Here's the problem: if you rely on this instead of defining the view with distinct column names, and the view consists of a join such that the column names are ambiguous, you run into trouble:
CREATE VIEW MyView AS
SELECT m.a, m.b, m.c, o.a, o.b, o.c
FROM MyTable m JOIN OtherTable o;
This is not a valid view, because in the view definition, all column names must be distinct. For instance, you would get ambiguous results when you query the view:
SELECT a FROM MyView;
Does this select the first a column or the second a column?
So you must have a distinct set of column names in the view definition; it's not enough to make them distinct as you query the view.
This is the reason I think the certification guide question was poorly worded. It's not about renaming columns explicitly, it's about ensuring that the columns of the view have distinct names. This is a common reason for renaming columns, so that's probably why the person writing the question wrote it that way.
Either of the other techniques mentioned in the question can resolve the ambiguity:
CREATE VIEW MyView (a, b, c, d, e, f) AS
SELECT m.a, m.b, m.c, o.a, o.b, o.c
FROM MyTable m JOIN OtherTable o;
or
CREATE VIEW MyView AS
SELECT m.a, m.b, m.c, o.a AS d, o.b AS e, o.c AS f
FROM MyTable m JOIN OtherTable o;
Either way, you get the aliased columns:
SELECT * FROM MyView; -- returns result with columns a, b, c, d, e, f
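Both techniques can be tried out quickly. The sketch below uses SQLite through Python's sqlite3 module instead of MySQL (SQLite accepts a view column list from version 3.9 on), with two empty placeholder tables, and the view names ViewWithList and ViewWithAliases are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (a, b, c);
    CREATE TABLE OtherTable (a, b, c);

    -- Method (a): column list in the view definition.
    CREATE VIEW ViewWithList (a, b, c, d, e, f) AS
        SELECT m.a, m.b, m.c, o.a, o.b, o.c
        FROM MyTable m JOIN OtherTable o;

    -- Method (b): column aliases in the view's SELECT statement.
    CREATE VIEW ViewWithAliases AS
        SELECT m.a, m.b, m.c, o.a AS d, o.b AS e, o.c AS f
        FROM MyTable m JOIN OtherTable o;
""")

def column_names(view):
    # cursor.description holds one entry per result column; item 0 is its name.
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [column[0] for column in cursor.description]

print(column_names("ViewWithList"))
print(column_names("ViewWithAliases"))
```

Be aware that SQLite is more lenient than MySQL about duplicate column names in a view, so this sketch only shows the two working methods, not the error case.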
By "rename when you select" they surely mean something like SELECT a AS b FROM theview etc. The reason it doesn't work for the given task of "providing explicit names for the columns" is that there need not be an explicit, unambiguous a in the view for you to "rename"... UNLESS you've already disambiguated by methods (a) or (b) [[in which case you may also "rename" this way, but that's pretty much a secondary issue!-)]].