Decomposing a relation into BCNF - relational-database

I'm having trouble establishing when a relation is in Boyce-Codd Normal Form and how to decompose it into BCNF if it is not. Given this example:
R(A, C, B, D, E) with functional dependencies: A -> B, C -> D
How do I go about decomposing it?
The steps I've taken are:
A+ = AB
C+ = CD
R1 = A+ = **AB**
R2 = ACDE (since elements of C+ still exist, continue decomposing)
R3 = C+ = **CD**
R4 = ACE (no FD closures reside in this relation)
So now I know that ACE will compose the whole relation, but the answer for the decomposition is: AB, CD, ACE.
I suppose I'm struggling with how to properly decompose a relation into BCNF form and how to tell when you're done. Would really appreciate anyone who can walk me through their thought process when solving these problems. Thanks!

Although the question is old, the other questions/answers don't seem to provide a very clear step-by-step general answer on determining and decomposing relations to BCNF.
1. Determine BCNF:
For relation R to be in BCNF, every non-trivial functional dependency (FD) that holds in R must have a determinant that is a superkey of R, i.e. if a non-trivial X->Y holds in R, then X must be a superkey of R.
In your case, it can be shown that the only candidate key (minimal superkey) is ACE.
Thus both FDs, A->B and C->D, violate BCNF, as neither A nor C is a superkey of R.
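The check above can be sketched in a few lines of Python. This is a brute-force illustration, not a reference implementation: the helper names `closure`, `candidate_keys` and `bcnf_violations` are made up for this example, and testing only the listed FDs is assumed to be sufficient because the relation is checked against its own FD set.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure covers the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = frozenset(combo)
            # A superset of an already found key is a superkey, but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= relation:
                keys.append(candidate)
    return keys

def bcnf_violations(relation, fds):
    """Listed non-trivial FDs whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= relation]

R = frozenset("ABCDE")
FDS = [(frozenset("A"), frozenset("B")), (frozenset("C"), frozenset("D"))]

keys = candidate_keys(R, FDS)
violations = bcnf_violations(R, FDS)
print(["".join(sorted(k)) for k in keys])   # ['ACE']
print([("".join(sorted(l)), "".join(sorted(r))) for l, r in violations])
# [('A', 'B'), ('C', 'D')]
```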
2. Decompose R into BCNF form:
If R is not in BCNF, we decompose R into a set of relations S that are in BCNF.
This can be accomplished with a very simple algorithm:
Initialize S = {R}
While S has a relation R' that is not in BCNF do:
    Pick an FD X->Y that holds in R' and violates BCNF
    Add the relation XY to S
    Update R' = R' - Y
Return S
In your case, the iterative steps are as follows:
S = {ABCDE} // Initialization S = {R}
S = {ACDE, AB} // Pick FD: A->B which violates BCNF
S = {ACE, AB, CD} // Pick FD: C->D which violates BCNF
// Return S as all relations are in BCNF
Thus, R(A,B,C,D,E) is decomposed into a set of relations: R1(A,C,E), R2(A,B) and R3(C,D) that satisfies BCNF.
Note also that in this case the functional dependencies are preserved, but normalization to BCNF does not guarantee this in general.
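For readers who want to experiment, here is a minimal Python sketch of such a decomposition loop. It is an illustration under stated assumptions: the function names are hypothetical, violations are found by brute force over attribute subsets, and it splits on X+ (restricted to the current relation) rather than on XY, which gives the same result for this example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_violation(relation, fds):
    """Return (X, X+ restricted to relation) for a BCNF violation, or None."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & relation
            # X determines something new, but not everything: not a superkey.
            if x_plus != x and x_plus != relation:
                return x, x_plus
    return None

def bcnf_decompose(relation, fds):
    result, todo = [], [frozenset(relation)]
    while todo:
        rel = todo.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            result.append(rel)               # already in BCNF
        else:
            x, x_plus = violation
            todo.append(x_plus)              # X together with what it determines
            todo.append(rel - (x_plus - x))  # the rest, keeping X for the join
    return result

FDS = [(frozenset("A"), frozenset("B")), (frozenset("C"), frozenset("D"))]
result = bcnf_decompose("ABCDE", FDS)
print(sorted("".join(sorted(r)) for r in result))  # ['AB', 'ACE', 'CD']
```

The closure is always computed against the original FD set and then intersected with the sub-relation, which avoids having to project the FDs explicitly.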

1NF -> 2NF -> 3NF -> BCNF
According to the given FD set, "ACE" forms the key.
Clearly R(A,B,C,D,E) is not in 2NF, since B depends on A alone and D depends on C alone, and both A and C are proper subsets of the key.
2NF decomposition gives R1(A,B), R2(C,D) and R3(A,C,E).
The relations in this decomposition are in 3NF and also in BCNF.

Related

Different versions of 3NF?

I have a question on the definition of 3NF given by Chris Date in his book "Database Design and Relational Theory", page 78.
The definition given in the book is: "A relvar R is in 3NF iff for every non-trivial FD X -> Y, either X is a superkey, or Y is a subkey."
(For Date "Y is a subkey" means that Y is contained in a candidate key, and no assumption is made on the cardinality of the set Y in the Date definition.)
It seems to me, however, that this definition is not equivalent to the usual definition (that can be found in other references) saying that "R is in 3NF if for every FD X -> Y, either the FD is trivial, or X is a superkey, or every element in Y\X is contained in a candidate key".
Consider now the relvar with 5 attributes R(A,B,C,D,E) with the following FD cover:
{A,B} -> C,
{C,D} -> E,
E -> B
These imply {A,E} -> {B,C}. The candidate keys of R are K1 = {A,B,D}, K2 = {A,C,D} and K3 = {A,E,D} and so the FD {A,E} -> {B,C} shows that R is not in 3NF if we use Date's definition.
However, it is in 3NF if we use the "usual" definition (since every attribute is actually contained in a candidate key).
Is there something I do not understand? Or is Date really using another (stronger than the usual one) definition of 3NF?
The Date definition says "Y is a subkey means that Y is contained in a key, ...".
The 'usual definition' (where did you get that) says "or every element in Y\X is contained in a key".
Then they both say "Y ... contained in a key".
You can equivalently write {A,E} -> {B,C} as two FDs {A,E} -> {B}; {A,E} -> {C}. Now each Y\X is "contained in a [candidate] key".
So you seem to be quibbling about the wording "every element in Y", which the Date definition isn't explicit about. Or perhaps it is, and you haven't quoted Date in full?
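The difference between the two readings can be made concrete with a small brute-force check. This is only a sketch: the helper names are hypothetical, and "Y inside one candidate key" is the literal reading used in the question, not necessarily what Date intends.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure covers the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= relation:
                keys.append(candidate)
    return keys

R = frozenset("ABCDE")
FDS = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("CD"), frozenset("E")),
    (frozenset("E"), frozenset("B")),
]
keys = candidate_keys(R, FDS)
prime = frozenset().union(*keys)   # attributes that occur in some candidate key

x, y = frozenset("AE"), frozenset("BC")   # the implied FD {A,E} -> {B,C}
x_is_superkey = closure(x, FDS) >= R
# Literal reading: the whole set Y must sit inside a single candidate key.
y_inside_one_key = any(y <= key for key in keys)
# Attribute-wise reading: each attribute of Y \ X belongs to some candidate key.
each_attr_prime = all(attr in prime for attr in y - x)

print(sorted("".join(sorted(k)) for k in keys))          # ['ABD', 'ACD', 'ADE']
print(x_is_superkey, y_inside_one_key, each_attr_prime)  # False False True
```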

Database - Is this relation R in BCNF and dependency preserving?

For R(A,B,C,D,E,G,H) here's the minimal cover:
{A->E,D->H,D->G,E->C,G->B,G->C,H->D}
Candidate keys:
{AH,AD}
By the definition of BCNF, none of the attributes on left side are SK or CK. Thus, it's not in BCNF. Is it safe to conclude that all of the FDs are violating BCNF? If so, in the process of decomposition to BCNF, as the algorithm says, to take the FD that violates BCNF, for example: X->Y, and do the procedure of R1(XY) and R2(R-Y)
In our case, do I need to do that for all of the FDs? If I do so, I end up with
R1(AE), R2(EC), R3(GB), R4(DH), R5(DG) and R6(AD)
But G->C and H->D are still missing, and R6 does not correspond to any of the original FDs. So doesn't that mean the decomposition is not dependency preserving?
Is it safe to conclude that all of the FDs are violating BCNF?
Yes
... and do the procedure of R1(XY) and R2(R-Y)
The standard analysis algorithm decomposes the original schema into two subschemas, R1(X+) and R2(R - (X+ - X)). So if you start, for example, from A -> E, you produce R1(AEC) (since A+ = AEC) and R2(ABDGH). Then you repeat the steps on the resulting relations, as long as there are other dependencies that violate BCNF.
For instance, in this case a decomposition that can be obtained is:
R4(AH)
R5(BG)
R6(DGH)
R7(CE)
R8(AE)
Note that, with this decomposition, the dependency G -> C is not preserved (it is a known fact that the algorithm can produce the loss of one or more dependencies).
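Whether a given decomposition preserves a dependency can be tested without computing the projected FD sets explicitly. The sketch below uses the usual closure-based test (grow a set Z from the left-hand side using only what each sub-relation can "see"); the function names are made up for this example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def preserved(fd, decomposition, fds):
    """True if fd follows from the FDs projected onto the sub-relations."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            # What the part of Z inside rel determines, restricted to rel.
            gained = (closure(z & rel, fds) & rel) - z
            if gained:
                z |= gained
                changed = True
    return rhs <= z

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

FDS = [fd("A", "E"), fd("D", "H"), fd("D", "G"), fd("E", "C"),
       fd("G", "B"), fd("G", "C"), fd("H", "D")]
DECOMPOSITION = [frozenset(r) for r in ("AH", "BG", "DGH", "CE", "AE")]

lost = [f for f in FDS if not preserved(f, DECOMPOSITION, FDS)]
print([("".join(l), "".join(r)) for l, r in lost])  # [('G', 'C')]
```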

Finding the strongest normal form and if it isn't in BCNF decompose it?

I know how to do these problems easily when the input is basic. I know the rules for the 1st,2nd,and 3rd normal forms as well as BCNF. HOWEVER I was looking through some practice exercises and I saw some unusual input:
Consider the following collection of relations and dependencies.
Assume that each relation is obtained through decomposition from a
relation with attributes ABCDEFGHI and that all the known
dependencies over relation ABCDEFGHI are listed for each question.
(The questions are independent of each other, obviously, since the given dependencies over ABCDEFGHI are different.)
1. R2(A,B,F) AC → E, B → F
2. R1(A,C,B,D,E) A → B, C → D
I can solve 2:
A+=AB
C+=CD
AC+=ABCD
ACE+=ABCDE
So ACE is the candidate key, and none of A, C and E are superkeys. It isn't BCNF for sure. Decompose it and obtain (ACE)(AB)(CD) etc etc.
BUT Number 1 is confusing me! Why is there AC → E when neither C nor E is in R2? How could this be solved? It can't be an error because many other exercises are like this :/
Another question, what happens when one functional dependency is in BCNF and others are not? Do we just ignore this functional dependency while decomposing the others into BCNF?
If I understand correctly the text of the exercise, the dependencies are those holding on the original relation (ABCDEFGHI): “all the known dependencies over relation ABCDEFGHI are listed for each question”.
So, assuming that the only dependencies specified on the original relation are AC → E and B → F, the dependency AC → E is lost in the decomposed relation R2(A,B,F), and the (only) candidate key of that relation is AB. The schema is not in 2NF (since F depends on a part of a key), and to bring it into BCNF you must decompose it into (AB) and (BF).
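To see which dependencies survive in a sub-relation, one can project the original FDs onto its attributes. The following is a brute-force sketch (exponential in the number of attributes, with hypothetical helper names) applied to R2(A,B,F):

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(attrs):
    """Non-empty subsets of attrs, smallest first, in a fixed order."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            yield frozenset(combo)

def project_fds(sub, fds):
    """Non-trivial FDs X -> Y with X and Y inside sub, one per determinant X."""
    projected = []
    for x in subsets(sub):
        rhs = (closure(x, fds) & sub) - x
        if rhs:
            projected.append((x, rhs))
    return projected

def keys_of(sub, fds):
    """Candidate keys of sub with respect to the original FDs."""
    keys = []
    for x in subsets(sub):
        if not any(k <= x for k in keys) and sub <= closure(x, fds):
            keys.append(x)
    return keys

# FDs stated over the original relation ABCDEFGHI.
FDS = [(frozenset("AC"), frozenset("E")), (frozenset("B"), frozenset("F"))]
R2 = frozenset("ABF")

projected = project_fds(R2, FDS)
keys = keys_of(R2, FDS)
print([("".join(sorted(l)), "".join(sorted(r))) for l, r in projected])
# [('B', 'F'), ('AB', 'F')]
print(["".join(sorted(k)) for k in keys])  # ['AB']
```

AC → E does not appear in the projection because neither C nor E is an attribute of R2.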

Decomposing relations to Fourth Normal Form

Disclosure: I am taking Stanford's online database course. The forum there is dead, and I'm hoping for some help on SO.
Here's the quiz question:
Consider relation R(A,B,C,D,E) with multivalued dependencies:
A -» B, B -» D
and no functional dependencies. Suppose we decompose R into 4th Normal Form. Depending on the order in which we deal with 4NF violations, we can get different final decompositions. Which one of the following relation schemas could be in the final 4NF decomposition?
And here is my thinking:
Since we are given that there are no functional dependencies, the only key is set of attributes (A,B,C,D,E). In other words, both multivalued dependencies in the question are violating, and we must decompose them.
I am following the decomposition algorithm given in lecture:
Compute keys for R [done]
Repeat until all relations are in 4NF:
    Pick any R' with nontrivial A -» B that violates 4NF
    Decompose R' into R_1(A, B) and R_2(A, rest)
    Compute functional dependencies and multivalued dependencies for R_1 and R_2
    Compute keys for R_1 and R_2
I see two ways to decompose the relations: start with A -» B or B -» D.
Starting with A -» B
       R(A,B,C,D,E)
             |
       +-----------+
       |           |
   R_1(A,B)   R_2(A,C,D,E)
Since B and D are no longer in the same relation, we do not have a 4NF violation, and we're done. I'm not sure how to compute the FDs, MVDs, and keys at this point.
Starting with B -» D
       R(A,B,C,D,E)
             |
       +-----------+
       |           |
   R_1(B,D)   R_2(B,A,C,E)
                   |
             +----------+
             |          |
         R_3(A,B)   R_4(A,C,E)
At this point, (A and B) and (B and D) are decomposed into their own relations, so we have no violations, and we're done.
At this point, I'm completely stumped. I do not see any of the relations in the answer choices, nor can I come up with an idea that will get me there.
The answer choices:
CE
AD
AE
ABD
I don't need the answer outright, but what am I missing?
A correct answer is AD.
How is this obtained?
Consider that, as with functional dependencies, multivalued dependencies can be implied by other multivalued dependencies. For instance, there is a pseudo-transitivity rule (or multivalued transitivity rule) that says:
If X →→ Y holds, and Y →→ Z holds, then X →→ Z − Y holds
By this rule, from A →→ B and B →→ D you can derive A →→ D. So, if you decompose the relation into 4NF, you could start from this dependency and get a table with attributes AD. Or, alternatively, in your first decomposition, after finding R_1(A,B) and R_2(A,C,D,E), you should continue to decompose R_2, since it still contains the non-trivial MVD A →→ D, to find R_3(A, D) and R_4(A, C, E).
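The rule can be sanity-checked on a concrete instance. The sketch below is not a proof, only an illustration on one made-up instance of R(A,B,C,D,E): `mvd_holds` is a hypothetical helper that tests the definition of an MVD directly (for any two tuples agreeing on X, the tuple that takes X and Y from the first and everything else from the second must also be present).

```python
from itertools import product

ATTRS = ("A", "B", "C", "D", "E")

def mvd_holds(rows, x, y):
    """Check X ->> Y on an instance given as a list of attribute dicts."""
    rest = [a for a in ATTRS if a not in x and a not in y]
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # X and Y values from t1, all remaining values from t2.
            t3 = dict(t1)
            t3.update({a: t2[a] for a in rest})
            if t3 not in rows:
                return False
    return True

# Made-up instance: one A value, every combination of two B and two D values.
rows = [dict(zip(ATTRS, ("a", b, "c", d, "e")))
        for b, d in product((1, 2), repeat=2)]

print(mvd_holds(rows, {"A"}, {"B"}))  # True
print(mvd_holds(rows, {"B"}, {"D"}))  # True
print(mvd_holds(rows, {"A"}, {"D"}))  # True, as the rule predicts

# Keeping only (B=1, D=1) and (B=2, D=2) breaks A ->> B in the instance.
broken = [rows[0], rows[3]]
print(mvd_holds(broken, {"A"}, {"B"}))  # False
```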

How to prove 3NF?

I am trying really hard to spin my brain around how to prove 3NF.
I actually have the answer, but if someone knows this well enough to make me understand it, I would be very grateful. Ok, here it goes:
If R is in 3NF according to Definition 2, R must be in 3NF according to Definition 1.
Recall that if R is in 3NF according to Definition 1, then the following two conditions must be satisfied.
i) For R, we don’t have any transitive functional dependency between a non-key attribute and a key through some other non-key attribute.
ii) For R, we don’t have any partial functional dependency between a non-key attribute and a key.
Assume that R does not satisfy (i). Then, we must have a transitive FD: X → A, A → B, where X is a key and A and B are non-key attributes. But according to Definition 2, R does not have FDs of this kind. (That is, for A → B, either A must be a super key or B must be a prime attribute.) Contradiction. So R must satisfy (i).
Assume that R does not satisfy (ii). Then, we must have a subset S of a key X (S ⊂ X) such that there exists a non-key attribute A with S → A. However, according to Definition 2, S must be a super key. Contradiction. So R must satisfy (ii).
Therefore, R is in 3NF according to Definition 1.
If R is in 3NF according to Definition 1, R must be in 3NF according to Definition 2.

Assume that R is not in 3NF according to Definition 2. Then, we must have an FD: X → A such that X is not a super key and A is not a prime attribute. Consider a key X’ of R. We must have
X’ → X → A.
It is a transitive FD between the non-key attribute A and the key X’ through X. If X is a non-key attribute, then R is not in 3NF according to Definition 1. Contradiction. If X appears in X’, we have a partial FD. So R is not in 2NF, contradicting Definition 1.