Different versions of 3NF? - relational-database

I have a question on the definition of 3NF given by Chris Date in his book "Database Design and Relational Theory", page 78.
The definition given in the book is: "A relvar R is in 3NF iff for every non-trivial FD X -> Y, either X is a superkey, or Y is a subkey."
(For Date "Y is a subkey" means that Y is contained in a candidate key, and no assumption is made on the cardinality of the set Y in the Date definition.)
It seems to me, however, that this definition is not equivalent to the usual definition (that can be found in other references) saying that "R is in 3NF if for every FD X -> Y, either the FD is trivial, or X is a superkey, or every element in Y\X is contained in a candidate key".
Consider now the relvar with 5 attributes R(A,B,C,D,E) with the following FD cover:
{A,B} -> C,
{C,D} -> E,
E -> B
These imply {A,E} -> {B,C}. The candidate keys of R are K1 = {A,B,D}, K2 = {A,C,D} and K3 = {A,E,D} and so the FD {A,E} -> {B,C} shows that R is not in 3NF if we use Date's definition.
However, it is in 3NF if we use the "usual" definition (since every attribute is actually contained in a candidate key).
Is there something I do not understand? Or is Date really using another (stronger than the usual one) definition of 3NF?

The Date definition says "Y is a subkey means that Y is contained in a key, ...".
The "usual definition" (where did you get that?) says "or every element in Y\X is contained in a key".
Then they both say "Y ... contained in a key".
You can equivalently write {A,E} -> {B,C} as two FDs {A,E} -> {B}; {A,E} -> {C}. Now each Y\X is "contained in a [candidate] key".
So you seem to be quibbling about the wording "every element in Y", which the Date definition isn't explicit about. Or perhaps it is, and you haven't quoted Date in full?
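The two readings can be compared mechanically on the example from the question. The following is a minimal sketch, not taken from Date's book: the helper names (`closure`, `candidate_keys`, `date_ok`, `usual_ok`) are made up for illustration, and `date_ok` assumes the literal reading in which Y as a whole must lie inside a single candidate key.

```python
from itertools import combinations

ATTRS = frozenset("ABCDE")
# FD cover from the question, as (left side, right side) pairs.
FDS = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("CD"), frozenset("E")),
    (frozenset("E"), frozenset("B")),
]

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Brute force: all minimal attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            # Smaller sizes are visited first, so any key inside s is already known.
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

KEYS = candidate_keys(ATTRS, FDS)

def date_ok(x, y):
    """Literal reading (assumption): X is a superkey, or all of Y lies inside one key."""
    if y <= x:  # trivial FD
        return True
    return closure(x, FDS) == ATTRS or any(y <= k for k in KEYS)

def usual_ok(x, y):
    """Attribute-wise reading: X is a superkey, or each attribute of Y minus X is prime."""
    prime = frozenset().union(*KEYS)
    return closure(x, FDS) == ATTRS or (y - x) <= prime

x, y = frozenset("AE"), frozenset("BC")
print(sorted("".join(sorted(k)) for k in KEYS))  # ['ABD', 'ACD', 'ADE']
print(date_ok(x, y), usual_ok(x, y))             # False True
```

Splitting the right-hand side, as suggested above, makes both checks agree: `date_ok` accepts {A,E} -> {B} and {A,E} -> {C} separately, because B lies in K1 and C lies in K2.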

Related

Avoiding trivial MVDs from non-trivial FDs?

I have two Functional Dependencies from one relation:
meetid -> pid
meetid, pid -> status
For relation Meetings(meetid,pid,status)
I want to use the promotion approach to derive multivalued dependencies (MVDs) from these. The problem is that I'm not sure whether meetid ->> pid is legal in this situation, as the complementation rule will (I think) make the other MVD illegal: meetid ->> status.
The other FD will create a dependency that I see as a trivial MVD.
meetid,pid ->> status
Is this relation doomed to be trivial when promoting them to MVDs, or am I missing something in the process?
The two dependencies:
meetid → pid
meetid, pid → status
are not minimal, since they can be simplified (for instance by computing a canonical cover) to:
meetid → pid
meetid → status
and from this we know also that meetid is the only candidate key of the relation. From a FD X → Y one can always derive the MVD X →→ Y, so, you have both the MVD dependencies:
meetid →→ pid
meetid →→ status
(Note that the second one can also be derived from the first one by complementation).
Note also that neither of them is a trivial MVD, since an MVD X →→ Y is trivial (that is, always true) only if Y is a subset of X (which includes the case where Y is the empty set) or X and Y together contain all the attributes of the relation.
Furthermore, we can note that the schema is in 4NF, since it is in BCNF and each left hand side of non-trivial MVDs is a superkey.
Finally, note that every functional dependency can be “promoted” to a multivalued dependency, so that we have also meetid, pid →→ status (and this is trivial).
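As a rough check of the claims above, here is a small sketch. The helper names `closure` and `is_trivial_mvd` are hypothetical, and the triviality test uses the standard textbook condition (Y is a subset of X, which covers the empty-set case, or X and Y together make up the whole heading).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_trivial_mvd(x, y, heading):
    """X ->> Y is trivial if Y is a subset of X or X union Y is the whole heading."""
    return y <= x or (x | y) == heading

HEADING = frozenset({"meetid", "pid", "status"})
FDS = [
    (frozenset({"meetid"}), frozenset({"pid"})),
    (frozenset({"meetid", "pid"}), frozenset({"status"})),
]

meetid = frozenset({"meetid"})

# meetid alone determines every attribute, so pid is redundant on the
# left side of the second FD and meetid is a key.
print(closure(meetid, FDS) == HEADING)                             # True

# The two promoted MVDs with meetid on the left are non-trivial ...
print(is_trivial_mvd(meetid, frozenset({"pid"}), HEADING))         # False
print(is_trivial_mvd(meetid, frozenset({"status"}), HEADING))      # False

# ... while the one promoted from the original second FD is trivial.
print(is_trivial_mvd(frozenset({"meetid", "pid"}),
                     frozenset({"status"}), HEADING))              # True
```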

Decomposing relations to Fourth Normal Form

Disclosure: I am taking Stanford's online database course. The forum there is dead, and I'm hoping for some help on SO.
Here's the quiz question:
Consider relation R(A,B,C,D,E) with multivalued dependencies:
A -» B, B -» D
and no functional dependencies. Suppose we decompose R into 4th Normal Form. Depending on the order in which we deal with 4NF violations, we can get different final decompositions. Which one of the following relation schemas could be in the final 4NF decomposition?
And here is my thinking:
Since we are given that there are no functional dependencies, the only key is the full set of attributes (A,B,C,D,E). In other words, both multivalued dependencies in the question violate 4NF, and we must decompose on them.
I am following the decomposition algorithm given in lecture:
Compute keys for R [done]
Repeat until all relations are in 4NF:
    Pick any R' with nontrivial A -» B that violates 4NF
    Decompose R' into R_1(A, B) and R_2(A, rest)
    Compute functional dependencies and multivalued dependencies for R_1 and R_2
    Compute keys for R_1 and R_2
I see two ways to decompose the relations: start with A -» B or B -» D.
Starting with A -» B
         R(A,B,C,D,E)
              |
       +------+------+
       |             |
   R_1(A,B)    R_2(A,C,D,E)
Since B and D are no longer in the same relation, we do not have a 4NF violation, and we're done. I'm not sure how to compute the FDs, MVDs, and keys at this point.
Starting with B -» D
         R(A,B,C,D,E)
              |
       +------+------+
       |             |
   R_1(B,D)    R_2(B,A,C,E)
                     |
              +------+------+
              |             |
          R_3(A,B)     R_4(A,C,E)
At this point, (A and B) and (B and D) are decomposed into their own relations, so we have no violations, and we're done.
At this point, I'm completely stumped. I do not see any of my relations among the answer choices, nor can I come up with an idea that will get me there.
The answer choices:
CE
AD
AE
ABD
I don't need the answer outright, but what am I missing?
A correct answer is AD.
How is this obtained?
Consider that, as with functional dependencies, multivalued dependencies can be implied by other multivalued dependencies. For instance, there is a transitivity rule for multivalued dependencies that says:
If X →→ Y holds, and Y →→ Z holds, then X →→ Z − Y holds
By this rule, from A →→ B and B →→ D you can derive A →→ D. So, if you decompose the relation into 4NF, you could start from this dependency and get a table with attributes AD. Alternatively, in your first decomposition, after finding R_1(A,B) and R_2(A,C,D,E), you should continue to decompose R_2, since it still contains the non-trivial MVD A →→ D, to find R_3(A, D) and R_4(A, C, E).
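The derivation and the two decomposition steps can be traced in a few lines. This is only a sketch under stated assumptions: `transitivity` and `split` are hypothetical helpers, an MVD is modelled as a pair of attribute sets, and nothing here checks that the resulting relations are actually in 4NF.

```python
def transitivity(mvd1, mvd2):
    """From X ->> Y and Y ->> Z derive X ->> (Z - Y); None if the rule does not apply."""
    (x, y), (y2, z) = mvd1, mvd2
    if y != y2:
        return None
    return (x, z - y)

def split(heading, mvd):
    """Decompose heading on X ->> Y into (X union Y, heading without Y - X)."""
    x, y = mvd
    return (x | y, heading - (y - x))

a_b = (frozenset("A"), frozenset("B"))
b_d = (frozenset("B"), frozenset("D"))

# The implied MVD A ->> D.
a_d = transitivity(a_b, b_d)

# First split on A ->> B, then split the remainder on the derived A ->> D.
r1, r2 = split(frozenset("ABCDE"), a_b)
r3, r4 = split(r2, a_d)

for rel in (r1, r2, r3, r4):
    print("".join(sorted(rel)))
# AB
# ACDE
# AD
# ACE
```

The relation AD appears as `r3`, which matches the answer choice discussed above.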

How to prove 3NF?

I am trying really hard to spin my brain around how to prove 3NF.
I actually have the answer, but if someone knows this well enough to make me understand it, I would be very grateful. OK, here it goes:
If R is in 3NF according to Definition 2, R must be in 3NF according to Definition 1.
Recall that if R is in 3NF according to Definition 1, then the following two conditions must be satisfied.
i) For R, we don’t have any transitive functional dependency between a non-key attribute and a key through some other non-key attribute.
ii) For R, we don’t have any partial functional dependency between a non-key attribute and a key.
Assume that R does not satisfy (i). Then we must have a transitive FD X → A, A → B, where X is a key and A and B are non-key attributes. But according to Definition 2, R cannot have such FDs: for A → B, either A must be a superkey or B must be a prime attribute. Contradiction. So R must satisfy (i).
Assume that R does not satisfy (ii). Then we must have a proper subset S of a key X (S ⊂ X) such that there exists a non-key attribute A with S → A. However, since A is not prime, Definition 2 requires S to be a superkey, which a proper subset of a key cannot be. Contradiction. So R must satisfy (ii).
Therefore, R is in 3NF according to Definition 1.
If R is in 3NF according to Definition 1, R must be in 3NF according to Definition 2.

Assume that R is not in 3NF according to Definition 2. Then we must have an FD X → A such that X is not a superkey and A is not a prime attribute. Consider a key X’ of R. We must have
X’ → X → A.
This is a transitive FD between the non-key attribute A and the key X’ through X. If X is a non-key attribute, then R is not in 3NF according to Definition 1. Contradiction. If X appears in X’, we have a partial FD, so R is not in 2NF, contradicting Definition 1.

Decomposing a relation into BCNF

I'm having trouble establishing when a relation is in Boyce-Codd Normal Form and how to decompose it into BCNF if it is not. Given this example:
R(A, C, B, D, E) with functional dependencies: A -> B, C -> D
How do I go about decomposing it?
The steps I've taken are:
A+ = AB
C+ = CD
R1 = A+ = **AB**
R2 = ACDE (since elements of C+ still exist, continue decomposing)
R3 = C+ = **CD**
R4 = ACE (no FD closures reside in this relation)
So now I know that ACE will compose the whole relation, but the answer for the decomposition is: AB, CD, ACE.
I suppose I'm struggling with how to properly decompose a relation into BCNF form and how to tell when you're done. Would really appreciate anyone who can walk me through their thought process when solving these problems. Thanks!
Although the question is old, the other questions/answers don't seem to provide a very clear step-by-step general answer on determining and decomposing relations to BCNF.
1. Determine BCNF:
For relation R to be in BCNF, all the non-trivial functional dependencies (FDs) that hold in R need to satisfy the property that their determinants X are superkeys of R. That is, if a non-trivial FD X->Y holds in R, then X must be a superkey of R for R to be in BCNF.
In your case, it can be shown that the only candidate key (minimal superkey) is ACE.
Thus both FDs, A->B and C->D, violate BCNF, as neither A nor C is a superkey of R.
2. Decompose R into BCNF form:
If R is not in BCNF, we decompose R into a set of relations S that are in BCNF.
This can be accomplished with a very simple algorithm:
Initialize S = {R}
While S has a relation R' that is not in BCNF do:
    Pick a FD X->Y that holds in R' and violates BCNF
    Add the relation XY to S
    Update R' = R' - Y
Return S
In your case, the iterative steps are as follows:
S = {ABCDE}          // Initialization: S = {R}
S = {ACDE, AB}       // Pick FD A->B, which violates BCNF
S = {ACE, AB, CD}    // Pick FD C->D, which violates BCNF
                     // Return S, as all relations are in BCNF
Thus, R(A,B,C,D,E) is decomposed into a set of relations: R1(A,C,E), R2(A,B) and R3(C,D) that satisfies BCNF.
Note also that in this case the functional dependencies are preserved, but normalization to BCNF does not guarantee this in general.
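The algorithm above can be sketched in runnable form as follows. This is an illustrative brute-force version, not a reference implementation: the function names are made up, and violations are found by taking attribute closures restricted to each sub-relation (so implied FDs are considered too), which is exponential in the number of attributes and only suitable for small examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violation(rel, fds):
    """Return (X, Y) for a non-trivial X -> Y inside rel where X is not a superkey of rel."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            y = (closure(x, fds) & rel) - x   # what X determines within rel
            if y and (x | y) != rel:          # non-trivial, and X is not a superkey
                return x, y
    return None

def bcnf_decompose(rel, fds):
    """Split relations on violating FDs until none is left."""
    todo, done = [frozenset(rel)], []
    while todo:
        r = todo.pop()
        violation = bcnf_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo.append(x | y)   # the relation XY
            todo.append(r - y)   # R' - Y
    return done

fds = [(frozenset("A"), frozenset("B")), (frozenset("C"), frozenset("D"))]
result = bcnf_decompose("ABCDE", fds)
print(sorted("".join(sorted(r)) for r in result))  # ['AB', 'ACE', 'CD']
```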
1NF -> 2NF -> 3NF -> BCNF
According to the given FD set, "ACE" forms the key.
Clearly R(A,B,C,D,E) is not in 2NF.
2NF decomposition gives R1(A,B), R2(C,D) and R3(A,C,E).
The relations in this decomposition are in 3NF and also in BCNF.

Did I answer this correctly?

Yesterday I took a database exam and the question about normalization was strange.
We had table R(ABCDEFG) and functional dependencies G->B, C->DG, CF->E, F->A. Which are the candidate keys for R? I only found one: CF. Then R1(DFG), which are the candidate keys for R1? I only found one: DFG. State a correct 3NF normalization for R. I stated ((C,F), E), ((G, B)), ((F), A), ((C), D)
and then the functional dependency GDF->C was added. What is now a correct 3NF normalization of R? I said ((G, D, F, C)), ((G), B), ((F), A), ((C), D), ((C, F), E)
Did I solve it correctly?
Then even more strange, we should state what is what when the following are listed:
Product ID
Order number
Customer ID
Quantity
Customer name
Product name
Date
I concluded
G= Product ID
C= Order number
F= Customer ID
D= Quantity
A= Customer name
B= Product name
E= Date
Is this correct? What does the FD GDF->C mean in plain English?
"Yesterday I took a database exam and the question about normalization
was strange. We had table R(ABCDEFG) and functional dependencies G->B,
C->DG, CF->E, F->A. Which are the candidate keys for R? I only found
one: CF."
That seems OK.
Then R1(DFG), which are the candidate keys for R1? I only found one:
DFG.
With the very same set of FDs? With no FDs at all? Anyway, this one seems correct too.
State a correct 3NF normalization for R. I stated ((C,F), E), ((G,
B)), ((F), A), ((C), D)
((G), B) instead of ((G, B)) would be more like it.
((C), DG) instead of ((C), D) would be more like it too.
and then the functional dependency GDF->C was added. What is now a
correct 3NF normalization of R? I said ((G, D, F, C)), ((G), B),
((F), A), ((C), D), ((C, F), E)
Addition of this FD (/constraint) doesn't alter the 3NF form. All dependencies that are expressible in the 3NF design are still "out of complete keys". The fact that this additional dependency could not be preserved by the decomposition, does not lower the normal form. It's a dependency preservation issue, not a normal form issue.
Did I solve it correct?
Best option is to ask the teachers.
"Then even more strange, we should state what is what when the
following are listed:"
The folly. The question itself forces you to make assumptions. Date. What date is that? Date of birth of the customer placing the order? Date when the ordered product was assigned its current name? Or perhaps the date when the order was placed? Presumably so, but the thing is, this should be clearly spelled out in the specs, and database designers should really be taught NEVER TO ASSUME ANYTHING ABOUT THE SPECS. Assumption is the mother of all screwups.
What does the FD GDF->C mean in plain English?
In plain English, and assuming your answer, it means that once a certain combination of {customer id, product id and quantity} has been used in an order, there can never be a second order (with a different order number) with the very same {customer id, product id and quantity}. Or, in other words: each customer can order a certain specific quantity of a certain specific product only once.
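To make that reading concrete, here is a small sketch that tests whether an FD holds in a set of rows. The helper `fd_holds`, the column names and the sample orders are all invented for illustration, and the column mapping follows the asker's guess (G = product, D = quantity, F = customer, C = order number).

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; any other value breaks the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample data.
orders = [
    {"order": 1, "customer": "c1", "product": "p1", "quantity": 5},
    {"order": 2, "customer": "c1", "product": "p1", "quantity": 3},
]
lhs = ["product", "quantity", "customer"]  # G, D, F under the assumed mapping

print(fd_holds(orders, lhs, ["order"]))  # True

# The same customer orders the same quantity of the same product again,
# under a new order number: GDF -> C no longer holds.
orders.append({"order": 3, "customer": "c1", "product": "p1", "quantity": 5})
print(fd_holds(orders, lhs, ["order"]))  # False
```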