Functional dependency inferences with redundant attribute - relational-database

Given 1) CA -> B and 2) B -> C, can you infer A -> B using Armstrong's axioms?
I tried to use the inference rules to prove this, but got stuck.
B -> C gives BA -> CA (augmentation with A)
BA -> CA and CA -> B give AB -> B (transitivity)
It seems to make sense to be able to drop the B, as it is redundant. Is that an axiom that can be proved using the fundamental inference rules?
Is this problem even possible?

You seem to misunderstand the basics. You are proving that FDs hold when other ones do; you are not proving axioms. Armstrong's (so-called) "axioms" are the "fundamental inference rules" for FDs. Read your textbook. It says that the axioms are "complete", and that this means that if you keep applying the rules to the FDs you have, until no application adds a new FD, then you will have all the FDs that hold.
So just do that & see whether yours is added/implied. Inspiration and/or luck might shorten things.
And--one counterexample disproves a claim. So generate some small example relations with attributes A, B & C where CA -> B & B -> C and try to generate one where it is not the case that A -> B. Again, inspiration and/or luck might shorten things.
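Both suggestions can be tried mechanically. The following is a minimal Python sketch, not a definitive implementation: the helper names `closure` and `fd_holds` and the two-row relation are illustrative assumptions that do not appear in the question. The attribute closure of {A} under the given FDs shows what can be derived from A, and the small relation is one candidate counterexample.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def fd_holds(rows, lhs, rhs):
    """True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# The two given FDs, written as (left side, right side).
fds = [("CA", "B"), ("B", "C")]
a_closure = closure("A", fds)

# Hypothetical two-row relation over A, B, C (an assumption for illustration).
rows = [{"A": 1, "B": 1, "C": 1}, {"A": 1, "B": 2, "C": 2}]
checks = (
    fd_holds(rows, "CA", "B"),  # given FD 1
    fd_holds(rows, "B", "C"),   # given FD 2
    fd_holds(rows, "A", "B"),   # the FD in question
)
print(sorted(a_closure), checks)
```

If B is not in the closure of {A}, then A -> B is not derivable from the given FDs, and a relation that satisfies both given FDs but not A -> B confirms it.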

A -> B cannot be inferred as the following counterexample shows:
Assume C is a nonempty set of attributes and B = C (hence B is also a nonempty set of attributes, and each member of B is also a member of C and vice versa).
By the reflexivity axiom, C -> C holds. And since B equals C, B -> C holds as well.
Assume A is an empty set of attributes. Then clearly no nonempty set of attributes can be inferred from A. (One can't infer a customer's address from nothing).
CA denotes the union of C and A. Since A is empty, CA equals C.
Hence CA -> C is the same as C -> C which is true by the reflexivity axiom. And since B equals C also CA -> B holds.
For those who don't think that empty sets are valid sets, here is a more explicit example:
C = { street, telephone_number }, B = C, A = { street }.
CA still equals C, hence CA -> B holds, just as B -> C. But clearly it'll be hard to infer a telephone number if you only know the street, hence A -> B doesn't hold.
Since there are people who are convinced that the telephone number of a person functionally depends on the street that person lives in (which, if true, would indeed break my counterexample), I'll elaborate a little on that.
I live in a village that has many (i.e. more than one) streets. Each of those streets has buildings, which are identified by a number. Each of those buildings has one or more apartments (like 3rd floor left), and in each of those apartments there live one or more persons. All those persons have a telephone. Some persons share a telephone, others don't.
A relation that describes the people in my village has the attributes street, building_number, apartment_number, telephone_number among others (name of the person, date of birth, ...).
Since a single value of street maps to many persons in many cases (the cases in which there's only a single building with a single apartment where only a single person lives are rare), this means that for each street there's a long list of telephone numbers (where "long" means "more than one").
Even if the full address is used (street, building number, apartment number) it is still possible to find several telephone numbers since potentially more than one person lives at that address.
And because some people share a phone, the telephone number is not a key for persons either. And since it isn't excluded that persons that share a telephone live at different addresses, there's not even a functional dependency between telephone number and street.
By the above there is no functional dependency between { street } and { telephone_number }, which means there isn't a functional dependency between { street } and { street, telephone_number } either.
Hence A -> B doesn't hold.

Related

I don't understand the answers of this question with natural join and projection

I am following a MOOC, but I don't understand the correct answer nor the other answers.
The MOOC closed and I cannot ask any questions on the forum.
This is the question:
Considering the following relation R:
A B C D
1 0 2 2
4 1 2 2
6 0 6 3
7 1 2 3
1 0 6 1
1 1 2 1
Among all these requests, which one returns the same relation R?
1. Π_{A,B,C,D}(R ⋈ δ_{A→D, D→F}(R))
2. R ⋈ δ_{A→D, D→A}(R)
3. R ⋈ δ_{B→C, C→B}(R)
4. Π_{A,B,C,D}(R ⋈ δ_{B→G, C→F}(R)) (note: this is the correct answer)
The only given explanation is :
The first 3 answers lose the tuple (4,1,2,2). In the last join, no tuple is lost.
Could you please explain in detail what each of the answers does?
Thank you very much for your attention!
This is a question about the Relational Algebra's Natural Join, and attribute naming. I presume the squiggly thing in your formulas is for Rename, usually denoted by Greek letter rho ρ (see the wikipedia link).
For Natural Join, see the Wikipedia example and note:
"The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names."
Because of the renaming in the four formulas, in general, the result from renamed R will not have the same attribute names as the original R, or will not be equal on the values in the resulting same-named attributes.
I suggest you go through each of the four renamings, and work out what is the 'heading' of each result -- that is, what are the resulting attribute names.
You'll find in requests 1., 2., 3. there's at least one resulting attribute same-named as the original R but the values for that attribute are not the same.
In request 4., although attributes B, C are renamed, their new names do not clash with any existing attribute in R. So the Natural Join to original R will use attributes A, D. This'll produce an interesting intermediate result: consider the tuples <1, 0, 6, 1>, <1, 1, 2, 1> which each contain equal values in their A attribute and their D attribute.
But then in request 4., the projection will throw away the newly-named attributes G, F and collapse back to the original A, B, C, D. So in general, request 4. always returns exactly the original R.
Requests 1., 2., 3. might sometimes return the original R, depending on the content of R. But with the content you show, there are clashes of newly-same-named attributes with non-equal values, so they do 'lose' tuples.
BTW, although tuple <4, 1, 2, 2> does indeed get 'lost' in those three requests, it's not the only tuple that gets 'lost'. In particular in request 3., note that for the sample data, there are no values in common between B, C, so swapping them round in the rename has the effect of returning an empty result from the Join.
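To see what each request does on the sample data, here is a small Python sketch. It models a relation as a set of tuples, each tuple being a frozenset of (attribute, value) pairs; the helper names `relation`, `rename`, `natural_join` and `project` are assumptions made for illustration, not part of any library.

```python
def relation(attrs, rows):
    """Build a relation: a set of tuples, each a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, row)) for row in rows}

def rename(rel, mapping):
    """Rename attributes simultaneously; unmapped attributes keep their name."""
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in rel}

def natural_join(r, s):
    """Combine tuples that agree on all attribute names they have in common."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def project(rel, attrs):
    """Keep only the listed attributes (duplicates collapse in the set)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

ABCD = ("A", "B", "C", "D")
R = relation(ABCD, [(1, 0, 2, 2), (4, 1, 2, 2), (6, 0, 6, 3),
                    (7, 1, 2, 3), (1, 0, 6, 1), (1, 1, 2, 1)])

q1 = project(natural_join(R, rename(R, {"A": "D", "D": "F"})), ABCD)
q2 = natural_join(R, rename(R, {"A": "D", "D": "A"}))
q3 = natural_join(R, rename(R, {"B": "C", "C": "B"}))
q4 = project(natural_join(R, rename(R, {"B": "G", "C": "F"})), ABCD)

for name, q in (("1", q1), ("2", q2), ("3", q3), ("4", q4)):
    print("request", name, "returns", len(q), "tuples; equals R:", q == R)
```

On this data only request 4 reproduces R; request 3 comes back empty, and requests 1 and 2 keep only the two tuples whose A and D values are both 1.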

Decomposing relations to Fourth Normal Form

Disclosure: I am taking Stanford's online database course. The forum there is dead, and I'm hoping for some help on SO.
Here's the quiz question:
Consider relation R(A,B,C,D,E) with multivalued dependencies:
A -» B, B -» D
and no functional dependencies. Suppose we decompose R into 4th Normal Form. Depending on the order in which we deal with 4NF violations, we can get different final decompositions. Which one of the following relation schemas could be in the final 4NF decomposition?
And here is my thinking:
Since we are given that there are no functional dependencies, the only key is the full set of attributes (A,B,C,D,E). In other words, both multivalued dependencies in the question violate 4NF, and we must decompose on them.
I am following the decomposition algorithm given in lecture:
1. Compute keys for R [done]
2. Repeat until all relations are in 4NF:
   a. Pick any R' with nontrivial A -» B that violates 4NF
   b. Decompose R' into R_1(A, B) and R_2(A, rest)
   c. Compute functional dependencies and multivalued dependencies for R_1 and R_2
   d. Compute keys for R_1 and R_2
I see two ways to decompose the relations: start with A -» B or B -» D.
Starting with A -» B
        R(A,B,C,D,E)
              |
        +-----------+
        |           |
    R_1(A,B)   R_2(A,C,D,E)
Since B and D are no longer in the same relation, we do not have a 4NF violation, and we're done. I'm not sure how to compute the FDs, MVDs, and keys at this point.
Starting with B -» D
        R(A,B,C,D,E)
              |
        +-----------+
        |           |
    R_1(B,D)   R_2(B,A,C,E)
                    |
              +----------+
              |          |
          R_3(A,B)   R_4(A,C,E)
At this point, (A and B) and (B and D) are decomposed into their own relations, so we have no violations, and we're done.
At this point, I'm completely stumped. I do not see any of my relations among the answer choices, nor can I come up with an idea that will get me there.
The answer choices:
CE
AD
AE
ABD
I don't need the answer outright, but what am I missing?
A correct answer is AD.
How is this obtained?
Consider that, like for functional dependencies, you can have multivalued dependencies implied by other multivalued dependencies. For instance, there is a pseudo-transitivity rule (or multi-valued transitivity rules) that says:
If X →→ Y holds, and Y →→ Z holds, then X →→ Z − Y holds
By this rule, from A →→ B and B →→ D you can derive A →→ D. So, if you decompose the relation into 4NF, you could start from this dependency and get a table with attributes AD. Or, alternatively, in your first decomposition, after finding R_1(A,B) and R_2(A,C,D,E), you should continue to decompose R_2, since it still contains the non-trivial MVD A →→ D, to find R_3(A, D) and R_4(A, C, E).
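The effect of the order of decomposition can be sketched in Python. This is a simplified illustration under stated assumptions, not the lecture's algorithm verbatim: the function name `decompose` is hypothetical, the derived MVD A →→ D is supplied by hand rather than computed, and an MVD X →→ Y is applied inside a sub-relation S by restricting Y to S (which is valid when X is contained in S). Because there are no FDs, every nontrivial MVD is treated as a 4NF violation.

```python
def decompose(attrs, mvds):
    """Split on the first applicable nontrivial MVD, then recurse on both parts."""
    attrs = frozenset(attrs)
    for lhs, rhs in mvds:
        lhs = frozenset(lhs)
        # Restrict the right side to this relation and drop the left side.
        y = (frozenset(rhs) & attrs) - lhs
        # Nontrivial here: y is nonempty and lhs plus y is not everything.
        if lhs <= attrs and y and (lhs | y) != attrs:
            return decompose(lhs | y, mvds) | decompose(attrs - y, mvds)
    return {attrs}

def show(schemas):
    """Render a decomposition as sorted attribute strings."""
    return sorted("".join(sorted(s)) for s in schemas)

# Given MVDs plus A ->> D, derived by MVD transitivity (added by hand).
a_first = [("A", "B"), ("B", "D"), ("A", "D")]
b_first = [("B", "D"), ("A", "B"), ("A", "D")]

result_a_first = show(decompose("ABCDE", a_first))
result_b_first = show(decompose("ABCDE", b_first))
print(result_a_first)
print(result_b_first)
```

Starting with A →→ B leaves D in the remainder, so the derived dependency A →→ D forces one more split and the schema AD appears; starting with B →→ D instead yields BD.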

Finding the Candidate Keys of a Relation Using the FD's

I have definitely checked out many different related posts, as suggested when creating this question. I have also done different sample problems from online sources as well from a similar problem. However, I am stuck on the problem below specifically.
Given the following relation R and the set of functional dependencies S that hold on R, find all candidate keys for R. Show your work.
R(A, B, C, D, E, F)
S:
AB → C
AC → B
AD → E
BC → A
E → F
Initially, I broke the attributes into groups: attributes found only on the left, only on the right, and on both sides (they are D, F, and ABCE respectively). I also know that I should try to compute the closure of D. This is where I get stuck. At first glance, it seems like I am unable to solve this problem, which isn't true. I also tried computing the closures of (AD), (BD), (CD), and (ED) because I thought that the closure of D = D. Any thoughts?
The keys here are ABD, ACD and BCD.
You were on the right track. After dividing the attributes into three groups, the attributes in the "only on the left" list are always part of every candidate key. Here that attribute is D.
"I also tried computing the closures of (AD), (BD), (CD), and (ED)"
Since you couldn't determine a key from groups of 2 attributes, the next step is to form groups of 3 attributes and check their closures.
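The search by increasing group size can be written out as a short Python sketch. The function names `closure` and `candidate_keys` are illustrative assumptions; the approach is a plain brute-force search that tries attribute sets from smallest to largest and skips any superset of a key already found.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A superset of a known key is a superkey, not a candidate key.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys

# The FDs in S, written as (left side, right side).
fds = [("AB", "C"), ("AC", "B"), ("AD", "E"), ("BC", "A"), ("E", "F")]
keys = ["".join(sorted(k)) for k in candidate_keys("ABCDEF", fds)]
print(keys)
```

For a relation this small the exhaustive search is fine; restricting the candidates to sets that contain D would cut the work further.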

Did I answer this correctly?

yesterday I took a database exam and the question about normalization was strange.
We had table R(ABCDEFG) and functional dependencies G->B, C->DG, CF->E, F->A. Which are the candidate keys for R? I only found one: CF. Then R1(DFG), which are the candidate keys for R1? I only found one: DFG. State a correct 3NF normalization for R. I stated ((C,F), E), ((G, B)), ((F), A), ((C), D)
and then the functional dependency GDF->C was added. What is now a correct 3NF normalization of R? I said ((G, D, F, C)), ((G), B), ((F), A), ((C), D), ((C, F), E)
Did I solve it correctly?
Then even more strange, we should state what is what when the following are listed:
Product ID
Order number
Customer ID
Quantity
Customer name
Product name
Date
I concluded
G= Product ID
C= Order number
F= Customer ID
D= Quantity
A= Customer name
B= Product name
E= Date
Is this correct? What does the FD GDF->C mean in plain English?
"Yesterday I took a database exam and the question about normalization was strange. We had table R(ABCDEFG) and functional dependencies G->B, C->DG, CF->E, F->A. Which are the candidate keys for R? I only found one: CF."
That seems OK.
"Then R1(DFG), which are the candidate keys for R1? I only found one: DFG."
With the very same set of FD's ??? With no FD's at all ??? Anyway, this one seems correct too.
"State a correct 3NF normalization for R. I stated ((C,F), E), ((G, B)), ((F), A), ((C), D)"
((G), B) instead of ((G, B)) would be more like it.
((C), DG) instead of ((C), D) would be more like it too.
"and then the functional dependency GDF->C was added. What is now a correct 3NF normalization of R? I said ((G, D, F, C)), ((G), B), ((F), A), ((C), D), ((C, F), E)"
Addition of this FD (/constraint) doesn't alter the 3NF form. All dependencies that are expressible in the 3NF design are still "out of complete keys". The fact that this additional dependency could not be preserved by the decomposition, does not lower the normal form. It's a dependency preservation issue, not a normal form issue.
"Did I solve it correctly?"
Best option is to ask the teachers.
"Then even more strange, we should state what is what when the following are listed:"
The folly. The question itself forces you to make assumptions. Date. What date is that ? Date of birth of the customer placing the order ? Date when the ordered product was assigned its current name ? Or perhaps Date when the order was placed ? Presumably so, but the thing is, this should be clearly spelled out in the specs and database designers should really be taught NEVER TO ASSUME ANYTHING ABOUT THE SPECS. Assumption is the mother of all screwups.
"What does the FD GDF->C mean in plain English?"
In plain English, and assuming your answer, it means that once a certain combination of {customer id, product id and quantity} has been used in an order, there can no longer appear a second order (with a different order id) with the very same {customer id, product id and quantity}. Or, iow : each customer can order a certain specific quantity of a certain specific product only once.
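That reading can be checked on a few sample rows. The sketch below assumes the asker's mapping (G = product ID, D = quantity, F = customer ID, C = order number); the column names, the helper `fd_violations` and the order data are all hypothetical, made up for illustration.

```python
from collections import defaultdict

def fd_violations(rows, lhs, rhs):
    """Map each lhs value to its rhs values; keep those with more than one."""
    images = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        images[key].add(tuple(row[a] for a in rhs))
    return {key: vals for key, vals in images.items() if len(vals) > 1}

# GDF -> C under the assumed mapping.
GDF = ("product_id", "quantity", "customer_id")
C = ("order_number",)

# Hypothetical orders: same customer and product, different quantities.
orders = [
    {"order_number": 1, "customer_id": 10, "product_id": 7, "quantity": 3},
    {"order_number": 2, "customer_id": 10, "product_id": 7, "quantity": 5},
]
before = fd_violations(orders, GDF, C)

# A second order repeating the customer, product and quantity of order 1.
orders.append({"order_number": 3, "customer_id": 10, "product_id": 7, "quantity": 3})
after = fd_violations(orders, GDF, C)
print(before, after)
```

The first two orders satisfy GDF->C because their quantities differ; the third order repeats a {product, quantity, customer} combination under a new order number, which is exactly what the FD forbids.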

What are Boolean Networks?

I was reading this SO question and got intrigued by Boolean Networks. I've looked them up on Wikipedia, but the explanation is too vague IMO. Can anyone explain to me what Boolean Networks are? And if possible with some examples too?
Boolean networks represent a class of networks where the nodes have states and the edges represent transitions between states. In the simplest case, these states are either 1 or 0 – i.e. boolean.
Transitions may be simple activations or inactivations. For example, consider nodes a and b with an edge from a to b.
     f
a ------> b
Here, f is a transition function. In the case of activation, f may be defined as:
f(x) = x
i.e. b's value is 1 if and only if a's value is 1. Conversely, an inactivation (or repression) might look like this:
f(x) = NOT x
More complex networks use more involved boolean functions. E.g. consider:
a       b
 \     /
  \   /
   \ /
    v
    c
Here, we've got edges from a to c and from b to c. c might be defined in terms of a and b as follows.
f(a, b) = a AND NOT b
Thus, c is activated only if a is active and b is inactive, at the same time.
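The transition functions above can be tried out in a few lines of Python. This is only a sketch: the names `activation`, `repression`, `c_rule` and `step` are made up for illustration, and the synchronous update (every node with a rule is recomputed from the previous state at once) is an assumption, since the text does not fix an update scheme.

```python
from itertools import product

def activation(x):
    """b copies a: f(x) = x."""
    return x

def repression(x):
    """b is the opposite of a: f(x) = NOT x."""
    return not x

def c_rule(a, b):
    """c is active only if a is active and b is inactive."""
    return a and not b

def step(state, rules):
    """One synchronous update: nodes with a rule are recomputed from the old state."""
    return {node: rules[node](state) if node in rules else value
            for node, value in state.items()}

# Truth table of f(a, b) = a AND NOT b.
truth_table = {(a, b): c_rule(a, b) for a, b in product([False, True], repeat=2)}

# The three-node network: a and b are inputs, c depends on both.
rules = {"c": lambda s: c_rule(s["a"], s["b"])}
state = {"a": True, "b": False, "c": False}
next_state = step(state, rules)
print(truth_table, next_state)
```

Nodes without a rule (here a and b) keep their value, so they act as fixed inputs; larger networks would simply add more entries to `rules`.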
Such networks can be used to model all kinds of relations. One that I know of is in systems biology where they are used to model (huge) interaction networks of chemicals in living cells. These networks effectively model how certain aspects of the cells work and they can be used to find deficiencies, points of attack for drugs and similarities between unrelated components that point to functional equivalence. This is fundamentally important in understanding how life works.