How can I determine the candidate keys in this relation - relational-database

I have the relation R(ABCDEF) and the functional dependencies F{AC->B, BD->F, F->CE}
I have to find all the candidate keys for the relation(Armstrong axioms).
I did this:
A->A, B->B, C->C, D->D, E->E, F->F
From F->CE => F->C and F->E
And then:
1. BD->F
2. F->E
3. BD->E
4. BD->EF
5. BD->BD
6. BD->BDEF
7. BD->F
8. F->CEF
9. BD->CEF => BD->BCDEF
Now I am trying to get A on the right hand side of BD->BCDEF so BD can become a candidate key.
It would be great if someone could help.
EDIT:
1. ABD->ABCDEF
2. ACD->BD
3. ACD->ABD => AC->B and ACD->ABCDEF => BD->ABCDEF

The last step in your (edited) logic is
AC->B and ACD->ABCDEF, therefore BD->ABCDEF
It looks like you've replaced AC with B on the left-hand side. You seem to be thinking in arithmetical terms, not in terms of Armstrong's rules of inference. There isn't a rule of inference that says "if AC->B, then wherever AC appears, you can replace AC with B". (Sometimes it looks like that's what happens, but it's not.) AC and B aren't equal, and they're not equivalent.
Imagine that people's names are unique. Then "name" would determine "height", and "name" would determine "weight". But you can't replace name with height; you can't say that "height" determines "weight". The terms aren't equal, and they're not equivalent.
BD is not a candidate key, but ABD is. (There are others.)

Rules of Thumb:
An Attribute appearing only on the left hand side in your FDs is in all keys.
An Attribute not appearing in any of your FDs is in all keys.
An Attribute only appearing on the right hand side in your FDs is not in any key.
A candidate key is the left hand side of a derived FD on which all attributes depend.
Example:
R(ABCDE), (A->C, AB->D, D->B)
E does not appear in any FDs. E is in all Keys.
A appears only on the left hand side. A is in all keys.
C appears only on the right hand side. C is not in any keys.
Keys will include the attributes:
AE
Find dependencies with every possible key from AE:
A->C
A->AC (X->XY axiom)
E->E
AE->ACE (from previous 2 FDs)
Not all attributes are on the right, therefore AE is not a key, only part of all keys.
Start combining AE with BCD and see what comes out:
ADE->ABCDE (as D->B, and by X->XY axiom D->BD. This is a key, by last rule of thumb)
ACE->ACE
ABE->ABCDE (AE->ACE, B->BD from axioms, this is a key)
ABCE->ABCDE, ABDE->ABCDE (superkeys of ABE, so ignore)
ACDE->ABDCE (superkey of ADE)
Assuming I've done this correctly, Then ABE and ADE are keys.

Related

Finding the strongest normal form and if it isn't in BCNF decompose it?

I know how to do these problems easily when the input is basic. I know the rules for the 1st,2nd,and 3rd normal forms as well as BCNF. HOWEVER I was looking through some practice exercises and I saw some unusual input:
Consider the following collection of relations and dependencies.
Assume that each relation is obtained through decomposition from a
relation with attributes ABCDEFGHI and that all the known
dependencies over relation ABCDEFGHI are listed for each question.
(The questions are independent of each other, obviously, since the given dependencies over ABCDEFGHI are different.)
R2(A,B,F) AC → E, B → F
R1(A,C,B,D,E) A → B, C → D
I can solve 2:
A+=AB
C+=CD
AC+=ABCD
ACE=ABCDE
So ACE is the candidate key, and none of A, C and E are superkeys. It isn't bcnf for sure. Decompose it and obtain (ACE)(AB)(CD) etc etc.
BUT Number 1 is confusing me! Why is there AC → E when neither C nor E is in R2? How could this be solved? It can't be an error because many other exercises are like this :/
Another question, what happens when one functional dependency is in BCNF and others are not? Do we just ignore this functional dependency while decomposing the others into BCNF?
If I understand correctly the text of the exercise, the dependencies are those holding on the original relation (ABCDEFGHI): “all the known dependencies over relation ABCDEFGHI are listed for each question”.
So, assuming that in the original relation the only specified dependencies are AC → E and B → F, this means that the dependency AC → E is lost in the decomposed relation R2(A,B,F), that the (only) candidate key of the relation is AB, the schema is not in 2NF (since F depends on a part of a key), and that to decompose that schema in BCNF you must decompose it in (AB) and (BF).

Finding the Candidate Keys of a Relation Using the FD's

I have definitely checked out many different related posts, as suggested when creating this question. I have also done different sample problems from online sources as well from a similar problem. However, I am stuck on the problem below specifically.
Given the following relation R and the set of functional dependencies S that hold on R, find all candidate keys for R. Show your work.
R(A, B, C, D, E, F)
S:
AB → C
AC → B
AD → E
BC → A
E → F
Initially, I broke the attributes into groups: attributes found only on the left, only on the right, and on both sides (they are D, ABCE, and F respectively). I also know that I should try to compute the closure of D. This is where I get stuck. At first glance, this seems like I am unable to solve this problem, which isn't true. I also tried computing the closures of (AD), (BD), (CD), and (ED) because I thought that the closure of D = D. Any thoughts?
The keys here are ABD, ACD and BCD.
You were on the right track. After dividing the attributes in three groups, the attributes under "only on the left" list are always a part of the key. Here that attribute is D.
"I also tried computing the closures of (AD), (BD), (CD), and (ED)"
As you couldn't determine the key while taking attributes in groups of 2 you should have then tried making group of 3 attributes and check their closure.

how to find the highest normal form for a given relation

I've gone through internet and books and still have some difficulties on how to determine the normal form of this relation
R(a, b, c, d, e, f, g, h, i)
FDs =
B→G
BI→CD
EH→AG
G→DE
So far I've got that the only candidate key is BHI (If I should count with F, then BFHI).
Since the attribute F is not in use at all. Totally independent from the given FDs.
What am I supposed to do with the attribute F then?
How to determine the highest normal form for the realation R?
What am I supposed to do with the attribute F then?
You could observe the fact that the only FD in which F gets mentioned, is the trivial one F->F. It's not explicitly mentioned precisely because it is trivial. Nonetheless, all of Armstrong's axioms apply to trivial ones equally well. So, you can use this trivial one, e.g. applying augmentation, to go from B->G to BF->GF;
How to determine the highest normal form for the relation R?
first, test the condition of first normal form. If satisfied, NF is at least 1. Check the condition of second normal form. If satisfied, NF is at least 2. Check the condition of third normal form. If satisfied, NF is at least three.
Note :
"checking the condition of first normal form", is a bit of a weird thing to do in a formal process, because there exists no such thing as a formal definition of that condition, unless you go by Date's, but I have little doubt that your course does not follow that definition.
Hint :
Given that the sole key is BFHI, which is the first clause of "the key, the whole key, and nothing but the key" that gets violated by, say, B->G ?

Boolean Implication

I need some help with this Boolean Implication.
Can someone explain how this works in simple terms:
A implies B = B + A' (if A then B). Also equivalent to A >= B
Boolean implication A implies B simply means "if A is true, then B must be true". This implies (pun intended) that if A isn't true, then B can be anything. Thus:
False implies False -> True
False implies True -> True
True implies False -> False
True implies True -> True
This can also be read as (not A) or B - i.e. "either A is false, or B must be true".
Here's how I think about it:
if(A)
return B;
else
return True;
if A is true, then b is relevant and should be checked, otherwise, ignore B and return true.
I think I see where Serge is coming from, and I'll try to explain the difference. This is too long for a comment, so I'll post it as an answer.
Serge seems to be approaching this from the perspective of questioning whether or not the implication applies. This is somewhat like a scientist trying to determine the relationship between two events. Consider the following story:
A scientist visits four different countries on four different days. In each country she wants to determine if rain implies that people will use umbrellas. She generates the following truth table:
Did it rain? Did people Does rain => umbrellas? Comment
use umbrellas?
No No ?? It didn't rain, so I didn't get to observe
No Yes ?? People were shielding themselves from the hot sun; I don't know what they would do in the rain
Yes No No Perhaps the local government banned umbrellas and nobody can use them. There is definitely no implication here.
Yes Yes ?? Perhaps these people use umbrellas no matter what weather it is
In the above, the scientist doesn't know the relationship between rain and umbrellas and she is trying to determine what it is. Only on one of the days in one of the countries can she definitively say that implies is not the correct relationship.
Similarly, it seems that Serge is trying to test whether A=>B, and is only able to determine it in one case.
However, when we are evaluating boolean logic we know the relationship ahead of time, and want to test whether the relationship was adhered to. Another story:
A mother tells her son, "If you get dirty, take a bath" (dirty=>bath). On four separate days, when the mother comes home from work, she checks to see if the rule was followed. She generates the following truth table:
Get dirty? Take a bath? Follow rule? Comment
No No Yes Son didn't get dirty, so didn't need to take a bath. Give him a cookie.
No Yes Yes Son didn't need to take a bath, but wanted to anyway. Extra clean! Give him a cookie.
Yes No No Son didn't follow the rule. No cookie and no TV tonight.
Yes Yes Yes He took a bath to clean up after getting dirty. Give him a cookie.
The mother has set the rule ahead of time. She knows what the relationship between dirt and baths are, and she wants to make sure that the rule is followed.
When we work with boolean logic, we are like the mother: we know the operators ahead of time, and we want to work with the statement in that form. Perhaps we want to transform the statement into a different form (as was the original question, he or she wanted to know if two statements are equivalent). In computer programming we often want to plug a set of variables into the statement and see if the entire statement evaluates to true or false.
It's not a matter of knowing whether implies applies - it wouldn't have been written there if it shouldn't be. Truth tables are not about determining whether a rule applies, they are about determining whether a rule was adhered to.
I like to use the example: If it is raining, then it is cloudy.
Raining => Cloudy
Contrary to what many beginners might think, this in no way suggests that rain causes cloudiness, or that cloudiness causes rain. (EDIT: It means only that, at the moment, it is not both raining and not cloudy. See my recent blog posting on material implication here. There I develop, among other things, a rationale for the usual "definition" for material implication. The reader will require some familiarity with basic methods of proof, e.g. direct proof and proof by contradiction.)
~[Raining & ~Cloudy]
Judging from the truth tables, it is possible to infer the value of a=>b only for a=1 and b=0. In this case the value of a=>b is 0. For the rest of values (a,b), the value of a=>b is undefined: both (a=>b)=0 ("a doesn't imply b") and (a=>b)=1 ("a implies b") are possible:
a b a=>b comment
0 0 ? it is not possible to infer whether a implies b because a=0
0 1 ? --"--
1 0 0 b is 0 when a is 1, so it is possible to conclude
that a does not imply b
1 1 ? whether a implies b is undefined because it is not known
whether b can be 0 when a=1 .
For a to imply b it is necessary and sufficient that b=1 always when a=1, so that there is no counterexample when a=1 and b=0. For the rows 1, 2 and 4 in the truth table it is not known whether there is counterexample: these rows do not contradict to (a=>b)=1, but they also do not prove (a=>b)=1 . In contrast, row 3 immediately disproves (a=>b)=1 because it provides a counterexample when a=1 and b=0.
I guess I may shock some readers with these explanations, but it seems there are severe errors somewhere in the basics of the logic we are taught, and that is one of the reasons for such problems as Boolean Satisfiability being not solved yet.
The best contribution on this question is given by Serge Rogatch.
Boolean logic applies only where the result of quantifying(or evaluation) is either true or false and the relationship between boolean logic propositions is based on this fact.
So there must exist a relationship or connection between the propositions.
In higher order logic, the relationship is not just a case of on/off, 1/0 or +voltage/-voltage, the evaluation of a worded proposition is more complex. If no relationship exists between the worded propositions, then implication for worded propositions is not equivalent to boolean logic propositions.
While the implication truth table always yields correct results for binary propositions, this is not the case with worded propositions which may not be related in any way at all.
~A V B truth table:
A B Result/Evaluation
1 1 1
1 0 0
0 1 1
0 0 1
Worded proposition A: The moon is made of sour cream.
Worded proposition B: Tomorrow I will win the lotto.
A B Result/Evaluation
1 ? ?
As you can see, in this case, you can't even determine the state of B which will decide the result. Does this make sense now?
In this truth table, proposition ~A always evaluates to 1, therefore, the last two rows don't apply. However, the last two rows always apply in boolean logic.
http://thenewcalculus.weebly.com
Here's a compact statement:
Suppose we have two statements, A and B, each of which could either be true or false. Without any further information, there are 2 x 2 = 4 possibilities: "A and not B", "B and not A", "neither A nor B", and "both A and B".
Now impose the additional restriction that "if A, then also B". After imposing this restriction, the expression "x -> y", where -> is the "implication" operator, denotes whether it is still possible for A == x and B == y. The only outcome that is no longer possible after this additional restriction is A == 1 and B == 0, since that contradicts the restriction itself. Hence, we have 1 -> 0 is zero, and every other pair is 1.

Human name comparison: ways to approach this task

I'm not a Natural Language Programming student, yet I know it's not trivial strcmp(n1,n2).
Here's what i've learned so far:
comparing Personal Names can't be solved 100%
there are ways to achieve certain degree of accuracy.
the answer will be locale-specific, that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry
I'm trying to:
build (or copy) an algorithm which grades the relationship 2 input names
find an indexing method (for names in my database, for hash tables, etc.)
note:
My task isn't about finding names in text, but to compare 2 names. e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
I used Tanimoto Coefficient for a quick (but not super) solution, in Python:
"""
Formula:
Na = number of set A elements
Nb = number of set B elements
Nc = number of common items
T = Nc / (Na + Nb - Nc)
"""
def tanimoto(a, b):
c = [v for v in a if v in b]
return float(len(c)) / (len(a)+len(b)-len(c))
def name_compare(name1, name2):
return tanimoto(name1, name2)
>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
Edit: A link to a good and useful book.
Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
We've just been doing this sort of work non-stop lately and the approach we've taken is to have a look-up table or alias list. If you can discount misspellings/misheard/non-english names then the difficult part is taken away. In your examples we would assume that the first word and the last word are the forename and the surname. Anything in between would be discarded (middle names, initials). Berry and Bernard would be in the alias list - and when Tsakala did not match to Berry we would flip the word order around and then get the match.
One thing you need to understand is the database/people lists you are dealing with. In the English speaking world middle names are inconsistently recorded. So you can't make or deny a match based on the middle name or middle initial. Soundex will not help you with common name aliases such as "Dick" and "Richard", "Berry" and "Bernard" and possibly "Steve" and "Stephen". In some communities it is quite common for people to live at the same address and have 2 or 3 generations living at that address with the same name. The only way you can separate them is by date of birth. Date of birth may or may not be recorded. If you have the clout then you should probably make the recording of date of birth mandatory. A lot of "people databases" either don't record date of birth or won't give them away due to privacy reasons.
Effectively people name matching is not that complicated. Its entirely based on the quality of the data supplied. What happens in practice is that a lot of records remain unmatched - and even a human looking at them can't resolve the mismatch. A human may notice name aliases not recorded in the aliases list or may be able to look up details of the person on the internet - but you can't really expect your programme to do that.
Banks, credit rating organisations and the government have a lot of detailed information about us. Previous addresses, date of birth etc. And that helps them join up names. But for us normal programmers there is no magic bullet.
Analyzing name order and the existence of middle names/initials is trivial, of course, so it looks like the real challenge is knowing common name alternatives. I doubt this can be done without using some sort of nickname lookup table. This list is a good starting point. It doesn't map Bernard to Berry, but it would probably catch the most common cases. Perhaps an even more exhaustive list can be found elsewhere, but I definitely think that a locale-specific lookup table is the way to go.
I had real problems with the Tanimoto using utf-8.
What works for languages that use diacritical signs is difflib.SequenceMatcher()