yesterday I took a database exam and the question about normalization was strange.
We had table R(ABCDEFG) and functional dependencies G->B, C->DG, CF->E, F-A. Which are the candidate keys for R? I only found one: CF. Then R1(DFG), which are the candidate keys for R1? I only found one: DFG. State a correct 3NF normalization for R. I stated ((C,F), E), ((G, B)), ((F), A), ((C), D)
and then the functional dependency GDF->C was added. What is now a correct 3NF normalization of R? I said ((G, D, F, C)), ((G), B), ((F, ); A), ((C), D), ((C, F), E)
Did I solve it correct?
Then even more strange, we should state what is what when the following are listed:
Product ID
Order number
Customer ID
Quantity
Customer name
Product name
Date
I concluded
G= Product ID
C= Order number
F= Customer ID
D= Quantity
A= Customer name
B= Product name
E= Date
Is this correct? What does the FD GDF->C mean in plain English?
"Yesterday I took a database exam and the question about normalization
was strange. We had table R(ABCDEFG) and functional dependencies G->B,
C->DG, CF->E, F-A. Which are the candidate keys for R? I only found
one: CF.
That seems OK.
Then R1(DFG), which are the candidate keys for R1? I only found one:
DFG.
With the very same set of FD's ??? With no FD's at all ??? Anyway, this one seems correct too.
State a correct 3NF normalization for R. I stated ((C,F), E), ((G,
B)), ((F), A), ((C), D)
((G), B) instead of ((G, B)) would be more like it.
((C), DG) instead of ((C), D) would be more like it too.
and then the functional dependency GDF->C was added. What is now a
correct 3NF normalization of R? I said ((G, D, F, C)), ((G), B), ((F,
); A), ((C), D), ((C, F), E)
Addition of this FD (/constraint) doesn't alter the 3NF form. All dependencies that are expressible in the 3NF design are still "out of complete keys". The fact that this additional dependency could not be preserved by the decomposition, does not lower the normal form. It's a dependency preservation issue, not a normal form issue.
Did I solve it correct?
Best option is to ask the teachers.
"Then even more strange, we should state what is what when the
following are listed:"
The folly. The question itself forces you to make assumptions. Date. What date is that ? Date of birth of the customer placing the order ? Date when the ordered product was assigned its current name ? Or perhaps Date when the order was placed ? Presumably so, but the thing is, this should be clearly spelled out in the specs and database designers should really be taught NEVER TO ASSUME ANYTHING ABOUT THE SPECS. Assumption is the mother of all screwups.
What does the FD GDF->C mean in plain English?
In plain English, and assuming your answer, it means that once a certain combination of {customer id, product id and quantity} has been used in an order, there can no longer appear a second order (with a different order id) with the very same {customer id, product id and quantity}. Or, iow : each customer can order a certain specific quantity of a certain specific product only once.
Related
I know how to do these problems easily when the input is basic. I know the rules for the 1st,2nd,and 3rd normal forms as well as BCNF. HOWEVER I was looking through some practice exercises and I saw some unusual input:
Consider the following collection of relations and dependencies.
Assume that each relation is obtained through decomposition from a
relation with attributes ABCDEFGHI and that all the known
dependencies over relation ABCDEFGHI are listed for each question.
(The questions are independent of each other, obviously, since the given dependencies over ABCDEFGHI are different.)
R2(A,B,F) AC → E, B → F
R1(A,C,B,D,E) A → B, C → D
I can solve 2:
A+=AB
C+=CD
AC+=ABCD
ACE=ABCDE
So ACE is the candidate key, and none of A, C and E are superkeys. It isn't bcnf for sure. Decompose it and obtain (ACE)(AB)(CD) etc etc.
BUT Number 1 is confusing me! Why is there AC → E when neither C nor E is in R2? How could this be solved? It can't be an error because many other exercises are like this :/
Another question, what happens when one functional dependency is in BCNF and others are not? Do we just ignore this functional dependency while decomposing the others into BCNF?
If I understand correctly the text of the exercise, the dependencies are those holding on the original relation (ABCDEFGHI): “all the known dependencies over relation ABCDEFGHI are listed for each question”.
So, assuming that in the original relation the only specified dependencies are AC → E and B → F, this means that the dependency AC → E is lost in the decomposed relation R2(A,B,F), that the (only) candidate key of the relation is AB, the schema is not in 2NF (since F depends on a part of a key), and that to decompose that schema in BCNF you must decompose it in (AB) and (BF).
Disclosure: I am taking Stanford's online database course. The forum there is dead, and I'm hoping for some help on SO.
Here's the quiz question:
Consider relation R(A,B,C,D,E) with multivalued dependencies:
A -» B, B -» D
and no functional dependencies. Suppose we decompose R into 4th Normal Form. Depending on the order in which we deal with 4NF violations, we can get different final decompositions. Which one of the following relation schemas could be in the final 4NF decomposition?
And here is my thinking:
Since we are given that there are no functional dependencies, the only key is set of attributes (A,B,C,D,E). In other words, both multivalued dependencies in the question are violating, and we must decompose them.
I am following the decomposition algorithm given in lecture:
Compute keys for R [done]
Repeat until all relations are in 4NF
Pick any R' with nontrivial A -» B that violates 4NF
Decompose R' into R_1(A, B) and R_2(A, rest)
Compute functional dependencies and multivalued dependencies for R_1 and R_2
Compute keys for R_1 and R_2
I see two ways to decompose the relations: start with A -» B or B -» D.
Starting with A -» B
R(A,B,C,D,E)
|
+-----------+
| |
R_1(A,B) R_2(A,C,D,E)
Since B and D are no longer in the same relation, we do not have a 4NF violation, and we're done. I'm not sure how to compute the FDs, MVDs, and keys at this point.
Starting with B -» D
R(A,B,C,D,E)
|
+-----------+
| |
R_1(B,D) R_2(B,A,C,E)
|
+----------+
| |
R_3(A,B) R_4(A,C,E)
At this point, (A and B) and (B and D) are decomposed into their own relations, so we have no violations, and we're done.
The answer choices:
At this point, I'm completely stumped. I do not see any of the relations in the answer choices, nor can I come up with an idea that will get me there:
CE
AD
AE
ABD
I don't need the answer outright, but what am I missing?
A correct answer is AD.
How is this obtained?
Consider that, like for functional dependencies, you can have multivalued dependencies implied by other multivalued dependencies. For instance, there is a pseudo-transitivity rule (or multi-valued transitivity rules) that says:
If X →→ Y holds, and Y →→ Z holds, then X →→ Z − Y holds
For this rule, from A →→ B and B →→ D you can derive A →→ D. So, if you decompose the relation in 4NF you could start from this dependency, and get a table with attributes AD. Or, alternatively, in your first decomposition, after finding R_1(A,B) and R_2(A,C,D,E), you should continue to decompose R_2, since it still contain the non-trivial MVD A →→ D, to find R_3(A, D) and R_3(A, C, E).
I've gone through internet and books and still have some difficulties on how to determine the normal form of this relation
R(a, b, c, d, e, f, g, h, i)
FDs =
B→G
BI→CD
EH→AG
G→DE
So far I've got that the only candidate key is BHI (If I should count with F, then BFHI).
Since the attribute F is not in use at all. Totally independent from the given FDs.
What am I supposed to do with the attribute F then?
How to determine the highest normal form for the realation R?
What am I supposed to do with the attribute F then?
You could observe the fact that the only FD in which F gets mentioned, is the trivial one F->F. It's not explicitly mentioned precisely because it is trivial. Nonetheless, all of Armstrong's axioms apply to trivial ones equally well. So, you can use this trivial one, e.g. applying augmentation, to go from B->G to BF->GF;
How to determine the highest normal form for the relation R?
first, test the condition of first normal form. If satisfied, NF is at least 1. Check the condition of second normal form. If satisfied, NF is at least 2. Check the condition of third normal form. If satisfied, NF is at least three.
Note :
"checking the condition of first normal form", is a bit of a weird thing to do in a formal process, because there exists no such thing as a formal definition of that condition, unless you go by Date's, but I have little doubt that your course does not follow that definition.
Hint :
Given that the sole key is BFHI, which is the first clause of "the key, the whole key, and nothing but the key" that gets violated by, say, B->G ?
I am having trouble understanding what my lecturer want me to do from this question. Can anyone help explain to me what he wants me to do?
Define a higher order version of the insertion sort algorithm. That is define
functions
insertBy :: Ord b => (a->b) -> a -> [a] -> [a]
inssortBy :: Ord b => (a->b) -> [a] -> [a]
and this bit is where it got me confused:
such that inssort f l sorts the list l such that an element x comes before an elementyif f x < f y.
If you were sorting numbers, then it's clear what x < y means. But what if you were sorting letters? Or customers? Or anything else without a clear (to the computer) ordering?
So you are supposed to create a function f() that defines that ordering for the sorting procedure. That f() will take the letters or customers or whatever and will return an integer for each one that the computer can actually sort on.
At least, that's how the problem is described. I personally would have designed a predicate that accepted two items, x and y and returned a boolean if x < y. But whichever is fine.
The code wants you to rewrite the insertion sort algorithm, but using a function as a parameter - thus a higher order function.
I would like to point out that this code, typo included, seems to stem from a piece of work currently due at a certain university - I found this page while searching for "insertion sort algortihm", as I copy pasted the term out of the document as well, typo included.
Seeking code from the internet is a risky business. Might I recommend the insertion sort algorithm wikipedia entry, or the Haskell code provided in your lecture slides (you are looking for the 'insertion sort algorithm' and for 'higher-order functions), as opposed to the several queries you have placed on Stack Overflow?
I'm not a Natural Language Programming student, yet I know it's not trivial strcmp(n1,n2).
Here's what i've learned so far:
comparing Personal Names can't be solved 100%
there are ways to achieve certain degree of accuracy.
the answer will be locale-specific, that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry
I'm trying to:
build (or copy) an algorithm which grades the relationship 2 input names
find an indexing method (for names in my database, for hash tables, etc.)
note:
My task isn't about finding names in text, but to compare 2 names. e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
I used Tanimoto Coefficient for a quick (but not super) solution, in Python:
"""
Formula:
Na = number of set A elements
Nb = number of set B elements
Nc = number of common items
T = Nc / (Na + Nb - Nc)
"""
def tanimoto(a, b):
c = [v for v in a if v in b]
return float(len(c)) / (len(a)+len(b)-len(c))
def name_compare(name1, name2):
return tanimoto(name1, name2)
>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
Edit: A link to a good and useful book.
Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
We've just been doing this sort of work non-stop lately and the approach we've taken is to have a look-up table or alias list. If you can discount misspellings/misheard/non-english names then the difficult part is taken away. In your examples we would assume that the first word and the last word are the forename and the surname. Anything in between would be discarded (middle names, initials). Berry and Bernard would be in the alias list - and when Tsakala did not match to Berry we would flip the word order around and then get the match.
One thing you need to understand is the database/people lists you are dealing with. In the English speaking world middle names are inconsistently recorded. So you can't make or deny a match based on the middle name or middle initial. Soundex will not help you with common name aliases such as "Dick" and "Richard", "Berry" and "Bernard" and possibly "Steve" and "Stephen". In some communities it is quite common for people to live at the same address and have 2 or 3 generations living at that address with the same name. The only way you can separate them is by date of birth. Date of birth may or may not be recorded. If you have the clout then you should probably make the recording of date of birth mandatory. A lot of "people databases" either don't record date of birth or won't give them away due to privacy reasons.
Effectively people name matching is not that complicated. Its entirely based on the quality of the data supplied. What happens in practice is that a lot of records remain unmatched - and even a human looking at them can't resolve the mismatch. A human may notice name aliases not recorded in the aliases list or may be able to look up details of the person on the internet - but you can't really expect your programme to do that.
Banks, credit rating organisations and the government have a lot of detailed information about us. Previous addresses, date of birth etc. And that helps them join up names. But for us normal programmers there is no magic bullet.
Analyzing name order and the existence of middle names/initials is trivial, of course, so it looks like the real challenge is knowing common name alternatives. I doubt this can be done without using some sort of nickname lookup table. This list is a good starting point. It doesn't map Bernard to Berry, but it would probably catch the most common cases. Perhaps an even more exhaustive list can be found elsewhere, but I definitely think that a locale-specific lookup table is the way to go.
I had real problems with the Tanimoto using utf-8.
What works for languages that use diacritical signs is difflib.SequenceMatcher()