Explanation of ER model to functional dependencies solution - relational-database

I am trying to understand the solution to an exercise that translates an ER model to functional dependencies.
As you can see above, we only have relation names and nothing else besides that, and according to the solution, they somehow come to the conclusion that
mother, daughter → father
father, daughter → mother
mother, son → father
father, son → mother
And that moreover we can infer additional functional dependencies to represent the real world more accurately:
mother, son → father
father, son → mother
What I don't understand here is how we can base the translation to functional dependencies on the relation names alone, since whenever I translated an ER model to a relational model I did it by looking at attribute names. Is it possible that there is something wrong in this exercise, or that the task giver was just trying to make the translation to functional dependencies more approachable for a student seeing it for the first time (i.e. by making it more educational)?

TL;DR
As you can see above, we only have relation names and nothing else besides that
Well, there are some lines, each with a name & a digit or letter.
There isn't anything strange about the assignment. Go to your textbook & find out exactly what every technical term & graphic element means. (In particular you might be confused by the meaning of "relationship" or your cardinality labels.) How the solution arises is explained below.
The R in ERM (for the Entity-Relationship Model) stands for "relationship" as in an n-ary association. Similarly the R in RM (for the Relational Model) uses "relation" per its connected meanings in mathematics as an n-ary relationship/association & as a set of tuples of values that participate in or satisfy such a relation, that are so related/associated. But these ER & RM "relationships" are not participations or FKs (foreign keys), which get called "relationships" or "relations" by methods & presentations that claim to be ER or relational but are not.
A Chen ER diagram has a box per entity type, a diamond per relationship type & a line per participation. A name on a line gives the attribute in the relationship for that participation. A 1 or N label on a line gives cardinality information for that participation.
Each ER entity & relationship type gets a representing relation. Hence a relation per box & diamond & a FK per line. A FD (functional dependency) applies to a relation. So we can say that a FD applies to a relationship.
Presumably the diagram's relationships/relations are like:
-- man [son] is the son of man [father] & woman [mother]
isSon(son, father, mother)
-- woman [daughter] is the daughter of man [father] & woman [mother]
isDaughter(daughter, father, mother)
There are many styles of information modelling methods & diagrams & cardinality labels. Your diagram seems to be one of the Chen styles of true ER. A cardinality label on a participation by an entity type in a relationship type says how many entities of that type can participate with or share a given instance/subtuple of the other participant types over all situations that can arise. A 1 means exactly one & an N means one or more. An n-ary relationship cardinality is a sequence of n such participation cardinalities, "X:Y:..." or "X to Y to ...".
Notice that such a sequence assumes a certain order of entities/attributes. Notice that for a binary relationship N:1 for one order means 1:N for the reverse order. (hasMother(child, mother) is N:1 when/iff isMother(mother, child) is 1:N.) Notice that every entity/attribute in X:Y:... asserts about "a given" or "one given" subtuple of others. That "one" is not from a cardinality label. It's from the phrase that the label is plugged into. In particular it's not from a 1 in a binary N:1 or 1:N.
From your lines we get:
isSon
N sons participate with a given father-mother pair
1 father participates with a given son-mother pair
1 mother participates with a given son-father pair
isDaughter
N daughters participate with a given father-mother pair
1 father participates with a given daughter-mother pair
1 mother participates with a given daughter-father pair
(Binary X:Y is often sloppily expressed as "X entities have Y of the other & Y entities have X of the other". It is intended to mean "a given 1st appears with Y of the other & a given 2nd appears with X of the other". That phrasing doesn't generalize to n-ary relationships or X:Y:.... (A given nth entity appears with how many of what?) That sloppy expression also intends to mean, using the nth letter with the nth entity, "X of the 1st appear with a given subtuple of the others & Y of the 2nd appear with a given subtuple of the others". Which is the Chen n-ary interpretation for n=2. You might think that we could use X:Y:... to mean "a given 1st appears with X subtuples of others & a given 2nd appears with Y subtuples of others & ...". However, that turns out not to be as useful & it contradicts the normal meaning of binary X:Y by taking it to mean what Y:X normally means.)
This particular form of cardinality constraint happens to tell us something rather directly about some FDs (functional dependencies) that hold & don't hold in the representing relations. Observe lines with 1s & Ns paralleling some FDs that hold & don't hold:
isSon
{father, mother} → {son} does not hold
{son, mother} → {father}
{son, father} → {mother}
isDaughter
{father, mother} → {daughter} does not hold
{daughter, mother} → {father}
{daughter, father} → {mother}
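The correspondence between cardinality labels and FDs can be checked mechanically against a relation instance. Here is a small Python sketch (the sample data and the function name are my own, not part of the exercise) that tests whether an FD holds in a given set of tuples:

```python
# Check whether a functional dependency X -> Y holds in a relation
# represented as a list of dicts (one dict per tuple).
def fd_holds(rows, lhs, rhs):
    """Return True iff every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

# A sample isSon instance: two sons share the same parents, so
# {father, mother} -> {son} fails, while {son, mother} -> {father} holds.
is_son = [
    {"son": "Tom", "father": "Al", "mother": "Bea"},
    {"son": "Jim", "father": "Al", "mother": "Bea"},
]
print(fd_holds(is_son, ["father", "mother"], ["son"]))   # False
print(fd_holds(is_son, ["son", "mother"], ["father"]))   # True
```

Note that a single instance can only refute an FD; whether an FD holds in general depends on the predicate and the situations that can arise, as discussed below.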
What we can determine about FDs per "the real world" depends on what we consider that to encompass & what exactly each relationship's/relation's meaning is--its tuple membership condition/criterion--its (characteristic) predicate. Constraints--including FKs & FDs--are determined by the predicates & the situations/states that can arise. In turn that depends on the meanings of words used in the specification including those expressing the predicates & business rules.
If the predicates above involve biological sons & daughters then we can reasonably expect the given cardinalities & also expect that {son} → {father, mother} & {daughter} → {father, mother}. (Note that those FDs imply the above FDs that determine {father} & {mother} & their cardinality labels.)
However for other predicates & business rules (per what expressions express them & what the words in them mean) & notions of "the real world" we might have different FDs. It all depends on what exactly is intended by "father", "mother", "son" & "daughter", or for that matter "man" & "woman", and what associated rules you assume re what situations/states can arise for your application--marriage, divorce, adoption, disinheritance, parenting, death, etc & just what people you are drawing from & when, etc. One really needs to be clear.
But you don't clearly give the predicates or the business rules. (Or a legend for interpreting your diagram.) So it's hard to say what other non-implied FDs might hold or not hold.
So there isn't anything strange about the assignment.
PS When you are stuck go to your textbook & find out exactly what everything means & ask a question about where you are first stuck understanding or following a specific definition, axiom/assumption, theorem/fact or algorithm/method/process/heuristic. This question is not from being stuck creatively; here you are supposed to mechanically apply a method in your textbook.
PS See also: Relational Schema to ER Diagram / cardinalities difference

Related

Finding the strongest normal form and if it isn't in BCNF decompose it?

I know how to do these problems easily when the input is basic. I know the rules for the 1st, 2nd, and 3rd normal forms as well as BCNF. HOWEVER I was looking through some practice exercises and I saw some unusual input:
Consider the following collection of relations and dependencies.
Assume that each relation is obtained through decomposition from a
relation with attributes ABCDEFGHI and that all the known
dependencies over relation ABCDEFGHI are listed for each question.
(The questions are independent of each other, obviously, since the given dependencies over ABCDEFGHI are different.)
R2(A,B,F) AC → E, B → F
R1(A,C,B,D,E) A → B, C → D
I can solve 2:
A+ = AB
C+ = CD
AC+ = ABCD
ACE+ = ABCDE
So ACE is the candidate key, and none of A, C and E are superkeys. It isn't in BCNF for sure. Decompose it and obtain (ACE)(AB)(CD) etc.
BUT Number 1 is confusing me! Why is there AC → E when neither C nor E is in R2? How could this be solved? It can't be an error because many other exercises are like this :/
Another question, what happens when one functional dependency is in BCNF and others are not? Do we just ignore this functional dependency while decomposing the others into BCNF?
If I understand correctly the text of the exercise, the dependencies are those holding on the original relation (ABCDEFGHI): “all the known dependencies over relation ABCDEFGHI are listed for each question”.
So, assuming that in the original relation the only specified dependencies are AC → E and B → F, this means that the dependency AC → E is lost in the decomposed relation R2(A,B,F), that the (only) candidate key of the relation is AB, that the schema is not in 2NF (since F depends on a part of a key), and that to decompose that schema into BCNF you must decompose it into (AB) and (BF).
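The attribute closure computation used in the question can be sketched in a few lines of Python (representing each FD as a pair of attribute sets is my own choice):

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.
    Each FD is a pair (lhs, rhs) of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, pull in the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"C"}, {"D"})]        # the FDs given for R1
print(sorted(closure({"A", "C", "E"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
```

Since the closure of ACE covers all of R1's attributes and no proper subset does, ACE is the candidate key, exactly as computed above.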

how to find the highest normal form for a given relation

I've gone through internet and books and still have some difficulties on how to determine the normal form of this relation
R(a, b, c, d, e, f, g, h, i)
FDs =
B→G
BI→CD
EH→AG
G→DE
So far I've got that the only candidate key is BHI (If I should count with F, then BFHI).
Since the attribute F is not in use at all. Totally independent from the given FDs.
What am I supposed to do with the attribute F then?
How to determine the highest normal form for the relation R?
What am I supposed to do with the attribute F then?
You could observe the fact that the only FD in which F gets mentioned is the trivial one F->F. It's not explicitly mentioned precisely because it is trivial. Nonetheless, all of Armstrong's axioms apply to trivial FDs equally well. So you can use this trivial one, e.g. applying augmentation, to go from B->G to BF->GF.
How to determine the highest normal form for the relation R?
First, test the condition of first normal form. If satisfied, the NF is at least 1. Check the condition of second normal form. If satisfied, the NF is at least 2. Check the condition of third normal form. If satisfied, the NF is at least 3.
Note :
"Checking the condition of first normal form" is a bit of a weird thing to do in a formal process, because there is no formal definition of that condition, unless you go by Date's, and your course almost certainly does not follow that definition.
Hint :
Given that the sole key is BFHI, which is the first clause of "the key, the whole key, and nothing but the key" that gets violated by, say, B->G ?
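Following the hint, a quick check with a closure helper (my own sketch, not part of the question) confirms that BFHI is a superkey while B alone is not, so B->G violates "the whole key":

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"B"}, {"G"}), ({"B", "I"}, {"C", "D"}),
       ({"E", "H"}, {"A", "G"}), ({"G"}, {"D", "E"})]

# BFHI determines every attribute of R, so it is a (super)key.
print(closure({"B", "F", "H", "I"}, fds) == set("ABCDEFGHI"))  # True
# B alone is a proper part of the key, yet B -> G holds: a 2NF violation.
print(sorted(closure({"B"}, fds)))  # ['B', 'D', 'E', 'G']
```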

Examples of monoids/semigroups in programming

It is well-known that monoids are stunningly ubiquitous in programming. They are so ubiquitous and so useful that I, as a 'hobby project', am working on a system that is completely based on their properties (distributed data aggregation). To make the system useful I need useful monoids :)
I already know of these:
Numeric or matrix sum
Numeric or matrix product
Minimum or maximum under a total order with a top or bottom element (more generally, join or meet in a bounded lattice, or even more generally, product or coproduct in a category)
Set union
Map union where conflicting values are joined using a monoid
Intersection of subsets of a finite set (or just set intersection if we speak about semigroups)
Intersection of maps with a bounded key domain (same here)
Merge of sorted sequences, perhaps with joining key-equal values in a different monoid/semigroup
Bounded merge of sorted lists (same as above, but we take the top N of the result)
Cartesian product of two monoids or semigroups
List concatenation
Endomorphism composition.
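Several of the monoids listed above can be sketched uniformly as (operation, identity) pairs folded with functools.reduce. This is an illustrative Python sketch of my own, not part of the original list:

```python
from functools import reduce

# A monoid is an associative binary operation together with an identity element.
monoids = {
    "sum":     (lambda a, b: a + b, 0),
    "product": (lambda a, b: a * b, 1),
    "max":     (max, float("-inf")),        # total order with a bottom element
    "union":   (lambda a, b: a | b, frozenset()),
    "concat":  (lambda a, b: a + b, []),    # list concatenation
}

def mconcat(name, xs):
    """Fold a sequence with the named monoid; the identity handles empty input."""
    op, identity = monoids[name]
    return reduce(op, xs, identity)

print(mconcat("sum", [1, 2, 3]))                            # 6
print(mconcat("concat", [[1], [2, 3]]))                     # [1, 2, 3]
print(mconcat("union", [frozenset({1}), frozenset({2})]))   # frozenset({1, 2})
```

The identity element is what lets a distributed aggregation start each partition from a neutral value and combine partial results in any grouping.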
Now, let us define a quasi-property of an operation as a property that holds up to an equivalence relation. For example, list concatenation is quasi-commutative if we consider lists of equal length or with identical contents up to permutation to be equivalent.
Here are some quasi-monoids and quasi-commutative monoids and semigroups:
Any (a+b = a or b, if we consider all elements of the carrier set to be equivalent)
Any satisfying predicate (a+b = the one of a and b that is non-null and satisfies some predicate P, if none does then null; if we consider all elements satisfying P equivalent)
Bounded mixture of random samples (xs+ys = a random sample of size N from the concatenation of xs and ys; if we consider any two samples with the same distribution as the whole dataset to be equivalent)
Bounded mixture of weighted random samples
Let's call it "topological merge": given two acyclic and non-contradicting dependency graphs, a graph that contains all the dependencies specified in both. For example, list "concatenation" that may produce any permutation in which elements of each list follow in order (say, 123+456=142356).
Which others do exist?
Quotient monoid is another way to form monoids (quasimonoids?): given monoid M and an equivalence relation ~ compatible with multiplication, it gives another monoid. For example:
finite multisets with union: if A* is a free monoid (lists with concatenation), ~ is "is a permutation of" relation, then A*/~ is a free commutative monoid.
finite sets with union: If ~ is modified to disregard count of elements (so "aa" ~ "a") then A*/~ is a free commutative idempotent monoid.
syntactic monoid: Any regular language gives rise to a syntactic monoid that is the quotient of A* by the "indistinguishability by the language" relation. Here is a finger tree implementation of this idea. For example, the language {a^(3n) : n natural} has Z3 as its syntactic monoid.
Quotient monoids automatically come with homomorphism M -> M/~ that is surjective.
A "dual" construction is submonoids. They come with an injective homomorphism into M.
Yet another construction on monoids is tensor product.
Monoids allow exponentiation by squaring in O(log n) and fast parallel prefix sum computation. They are also used in the Writer monad.
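Exponentiation by squaring only needs associativity and an identity, so it works for any monoid. A Python sketch (the 2x2-matrix Fibonacci example is a standard illustration of mine, not from the original answer):

```python
def mpow(op, identity, x, n):
    """Raise x to the n-th power in a monoid using O(log n) multiplications."""
    result = identity
    while n > 0:
        if n & 1:
            result = op(result, x)  # fold in the current power where a bit is set
        x = op(x, x)                # square
        n >>= 1
    return result

# 2x2 matrix multiplication is a monoid; fast powers of [[1,1],[1,0]]
# give Fibonacci numbers.
def matmul(a, b):
    return ((a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]),
            (a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]))

IDENTITY = ((1, 0), (0, 1))
FIB = ((1, 1), (1, 0))
print(mpow(matmul, IDENTITY, FIB, 10)[0][1])  # fib(10) = 55
```

The same mpow works unchanged for numeric product, endomorphism composition, or any other monoid in the list above.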
The Haskell standard library is alternately praised and attacked for its use of the actual mathematical terms for its type classes. (In my opinion it's a good thing, since without it I'd never even know what a monoid is!). In any case, you might check out http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Monoid.html for a few more examples:
the dual of any monoid is a monoid: given a+b, define a new operation ++ with a++b = b+a
conjunction and disjunction of booleans
over the Maybe monad (aka "option" in OCaml), first and last. That is:
first (Just a) b = Just a
first Nothing b = b
and likewise for last
The latter is just the tip of the iceberg of a whole family of monoids related to monads and arrows, but I can't really wrap my head around these (other than simply monadic endomorphisms). But a google search on monads monoids turns up quite a bit.
A really useful example of a commutative monoid is unification in logic and constraint languages. See section 2.8.2.2 of 'Concepts, Techniques and Models of Computer Programming' for a precise definition of a possible unification algorithm.
Good luck with your language! I'm doing something similar with a parallel language, using monoids to merge subresults from parallel computations.
Arbitrary length Roman numeral value computation.
https://gist.github.com/4542999

Human name comparison: ways to approach this task

I'm not a Natural Language Processing student, yet I know it's not a trivial strcmp(n1, n2).
Here's what I've learned so far:
comparing Personal Names can't be solved 100%
there are ways to achieve a certain degree of accuracy.
the answer will be locale-specific, that's OK.
I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.
For example, all the names below can refer to the same person:
Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry
I'm trying to:
build (or copy) an algorithm which grades the relationship between 2 input names
find an indexing method (for names in my database, for hash tables, etc.)
note:
My task isn't about finding names in text, but to compare 2 names. e.g.
name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
I used the Tanimoto coefficient for a quick (but not super) solution, in Python:
"""
Formula:
  Na = number of set A elements
  Nb = number of set B elements
  Nc = number of common items
  T  = Nc / (Na + Nb - Nc)
"""

def tanimoto(a, b):
    c = [v for v in a if v in b]
    return float(len(c)) / (len(a) + len(b) - len(c))

def name_compare(name1, name2):
    return tanimoto(name1, name2)

>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
Edit: A link to a good and useful book.
Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
We've just been doing this sort of work non-stop lately and the approach we've taken is to have a look-up table or alias list. If you can discount misspellings/misheard/non-english names then the difficult part is taken away. In your examples we would assume that the first word and the last word are the forename and the surname. Anything in between would be discarded (middle names, initials). Berry and Bernard would be in the alias list - and when Tsakala did not match to Berry we would flip the word order around and then get the match.
One thing you need to understand is the database/people lists you are dealing with. In the English speaking world middle names are inconsistently recorded. So you can't make or deny a match based on the middle name or middle initial. Soundex will not help you with common name aliases such as "Dick" and "Richard", "Berry" and "Bernard" and possibly "Steve" and "Stephen". In some communities it is quite common for people to live at the same address and have 2 or 3 generations living at that address with the same name. The only way you can separate them is by date of birth. Date of birth may or may not be recorded. If you have the clout then you should probably make the recording of date of birth mandatory. A lot of "people databases" either don't record date of birth or won't give them away due to privacy reasons.
Effectively, people name matching is not that complicated. It's entirely based on the quality of the data supplied. What happens in practice is that a lot of records remain unmatched - and even a human looking at them can't resolve the mismatch. A human may notice name aliases not recorded in the aliases list or may be able to look up details of the person on the internet - but you can't really expect your programme to do that.
Banks, credit rating organisations and the government have a lot of detailed information about us. Previous addresses, date of birth etc. And that helps them join up names. But for us normal programmers there is no magic bullet.
Analyzing name order and the existence of middle names/initials is trivial, of course, so it looks like the real challenge is knowing common name alternatives. I doubt this can be done without using some sort of nickname lookup table. This list is a good starting point. It doesn't map Bernard to Berry, but it would probably catch the most common cases. Perhaps an even more exhaustive list can be found elsewhere, but I definitely think that a locale-specific lookup table is the way to go.
I had real problems with the Tanimoto coefficient using UTF-8.
What works for languages that use diacritical signs is difflib.SequenceMatcher()
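For example, a small Python sketch (the "Last, First" normalization step is my own addition, not part of the original suggestion):

```python
from difflib import SequenceMatcher

# SequenceMatcher compares the raw character sequences, so diacritics are
# handled transparently. It is order-sensitive, though, so flip
# "Last, First" forms into "First Last" before comparing.
def normalize(name):
    if "," in name:
        last, first = name.split(",", 1)
        name = first.strip() + " " + last.strip()
    return name.lower()

def name_similarity(n1, n2):
    """Return a similarity ratio in [0.0, 1.0]."""
    return SequenceMatcher(None, normalize(n1), normalize(n2)).ratio()

print(name_similarity("James Brown", "Brown, James"))  # 1.0
print(name_similarity("Berry Tsakala", "Bernard Tsakala"))
```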

What are Boolean Networks?

I was reading this SO question and I got intrigued by what are Boolean Networks, I've looked it up in Wikipedia but the explanation is too vague IMO. Can anyone explain me what Boolean Networks are? And if possible with some examples too?
Boolean networks represent a class of networks where the nodes have states and the edges represent transitions between states. In the simplest case, these states are either 1 or 0 – i.e. boolean.
Transitions may be simple activations or inactivations. For example, consider nodes a and b with an edge from a to b.
     f
a ------> b
Here, f is a transition function. In the case of activation, f may be defined as:
f(x) = x
i.e. b's value is 1 if and only if a's value is 1. Conversely, an inactivation (or repression) might look like this:
f(x) = NOT x
More complex networks use more involved boolean functions. E.g. consider:
a       b
 \     /
  \   /
   \ /
    v
    c
Here, we've got edges from a to c and from b to c. c might be defined in terms of a and b as follows.
f(a, b) = a AND NOT b
Thus, c is activated only if a is active and b is inactive at the same time.
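The a/b/c example above can be simulated directly. This Python sketch (the node names and the assumption that a and b are constant inputs are mine) computes one synchronous update step of the network:

```python
# One synchronous update step of the tiny boolean network above:
# a and b are inputs (held constant), c's update rule is "a AND NOT b".
def step(state):
    a, b = state["a"], state["b"]
    return {
        "a": a,
        "b": b,
        "c": a and not b,  # c is activated by a and repressed by b
    }

print(step({"a": True, "b": False, "c": False})["c"])  # True: a active, b inactive
print(step({"a": True, "b": True, "c": False})["c"])   # False: b represses c
```

Iterating step from some initial state traces the network's dynamics; in larger networks one looks for the fixed points and cycles such iteration settles into.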
Such networks can be used to model all kinds of relations. One that I know of is in systems biology where they are used to model (huge) interaction networks of chemicals in living cells. These networks effectively model how certain aspects of the cells work and they can be used to find deficiencies, points of attack for drugs and similarities between unrelated components that point to functional equivalence. This is fundamentally important in understanding how life works.