Is this relation R in BCNF and dependency preserving?

For R(A,B,C,D,E,G,H) here's the minimal cover:
{A->E,D->H,D->G,E->C,G->B,G->C,H->D}
Candidate keys:
{AH,AD}
By the definition of BCNF, none of the left-hand sides is a superkey (SK) or a candidate key (CK), so R is not in BCNF. Is it safe to conclude that all of the FDs violate BCNF? If so, the decomposition algorithm says to take an FD that violates BCNF, say X->Y, and split R into R1(XY) and R2(R-Y).
In our case, do I need to do that for all of the FDs? If I do, I end up with
R1(AE), R2(EC), R3(GB), R4(DH), R5(DG) and R6(AD)
But G->C and H->D are still missing, and R6 does not correspond to any FD in the original set. Does that mean the decomposition is not dependency preserving?

Is it safe to conclude that all of the FDs are violating BCNF?
Yes
... and do the procedure of R1(XY) and R2(R-Y)
The standard analysis algorithm decomposes the original schema into two subschemas, R1(X+) and R2(R - (X+ - X)). So if you start, for example, from A -> E, you produce R1(AEC) (since A+ = AEC) and R2(ABDGH). Then you repeat the step on the resulting relations as long as some dependency still violates BCNF.
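The first split described above can be reproduced with a small attribute-closure routine. This is only a sketch: the helper name `closure` and the representation of FDs as pairs of attribute strings are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = set("ABCDEGH")
FDS = [("A", "E"), ("D", "H"), ("D", "G"), ("E", "C"),
       ("G", "B"), ("G", "C"), ("H", "D")]

# An FD violates BCNF when its left side is not a superkey of R.
violations = [(lhs, rhs) for lhs, rhs in FDS if closure(lhs, FDS) != R]

# One step of the split on X = A: R1(X+) and R2(R - (X+ - X)).
x = {"A"}
x_plus = closure(x, FDS)
r1 = x_plus
r2 = R - (x_plus - x)
print("".join(sorted(r1)), "".join(sorted(r2)))  # ACE ABDGH
```

Here `violations` contains all seven FDs, which matches the "Yes" above, while `closure("AD", FDS)` and `closure("AH", FDS)` both return every attribute of R.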
For instance, in this case a decomposition that can be obtained is:
R4(AH)
R5(BG)
R6(DGH)
R7(CE)
R8(AE)
Note that, with this decomposition, the dependency G -> C is not preserved (it is a known fact that the algorithm can produce the loss of one or more dependencies).
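The claim about G -> C can be checked mechanically with the usual dependency-preservation test, which grows the left-hand side using only closures restricted to each component. The function names below are illustrative assumptions, not part of any library.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(lhs, rhs, decomposition, fds):
    """True if lhs -> rhs follows from the FDs projected onto the components."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # What can be derived from z while staying inside component ri.
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

FDS = [("A", "E"), ("D", "H"), ("D", "G"), ("E", "C"),
       ("G", "B"), ("G", "C"), ("H", "D")]
DECOMP = [set("AH"), set("BG"), set("DGH"), set("CE"), set("AE")]

lost = [f"{lhs}->{rhs}" for lhs, rhs in FDS
        if not preserved(lhs, rhs, DECOMP, FDS)]
print(lost)  # ['G->C']
```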

Related

Finding the strongest normal form and if it isn't in BCNF decompose it?

I know how to do these problems easily when the input is basic. I know the rules for the 1st,2nd,and 3rd normal forms as well as BCNF. HOWEVER I was looking through some practice exercises and I saw some unusual input:
Consider the following collection of relations and dependencies.
Assume that each relation is obtained through decomposition from a
relation with attributes ABCDEFGHI and that all the known
dependencies over relation ABCDEFGHI are listed for each question.
(The questions are independent of each other, obviously, since the given dependencies over ABCDEFGHI are different.)
1. R2(A,B,F): AC → E, B → F
2. R1(A,C,B,D,E): A → B, C → D
I can solve 2:
A+=AB
C+=CD
AC+=ABCD
ACE+=ABCDE
So ACE is the candidate key, and none of A, C and E is a superkey on its own. It isn't in BCNF for sure. Decompose it and obtain (ACE)(AB)(CD) etc etc.
BUT Number 1 is confusing me! Why is there AC → E when neither C nor E is in R2? How could this be solved? It can't be an error because many other exercises are like this :/
Another question, what happens when one functional dependency is in BCNF and others are not? Do we just ignore this functional dependency while decomposing the others into BCNF?
If I understand correctly the text of the exercise, the dependencies are those holding on the original relation (ABCDEFGHI): “all the known dependencies over relation ABCDEFGHI are listed for each question”.
So, assuming that the only dependencies specified on the original relation are AC → E and B → F, the dependency AC → E is lost in the decomposed relation R2(A,B,F), the (only) candidate key of R2 is AB, the schema is not in 2NF (since F depends on part of a key), and to bring it into BCNF you must decompose it into (AB) and (BF).
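The reasoning above amounts to projecting the original dependencies onto R2(A,B,F). A brute-force sketch of that projection follows; the helper names are hypothetical, and enumerating all subsets is only practical for small schemas like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """Non-empty subsets of `attrs`, smallest first."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)

FDS = [("AC", "E"), ("B", "F")]   # dependencies on the original ABCDEFGHI
R2 = set("ABF")

# Non-trivial FDs that survive inside R2: X -> (X+ restricted to R2) minus X.
projected = {}
for x in subsets(R2):
    gained = (closure(x, FDS) & R2) - x
    if gained:
        projected["".join(sorted(x))] = "".join(sorted(gained))

# Candidate keys of R2: minimal subsets whose closure covers R2.
keys = []
for x in subsets(R2):
    if R2 <= closure(x, FDS) and not any(k <= x for k in keys):
        keys.append(x)

print(projected)  # {'B': 'F', 'AB': 'F'}
print(keys)
```

Only B → F (and its augmentation AB → F) survives, AC → E has no counterpart in R2, and the single key found is AB.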

Decomposing a relation into BCNF

I'm having trouble establishing when a relation is in Boyce-Codd Normal Form and how to decompose it into BCNF if it is not. Given this example:
R(A, C, B, D, E) with functional dependencies: A -> B, C -> D
How do I go about decomposing it?
The steps I've taken are:
A+ = AB
C+ = CD
R1 = A+ = **AB**
R2 = ACDE (since elements of C+ still exist, continue decomposing)
R3 = C+ = **CD**
R4 = ACE (no FD closures reside in this relation)
So now I know that ACE will compose the whole relation, but the answer for the decomposition is: AB, CD, ACE.
I suppose I'm struggling with how to properly decompose a relation into BCNF form and how to tell when you're done. Would really appreciate anyone who can walk me through their thought process when solving these problems. Thanks!
Although the question is old, the other questions/answers don't seem to provide a very clear step-by-step general answer on determining and decomposing relations to BCNF.
1. Determine BCNF:
For relation R to be in BCNF, every nontrivial functional dependency (FD) that holds in R must have a superkey of R as its determinant; i.e., if X->Y holds in R, then X must be a superkey of R.
In your case, it can be shown that the only candidate key (minimal superkey) is ACE.
Thus both FDs, A->B and C->D, violate BCNF, as neither A nor C is a superkey of R.
2. Decompose R into BCNF form:
If R is not in BCNF, we decompose R into a set of relations S that are in BCNF.
This can be accomplished with a very simple algorithm:
Initialize S = {R}
While S has a relation R' that is not in BCNF do:
    Pick a FD: X->Y that holds in R' and violates BCNF
    Add the relation XY to S
    Update R' = R'-Y
Return S
In your case, the iterative steps are as follows:
S = {ABCDE} // Initialization S = {R}
S = {ACDE, AB} // Pick FD: A->B which violates BCNF
S = {ACE, AB, CD} // Pick FD: C->D which violates BCNF
// Return S as all relations are in BCNF
Thus, R(A,B,C,D,E) is decomposed into a set of relations: R1(A,C,E), R2(A,B) and R3(C,D) that satisfies BCNF.
Note also that in this case the functional dependencies are preserved, but normalization to BCNF does not guarantee this in general.
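The loop above can be sketched in Python as follows. One simplification is assumed and should be kept in mind: a violation is searched for only among the listed FDs whose attributes all lie inside the current relation, which is enough for this example, whereas a general implementation would have to consider the FDs projected onto each sub-relation. All function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, Y) for a listed FD inside `rel` whose X is not a superkey."""
    for lhs, rhs in fds:
        x, y = set(lhs), set(rhs) - set(lhs)
        if y and x | y <= rel and not rel <= closure(x, fds):
            return x, y
    return None

def bcnf_decompose(rel, fds):
    todo, done = [set(rel)], []
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)        # r is in BCNF w.r.t. the listed FDs
        else:
            x, y = violation
            todo.append(x | y)    # add the relation XY
            todo.append(r - y)    # update R' = R' - Y
    return done

result = bcnf_decompose("ABCDE", [("A", "B"), ("C", "D")])
print(sorted("".join(sorted(r)) for r in result))  # ['AB', 'ACE', 'CD']
```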
1NF -> 2NF -> 3NF -> BCNF
According to the given FD set, "ACE" forms the key.
Clearly R(A,B,C,D,E) is not in 2NF, since B and D each depend on only part of the key.
The 2NF decomposition gives R1(A,B), R2(C,D) and R3(A,C,E).
The relations in this decomposition are in 3NF and also in BCNF.

Difference between declarative and model-based specification

I've read definition of these 2 notions on wiki, but the difference is still not clear. Could someone give examples and some easy explanation?
A declarative specification describes an operation or a function with a constraint that relates the output to the input. Rather than giving you a recipe for computing the output, it gives you a rule for checking that the output is correct. For example, consider a function that takes an array a and a value e, and returns the index of an element of the array matching e. A declarative specification would say exactly that:
function index (a, e)
returns i such that a[i] = e
In contrast, an operational specification would look like code, e.g. with a loop through the indices to find i. Note that declarative specifications are often non-deterministic; in this case, if e matches more than one element of a, there are several values of i that are valid (and the specification doesn't say which to return). This is a powerful feature. Also, declarative specifications are often not total: here, for example, the specification assumes that such an i exists (and in some languages you would add a precondition to make this explicit).
To support declarative specification, a language must generally provide logical operators (especially conjunction) and quantifiers.
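As a rough illustration in Python (not a specification language, so this is only an analogy), the declarative version can be written as a predicate that checks a proposed output, while the operational version commits to one recipe. Both function names are made up for this sketch.

```python
def satisfies_index_spec(a, e, i):
    """Declarative: a rule for checking that output i is correct for input (a, e)."""
    return 0 <= i < len(a) and a[i] == e

def index_operational(a, e):
    """Operational: one particular recipe, here returning the first match."""
    for i, item in enumerate(a):
        if item == e:
            return i
    raise ValueError("precondition violated: e does not occur in a")

a = [7, 3, 7]
# Non-determinism: every index in `valid` is an acceptable answer for e = 7.
valid = [i for i in range(len(a)) if satisfies_index_spec(a, 7, i)]
print(valid)                    # [0, 2]
print(index_operational(a, 7))  # 0
```

The operational function is just one of several implementations that satisfy the predicate, and the raised error plays the role of the explicit precondition.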
A model-based language is one that uses standard mathematical structures (such as sets and relations) to describe the state. Alloy and Z are both model based. In contrast, algebraic languages (such as OBJ and Larch) use equations to describe state implicitly. For example, to specify an operation that inserts an element in a set, in an algebraic language you might write something like
member(insert(s,e),x) = (e = x) or member(s,x)
which says that after you insert e into s, the element x will be a member of the set if you just inserted that element (e is equal to x) or if it was there before (x is a member of s). In contrast, in a language like Z or Alloy you'd write something like
insert (s, e)
s' = s U {e}
with a constraint relating the new value of the set (s') to the old value (s).
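To make the contrast concrete, here is a small Python sketch in which the state is modelled directly as a set (the model-based view), and the algebraic equation is then checked as a property of that model. The names `insert` and `member` mirror the ones used above; the finite test universe is an assumption of the sketch.

```python
def insert(s, e):
    """Model-based view: the new state is s' = s U {e}."""
    return s | {e}

def member(s, x):
    return x in s

# Algebraic view: member(insert(s, e), x) = (e = x) or member(s, x).
# Here it is checked exhaustively over a small universe.
universe = range(4)
s = frozenset({1, 2})
holds = all(
    member(insert(s, e), x) == ((e == x) or member(s, x))
    for e in universe
    for x in universe
)
print(holds)  # True
```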
For many examples of declarative, model-based specification, see the materials on Alloy at http://alloy.mit.edu, or my book Software Abstractions. You can also see comparisons between model-based declarative languages through an example in the appendix of the book, available at the book's website http://softwareabstractions.org.

Pattern matching with associative and commutative operators

Pattern matching (as found in e.g. Prolog, the ML family languages and various expert system shells) normally operates by matching a query against data element by element in strict order.
In domains like automated theorem proving, however, there is a requirement to take into account that some operators are associative and commutative. Suppose we have data
A or B or C
and query
C or $X
Going by surface syntax this doesn't match, but logically it should match with $X bound to A or B because or is associative and commutative.
Is there any existing system, in any language, that does this sort of thing?
Associative-Commutative pattern matching has been around since 1981 and earlier, and is still a hot topic today.
There are lots of systems that implement this idea and make it useful; it means you can avoid writing complicated pattern matches when associativity or commutativity could be used to make the pattern match. Yes, it can be expensive; better that the pattern matcher do this automatically than that you do it badly by hand.
You can see an example in a rewrite system for algebra and simple calculus implemented using our program transformation system. In this example, the symbolic language to be processed is defined by grammar rules, and those rules that have A-C properties are marked. Rewrites on trees produced by parsing the symbolic language are automatically extended to match.
The Maude term rewriter implements associative and commutative pattern matching.
http://maude.cs.uiuc.edu/
I've never encountered such a thing, and I just had a more detailed look.
There is a sound computational reason for not implementing this by default - one has to essentially generate all combinations of the input before pattern matching, or you have to generate the full cross-product worth of match clauses.
I suspect that the usual way to implement this would be to simply write both patterns (in the binary case), i.e., have patterns for both C or $X and $X or C.
Depending on the underlying organisation of data (it's usually tuples), this pattern matching would involve rearranging the order of tuple elements, which would be weird (particularly in a strongly typed environment!). If it's lists instead, then you're on even shakier ground.
Incidentally, I suspect that the operation you fundamentally want is disjoint union patterns on sets, e.g.:
foo (Or ({C} disjointUnion {X})) = ...
The only programming environment I've seen that deals with sets in any detail would be Isabelle/HOL, and I'm still not sure that you can construct pattern matches over them.
EDIT: It looks like Isabelle's function functionality (rather than fun) will let you define complex non-constructor patterns, except then you have to prove that they are used consistently, and you can't use the code generator anymore.
EDIT 2: The way I implemented similar functionality over n commutative, associative and transitive operators was this:
My terms were of the form A | B | C | D, while queries were of the form B | C | $X, where $X was permitted to match zero or more things. I pre-sorted these using lexicographic ordering, so that variables always occurred in the last position.
First, you construct all pairwise matches, ignoring variables for now, and recording those that match according to your rules.
{ (B,B), (C,C) }
If you treat this as a bipartite graph, then you are essentially solving a perfect marriage (bipartite matching) problem. There exist fast algorithms for finding these.
Assuming you find one, then you gather up everything that does not appear on the left-hand side of your relation (in this example, A and D), and you stuff them into the variable $X, and your match is complete. Obviously you can fail at any stage here, but this will mostly happen if there is no variable free on the RHS, or if there exists a constructor on the LHS that is not matched by anything (preventing you from finding a perfect match).
Sorry if this is a bit muddled. It's been a while since I wrote this code, but I hope this helps you, even a little bit!
For the record, this might not be a good approach in all cases. I had very complex notions of 'match' on subterms (i.e., not simple equality), and so building sets or anything would not have worked. Maybe that'll work in your case though and you can compute disjoint unions directly.
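For the simple case where "match" really is plain equality, the disjoint-union idea can be sketched with multisets instead of bipartite matching. This is a deliberately reduced version of the approach described above: operands are assumed to be already flattened into a list, the pattern has constant operands plus a single rest variable $X, and the function name is hypothetical.

```python
from collections import Counter

def ac_match(term_args, pattern_consts):
    """Match flattened operands of an AC operator against constants plus $X.

    Returns the operands bound to $X (possibly empty), or None on failure.
    """
    remaining = Counter(term_args)
    needed = Counter(pattern_consts)
    # Every constant of the pattern must occur often enough in the term.
    if any(remaining[c] < n for c, n in needed.items()):
        return None
    remaining.subtract(needed)
    # Whatever is left over is what $X absorbs.
    return sorted(remaining.elements())

print(ac_match(["A", "B", "C", "D"], ["B", "C"]))  # ['A', 'D']
print(ac_match(["A", "B", "C"], ["C"]))            # ['A', 'B']
print(ac_match(["A", "B"], ["C"]))                 # None
```

With a richer notion of match on subterms, the multiset subtraction would no longer be enough and something like the matching step described earlier would be needed.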

Examples of monoids/semigroups in programming

It is well-known that monoids are stunningly ubiquitous in programming. They are so ubiquitous and so useful that I, as a 'hobby project', am working on a system that is completely based on their properties (distributed data aggregation). To make the system useful I need useful monoids :)
I already know of these:
Numeric or matrix sum
Numeric or matrix product
Minimum or maximum under a total order with a top or bottom element (more generally, join or meet in a bounded lattice, or even more generally, product or coproduct in a category)
Set union
Map union where conflicting values are joined using a monoid
Intersection of subsets of a finite set (or just set intersection if we speak about semigroups)
Intersection of maps with a bounded key domain (same here)
Merge of sorted sequences, perhaps with joining key-equal values in a different monoid/semigroup
Bounded merge of sorted lists (same as above, but we take the top N of the result)
Cartesian product of two monoids or semigroups
List concatenation
Endomorphism composition.
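A few of the entries in this list can be written down in Python as a pair of an associative operation and its identity. The `Monoid` class and the `map_union` helper are assumptions of this sketch rather than an existing library API.

```python
from functools import reduce

class Monoid:
    """An associative binary operation together with its identity element."""
    def __init__(self, op, identity):
        self.op = op
        self.identity = identity

    def concat(self, items):
        """Fold a sequence of values with the monoid operation."""
        return reduce(self.op, items, self.identity)

int_sum = Monoid(lambda a, b: a + b, 0)
int_max = Monoid(max, float("-inf"))   # maximum with a bottom element

def map_union(value_monoid):
    """Map union where conflicting values are joined using `value_monoid`."""
    def op(m1, m2):
        merged = dict(m1)
        for key, value in m2.items():
            merged[key] = value_monoid.op(merged[key], value) if key in merged else value
        return merged
    return Monoid(op, {})

word_counts = map_union(int_sum)
total = word_counts.concat([{"a": 1, "b": 2}, {"b": 3}, {"c": 4}])
print(total)                      # {'a': 1, 'b': 5, 'c': 4}
print(int_max.concat([3, 9, 2]))  # 9
```

Because the operation is associative, the partial results can be combined in any grouping, which is what makes this shape convenient for aggregation across machines.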
Now, let us define a quasi-property of an operation as a property that holds up to an equivalence relation. For example, list concatenation is quasi-commutative if we consider lists of equal length or with identical contents up to permutation to be equivalent.
Here are some quasi-monoids and quasi-commutative monoids and semigroups:
Any (a+b = a or b, if we consider all elements of the carrier set to be equivalent)
Any satisfying predicate (a+b = the one of a and b that is non-null and satisfies some predicate P, if none does then null; if we consider all elements satisfying P equivalent)
Bounded mixture of random samples (xs+ys = a random sample of size N from the concatenation of xs and ys; if we consider any two samples with the same distribution as the whole dataset to be equivalent)
Bounded mixture of weighted random samples
Let's call it "topological merge": given two acyclic and non-contradicting dependency graphs, a graph that contains all the dependencies specified in both. For example, list "concatenation" that may produce any permutation in which elements of each list follow in order (say, 123+456=142356).
Which others do exist?
Quotient monoid is another way to form monoids (quasimonoids?): given monoid M and an equivalence relation ~ compatible with multiplication, it gives another monoid. For example:
finite multisets with union: if A* is a free monoid (lists with concatenation), ~ is "is a permutation of" relation, then A*/~ is a free commutative monoid.
finite sets with union: If ~ is modified to disregard count of elements (so "aa" ~ "a") then A*/~ is a free commutative idempotent monoid.
syntactic monoid: Any regular language gives rise to a syntactic monoid that is the quotient of A* by the "indistinguishability by language" relation. Here is a finger tree implementation of this idea. For example, the language {a^(3n) : n natural} has Z_3 as the syntactic monoid.
Quotient monoids automatically come with homomorphism M -> M/~ that is surjective.
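The three quotients above can be illustrated in Python by writing the quotient map explicitly and checking that it respects concatenation. The function names are made up for the sketch, and the Z_3 example assumes a one-letter alphabet {a}.

```python
from collections import Counter

def to_multiset(word):
    """A* -> finite multisets: forget the order of the letters."""
    return Counter(word)

def to_set(word):
    """A* -> finite sets: forget order and multiplicity."""
    return frozenset(word)

def to_z3(word):
    """{a}* -> Z_3: word length modulo 3."""
    return len(word) % 3

u, v = "abca", "cab"

# Each map is a homomorphism: the image of a concatenation is the
# combination of the images under the target monoid's operation.
multiset_ok = to_multiset(u + v) == to_multiset(u) + to_multiset(v)
set_ok = to_set(u + v) == to_set(u) | to_set(v)
z3_ok = to_z3("aaaa" + "aa") == (to_z3("aaaa") + to_z3("aa")) % 3
print(multiset_ok, set_ok, z3_ok)  # True True True
```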
A "dual" construction are submonoids. They come with homomorphism A -> M that is injective.
Yet another construction on monoids is tensor product.
Monoids allow exponentiation by squaring in O(log n) operations and fast parallel prefix-sum computation. They are also used in the Writer monad.
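A minimal sketch of exponentiation by squaring over an arbitrary monoid, passed in as an operation and an identity. The function name `mpower` and the 2x2 matrix example are assumptions chosen for illustration.

```python
def mpower(op, identity, x, n):
    """Combine n copies of x with `op` using O(log n) applications of `op`."""
    result = identity
    while n > 0:
        if n & 1:
            result = op(result, x)
        x = op(x, x)      # square the base
        n >>= 1
    return result

def mat_mul(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

I2 = ((1, 0), (0, 1))
FIB = ((1, 1), (1, 0))

# The n-th power of FIB holds the Fibonacci number F(n) in its off-diagonal.
print(mpower(mat_mul, I2, FIB, 30)[0][1])           # 832040
print(mpower(lambda a, b: a + b, "", "ab", 3))      # ababab
```

Only associativity and an identity are used, so the same routine works for numbers, matrices, strings, or any of the other monoids mentioned in this thread.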
The Haskell standard library is alternately praised and attacked for its use of the actual mathematical terms for its type classes. (In my opinion it's a good thing, since without it I'd never even know what a monoid is!). In any case, you might check out http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Monoid.html for a few more examples:
the dual of any monoid is a monoid: given a+b, define a new operation ++ with a++b = b+a
conjunction and disjunction of booleans
over the Maybe monad (aka "option" in OCaml), first and last. That is,
first (Just a) b = Just a
first Nothing b = b
and likewise for last
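The same two ideas, first/last and the dual of a monoid, can be mimicked in Python by letting None stand in for Nothing. This is an analogy under that assumption, not a translation of the Haskell library, and the names are chosen for the sketch.

```python
from functools import reduce

def first(a, b):
    """Keep the leftmost value that is not None; None is the identity."""
    return a if a is not None else b

def dual(op):
    """The dual monoid: same carrier, arguments flipped."""
    return lambda a, b: op(b, a)

last = dual(first)   # keeps the rightmost value that is not None

values = [None, 3, None, 5]
print(reduce(first, values, None))  # 3
print(reduce(last, values, None))   # 5
```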
The latter is just the tip of the iceberg of a whole family of monoids related to monads and arrows, but I can't really wrap my head around these (other than simply monadic endomorphisms). But a google search on monads monoids turns up quite a bit.
A really useful example of a commutative monoid is unification in logic and constraint languages. See section 2.8.2.2 of 'Concepts, Techniques and Models of Computer Programming' for a precise definition of a possible unification algorithm.
Good luck with your language! I'm doing something similar with a parallel language, using monoids to merge subresults from parallel computations.
Arbitrary length Roman numeral value computation.
https://gist.github.com/4542999