Given the following functional dependencies, it is a little bit confusing for me, because third normal form says no non-prime attribute of R is transitively dependent on the primary key. So I removed the functional dependency C --> DE from the table and placed it in a new relation, but all of these attributes can also be determined by the primary key of the relation. I think that I can't remove D and E from this table, or should I remove them? Going further to BCNF also does not help in removing these attributes. The question is: when I remove the first functional dependency, should I also remove D and E from the first table?
To put a relation into a given NF (normal form) you should follow an algorithm that has been advised for that NF. (Eg given some FDs, there are lots of others that hold, per Armstrong's axioms; you need to deal with them too. Eg there are certain benefits to "preserving" FDs when possible, and a decomposition to 3NF components that preserves FDs is always possible; but if we decompose so that some FD's attributes are split between components, we can fail to preserve FDs.)
Note that these algorithms do not involve first normalizing to lower NFs. (That can stop "good" higher-NF designs from being the final result.)
When you do decompose to get rid of a FD X -> Y from a relation with attributes R, the decomposition will always be non-loss/non-additive if the components have attribute sets X U Y and R - Y. By repeated decompositions all your components will eventually be in the NF you want (if it is BCNF or below). But your overall decomposition won't necessarily be as "nice" as an advised algorithm would give you.
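As a toy illustration of that rule in code (the full heading {A, B, C, D, E} is an assumption of mine, since the image with your actual attributes isn't reproduced here), here is the component calculation for the FD C -> DE:

import java.util.*;

// Toy sketch of the non-loss decomposition rule for an FD X -> Y:
// the components are X U Y and R - Y. The heading {A,B,C,D,E} is assumed
// purely for illustration; substitute your relation's real attributes.
public class DecomposeStep {
    public static void main(String[] args) {
        Set<String> r = Set.of("A", "B", "C", "D", "E");  // attributes of R (assumed)
        Set<String> x = Set.of("C");                      // determinant of C -> DE
        Set<String> y = Set.of("D", "E");                 // dependent attributes

        Set<String> component1 = new HashSet<>(x);  // X U Y = {C, D, E}
        component1.addAll(y);

        Set<String> component2 = new HashSet<>(r);  // R - Y = {A, B, C}
        component2.removeAll(y);

        System.out.println(component1);
        System.out.println(component2);
    }
}

Per that rule, D and E end up only in the new {C, D, E} component, not in the remainder of the original table.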
Related
I'm trying to get a better understanding of normalisation so I can use best practices going forward. I've found a question in an old book and I'm a little confused by it. Essentially I'm given this table with the following data:
Name   Sport   Sport Centre
Jim    Tennis  A1
Jim    Golf    A2
Dan    Tennis  A1
Dan    Golf    A3
Ben    Golf    A2
So we're assuming that each sport centre can ONLY host one sport. What I want is to convert this to BCNF. My process (from what I've learned so far) is as follows:
1, I identified all of the functional dependencies here:
Sport Centre->Sport
(Name, Sport Centre)->Sport
2, I identified all candidate keys:
(Name, Sport Centre)
But this is where I get stuck. I thought to be in BCNF that the table must have more than 1 candidate key and I can only see one. I'm unsure how to get this to BCNF. What I have done is the following splitting up of the table:
Name   Sport Centre
Jim    A1
Jim    A2
Dan    A1
Dan    A3
Ben    A2
Sport Centre   Sport
A1             Tennis
A2             Golf
A3             Golf
But I also understand that to be in 3NF (before BCNF) every attribute must be dependent on the full primary key, yet my splitting up breaks this rule.
How do I normalize properly here?
1, I identified all of the functional dependencies here:
You have not identified all the FDs (functional dependencies) that hold.

First: FDs are between sets of attributes. Although it happens that if we restrict ourselves to FDs from a set of attributes to a set holding a single attribute then we can infer what other FDs hold. So we can restrict what we mean by "all", but you should know what you are saying.

Next: You have identified some FDs that hold. But all the ones implied by them via Armstrong's axioms also hold. This always means some trivial FDs, eg {Sport Centre} -> Sport Centre & {} -> {}. Although it happens that we can infer the trivial FDs just from knowing the attributes. So again we can restrict what we mean by "all", but you should know what you are saying.

It happens that you have identified all the non-trivial FDs with one attribute on the RHS. But you have not justified that the ones you found hold or that you have found all the ones that hold.
You need to learn algorithms & relevant definitions for generating a description of the set of all FDs that hold. Including Armstrong's axioms, the notion of a FD transitive closure & the notion of a FD canonical cover to concisely characterize a closure.
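If it helps to see that concretely, here is a minimal Java sketch of the attribute-closure computation (X+), using the FD from this question; the class and method names are mine:

import java.util.*;

// Minimal sketch of attribute closure (X+), with FDs stored as a map from a
// determinant set to the set of attributes it determines.
public class FdClosure {

    // Repeatedly apply FDs until no new attributes can be added (a fixpoint).
    static Set<String> closure(Set<String> x, Map<Set<String>, Set<String>> fds) {
        Set<String> result = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Set<String>, Set<String>> fd : fds.entrySet()) {
                // If the determinant is inside the closure so far, everything
                // it determines belongs to the closure too.
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue())) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // FD from this question: {Sport Centre} -> {Sport}
        Map<Set<String>, Set<String>> fds = Map.of(
                Set.of("Sport Centre"), Set.of("Sport"));

        // {Sport Centre}+ = {Sport Centre, Sport}: not all attributes, so not a superkey.
        System.out.println(closure(Set.of("Sport Centre"), fds));

        // {Name, Sport Centre}+ = all attributes, so it is a superkey (and here a CK).
        System.out.println(closure(Set.of("Name", "Sport Centre"), fds));
    }
}

The closure is what lets you answer "does X -> Y follow from these FDs?": it does exactly when Y is a subset of X+.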
2, I identified all candidate keys:
Assuming that { {Sport Centre} -> Sport } is a canonical cover, the only CK is {Name, Sport Centre}.
You need to learn algorithms & relevant definitions for finding all CKs.
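As a sketch only (fine for a handful of attributes, not how real tools do it): enumerate the subsets of the heading, keep the ones whose closure is the whole heading (the superkeys), then keep the minimal ones. The closure method is the same fixpoint as in the previous sketch, repeated here so the example stands alone.

import java.util.*;

// Brute-force CK finder: a CK is a superkey containing no smaller superkey.
public class CandidateKeys {

    static Set<String> closure(Set<String> x, Map<Set<String>, Set<String>> fds) {
        Set<String> result = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var fd : fds.entrySet()) {
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue())) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> attrs = List.of("Name", "Sport", "Sport Centre");
        Map<Set<String>, Set<String>> fds = Map.of(Set.of("Sport Centre"), Set.of("Sport"));

        // Enumerate every non-empty subset of the heading via a bitmask.
        List<Set<String>> superkeys = new ArrayList<>();
        for (int mask = 1; mask < (1 << attrs.size()); mask++) {
            Set<String> subset = new HashSet<>();
            for (int i = 0; i < attrs.size(); i++) {
                if ((mask & (1 << i)) != 0) subset.add(attrs.get(i));
            }
            if (closure(subset, fds).containsAll(attrs)) superkeys.add(subset);
        }

        // Keep only superkeys with no proper subset that is also a superkey.
        for (Set<String> sk : superkeys) {
            boolean minimal = superkeys.stream()
                    .noneMatch(other -> sk.size() > other.size() && sk.containsAll(other));
            if (minimal) System.out.println("Candidate key: " + sk);
        }
    }
}

For these FDs it reports only {Name, Sport Centre}, agreeing with the statement above.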
I thought to be in BCNF that the table must have more than 1 candidate key
That's wrong. You seem to be trying to recall something like "3NF & not BCNF implies more than 1 CK" or "3NF & 1 CK implies BCNF", which are true. But these don't give that BCNF implies more than 1 CK, or equivalently, that 1 CK implies not BCNF.
You need to learn a definition of BCNF & other relevant definitions.
I'm unsure how to get this to BCNF.
We can always decompose to a BCNF design. Most definitions of BCNF say it is when there are no FDs of a certain form. It happens that we can get to BCNF by repeatedly losslessly decomposing to eliminate a problem FD. However, that might needlessly not "preserve" FDs. So we typically decompose with preservation to 3NF/EKNF first, which can always preserve FDs. Although then going to BCNF might fail to preserve a FD even though there was a FD-preserving decomposition directly from the original.
You need to learn algorithms & relevant definitions for decomposing to a given NF. Including the notions of lossless decomposition & FD preservation.
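For this example, here is a sketch of a single lossless decomposition step only (not a full algorithm: a complete one would project the FDs onto each component and repeat, and would also consider FD preservation as described above):

import java.util.*;

// One BCNF decomposition step for the sports example: find a non-trivial FD
// X -> Y whose determinant is not a superkey, then split the heading into
// X U Y and R - Y. The closure method is the same fixpoint as in the sketches above.
public class BcnfStep {

    static Set<String> closure(Set<String> x, Map<Set<String>, Set<String>> fds) {
        Set<String> result = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var fd : fds.entrySet()) {
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue())) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> heading = Set.of("Name", "Sport", "Sport Centre");
        Map<Set<String>, Set<String>> fds = Map.of(Set.of("Sport Centre"), Set.of("Sport"));

        for (var fd : fds.entrySet()) {
            Set<String> x = fd.getKey(), y = fd.getValue();
            boolean trivial = x.containsAll(y);
            boolean xIsSuperkey = closure(x, fds).containsAll(heading);
            if (!trivial && !xIsSuperkey) {
                // BCNF violation: decompose into X U Y and R - Y.
                Set<String> component1 = new HashSet<>(x);
                component1.addAll(y);
                Set<String> component2 = new HashSet<>(heading);
                component2.removeAll(y);
                System.out.println("Violating FD: " + x + " -> " + y);
                System.out.println("Components:   " + component1 + " and " + component2);
            }
        }
    }
}

For the sports table this reports {Sport Centre} -> {Sport} as the violating FD and produces the components {Sport Centre, Sport} and {Name, Sport Centre}, which is exactly the split in the question.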
But I also understand that to be in 3NF (before BCNF) every attribute must be dependent on the full primary key and my splitting up breaks this rule.
To normalize to a given NF it is not necessary to go through lower NFs. In general that can eliminate good final NF designs from arising.
Also "to be in 3NF [...] every attribute must be dependent on the full primary key" is not correct. You need to memorize definitions--necessary & sufficient conditions. And PKs (primary keys) do not matter to normalization, CKs do. Although we can investigate the special case of just one CK, which we could then refer to as the PK. Also "my splitting up breaks this rule" doesn't make sense. A necessary condition for a table to be in some NF is not a rule about how to decompose to it or any other NF.
You need to find a (good) academic textbook and learn its normalization definitions & algorithms. (Dozens of textbooks are free online, also slides & courses.) When you are stuck following it, reference & quote it, show your work following it, and explain about how you are stuck.
I think I might have answered my own question, but I won't mark it unless an expert on the community can confirm.
So my splitting up is valid; I had incorrectly identified the candidate keys.
There are 2 candidate keys which are:
(Name,Sport Centre)
(Sport Centre, Sport)
If this is correct, then my splitting up of the tables is in BCNF and valid. I think this is correct.
We have 2 entities to represent in the database:
- Entity A having attributes (x, y, z, r).
- Entity B having attributes (x, y, z, s).
The 2 entities have 3 identical attributes and only 1 different attribute. And although they are very similar, they are not related and not intended to be used together in the business logic.
There are 2 approaches (maybe more) to represent these:
1. Create two separate tables, one for each entity.
This is the straightforward method. It results in two relatively small tables, and cleaner queries and business logic. But the tables are almost identical, so it's clearly redundant in a way (imagine there are many more than 3 identical attributes).
2. Create one shared table with the 5 attributes plus a type column (x, y, z, r, s, type).
(type) indicates the type of the entity being represented in this row (either A or B). This is assuming that attributes (r) and (s) are not mandatory so we can't rely on them to determine the type of the entity.
For example, this is the approach used by Wordpress to represent posts and pages (and a few other entities). It results in one relatively large table containing a lot of null fields, and relatively messy queries and business logic to filter the rows and separate the entities logically. But we end up with only one unique table instead of two redundant ones.
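To visualize the two shapes, they could map to something roughly like this as JPA entities (jakarta.persistence assumed, placeholder names; the question is about the tables themselves, not the ORM):

import jakarta.persistence.*;

// Approach 1: two separate tables, one per entity. The shared columns are
// simply repeated in both mappings.
@Entity
class EntityA {
    @Id @GeneratedValue Long id;
    String x;
    String y;
    String z;
    String r;   // attribute specific to A
}

@Entity
class EntityB {
    @Id @GeneratedValue Long id;
    String x;
    String y;
    String z;
    String s;   // attribute specific to B
}

// Approach 2: one shared table with a type discriminator and nullable
// type-specific columns; rows of type A leave s null, rows of type B leave r null.
@Entity
class SharedEntity {
    @Id @GeneratedValue Long id;
    String type;   // "A" or "B"
    String x;
    String y;
    String z;
    String r;      // only meaningful when type = "A"
    String s;      // only meaningful when type = "B"
}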
My questions are:
1- What are other advantages and disadvantages of each approach?
2- What are use cases for both approaches?
3- If the entities increased in number, or their relationships increased in complexity, will one approach be clearly better than the other? Like if instead of 2 entities there are 20, or the entities have many-to-many relationships with other entities in the database.
Thank you!
The 2 entities have 3 identical attributes and only 1 different attribute. And although they are very similar, they are not related and not intended to be used together in the business logic.
Without more knowledge of your schema/model: if the two entities have similar attributes but are "not related" in the business logic, I would strongly urge approach 1. I don't see how it matters how many attributes each entity has in common if they are not related. It wouldn't be good database design to put unrelated data in the same table just because of similar attributes. More information on your schema would probably help the decision making as well.
I am reading an introductory book on database systems and the author introduced the term relational variable (relvar).
It says that the relvar is a container for the actual relation.
What is meant by container? Is this a physical concept, like a place on disk? Or is it more of a logical concept, so that container is just an umbrella term for metadata and relation?
A relation variable can be contrasted with a relation value. These concepts are analogous to simple algebraic variables like x, and values like 5.
A relation variable is a symbol that can reference different values at different times - hence the term variable, since its value can vary. For example, I might have a relation Employee which holds information about the people working for me at any given time.
A relation value is a particular state. Values don't vary. When we say the value of a variable changes, we actually mean that the variable is assigned a new value, which may be derived from the old value.
These are logical concepts. Container is an informal term which is accessible to a lay audience. However, it shouldn't be taken too literally. Variables and values can be implemented or represented in a variety of ways in physical systems.
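If a programming analogy helps (it is only an analogy, not a claim about how any DBMS is implemented), the variable below plays the role of a relation variable, and each immutable set assigned to it plays the role of a relation value:

import java.util.Set;

// Loose analogy only: the variable 'employee' plays the role of the relvar,
// and each immutable Set assigned to it plays the role of a relation value.
public class RelvarAnalogy {
    record EmployeeRow(int id, String name) {}

    public static void main(String[] args) {
        // The relvar starts out referring to one value...
        Set<EmployeeRow> employee = Set.of(new EmployeeRow(1, "Alice"));

        // ...and is later assigned a different value. The old value itself
        // never changed; the variable simply refers to a new one now.
        employee = Set.of(new EmployeeRow(1, "Alice"), new EmployeeRow(2, "Bob"));

        System.out.println(employee.size());  // 2
    }
}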
I started to teach myself the basics of databases and I am currently working through the 1st to 3rd normal forms. What I understand so far is the wish to remove redundancy, to make my databases less prone to inconsistency during data changes, as well as to save space by eliminating as many duplicates as possible.
For example if we have a table with the following columns:
CD_ID
title
artist
year
and change the design to have multiple tables where the first (CD) contains:
CD_ID
title
artist_ID
the second (artist) contains:
artist_ID
artist
year
I see that in the original table the year is transitively dependent on the ID via the artist. So we want to get rid of that and create a table for the artists, so that our new CD table is now in third normal form.
But to do so I created another table (the artist table) which, again, is not in third normal form as far as I understand it, as we have the same type of transitive dependency as before, just in another table.
Is this correct, and if yes, should I also normalize the artist table to be in 3rd NF? When do I stop?
TL;DR You need to follow a published algorithm to decompose to a given normal form.
PS You didn't get Artist from the original CD via normalization, since you introduced a new column. But assume table Artist has the obvious meaning. Why do you think it "again is not in third normal form as far as I understand it"? If artist -> year in the original CD then it also does in Artist. But then {artist} is, with {artist_id}, a CK (candidate key) of Artist, and Artist is in 3NF (and 5NF).
From your question's original version plus the current one, you have a proposed base table CD with columns cd_id, title, group & year, holding tuples where cd cd_id titled title was made by group group that formed in year year. Column cd_id is unique, hence is a CK. FD {group} -> year also holds.
Normalization does not introduce new column names. It replaces a proposed base table by others, each with a smaller subset of its columns, that always join to what its value would have been. Normalization up to BCNF is based on FDs (functional dependencies), which are also what determine the CKs of a base table. So your question does not contain a decomposition. A possible decomposition reminiscent of your question, which might or might not have any particular properties, would be to tables with column sets {cd_id, title, group} and {group, year}.
Other FDs hold in the original. Some hold because of what the columns are; some hold because of the CK; some hold because {group} -> year holds; in general, certain ones hold because all three of those do. And maybe others hold because of what tuples are supposed to go into the relation and what situations can arise. You need to decide for every possible FD whether it holds.
Of course, you might have been told that the only ones that hold are the ones that have to hold under those circumstances. But you won't have been told that the only FD that holds is {group} -> year, because there are trivial FDs and every superset of a CK functionally determines every set of columns.
One definition of 3NF is that a relation is in 2NF and no non-prime column is transitively functionally dependent on any CK. (Notice each condition involves other definitions.) If you want to use this to find out whether your relation is in 3NF then you next need to find out what all the CKs are. You can do this fastest via an appropriate algorithm, but you can just see which sets of columns functionally determine every column but don't contain a smaller such set, since those are the CKs. Then check the two conditions in the definition.
If you want to normalize to 3NF then you need to follow an algorithm for decomposing to 3NF. You don't explain what process you think you should follow. But if you aren't following a proven algorithm then whatever components you pick might or might not always join to the original and might or might not each be in any particular higher normal form. Note that examples of decompositions you have seen are not presentations of decomposition algorithms.
The NF (normal form) definitions give conditions that a relation must meet to be in that NF. They don't tell you how to nonloss decompose (preserving FDs when possible) to relations in higher NFs. People have worked out algorithms for producing decompositions to particular NFs. (And decomposing to a given NF doesn't in general involve first decomposition to lower NFs. Going through lower NFs can actually prevent good higher-NF decompositions of the original from being generated when you get to decomposing per a higher NF.)
You may also not realize that when some FDs hold, certain other ones must hold. The latter can be determined via Armstrong's axioms from the former. So just because you decomposed to get rid of a particular FD whose presence violates a particular NF doesn't mean there weren't a bunch of other ones that violated it that you didn't deal with. They can be present in the new components. Or they can be not present in problematic ways, so that you have not "preserved" them when you could have, leading to poor designs.
Learn about specific NF algorithms, and for that matter NFs and normalization itself, in a college/university textbook/course/presentation. Many are online.
Let C be a class that has another class D as its attribute. In principle, storing D as an embedded class of C must give better performance when retrieving than storing it as a separate entity through @ManyToOne (or even @OneToOne), since in the latter case D needs to be retrieved from a separate table possibly containing millions of rows.
My question is whether this performance difference is significant, i.e. whether it is big enough to offset other considerations when deciding between embedding and @ManyToOne.
I realise this question is a bit soft, I guess what I'm looking for is people answering from experience.
My answer (soft too, like your question) is: always model real life in the best possible way.
I ask myself leading questions: does the 'small class' have an independent life? Or is it only conceptual glue (like street + house + unit, named Address)? Can it be conceptually shared by many entities? Can the 'small class' field be nullable (the address of a mountaineers' base camp in the Himalayas), i.e. a nullable @OneToOne?
Which Java model is better (more real, more natural)?
And I keep only one 'implementation question' in mind: is the field of the 'small class' always used (read)? If it is read very rarely, or is big, it can be fetched lazily. Moving memo-style fields such as varchar(4000) to an external table is good (with accessory fields: date, author of the memo, etc.).
EDIT: how big is the 'small class', and is it rarely used?
In one project Address is correctly modelled as an independent table, in another it is embedded.
Almost always I get correct performance with a good logical model. Obsessing over 'JPA performance only' while making logical errors causes degradation elsewhere in the code.
My opinion is: design by logical arguments. Performance may vary.
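For reference, a minimal sketch of the two mappings being compared (jakarta.persistence assumed; class and field names are illustrative, not a recommendation either way):

import jakarta.persistence.*;

// Option 1: D embedded in C -- its columns live in C's own table, so reading C
// never touches another table.
@Embeddable
class D {
    String field1;
    String field2;
}

@Entity
class C {
    @Id @GeneratedValue Long id;
    @Embedded D d;
}

// Option 2: D as a separate entity -- its columns live in their own table and
// are reached through a foreign key. Marking the association lazy defers that
// extra read until d is actually accessed.
@Entity
class D2 {
    @Id @GeneratedValue Long id;
    String field1;
    String field2;
}

@Entity
class C2 {
    @Id @GeneratedValue Long id;
    @ManyToOne(fetch = FetchType.LAZY)
    D2 d;
}

With the embedded form, D's data comes back in the same row as C; with @ManyToOne it lives in its own table and, if marked lazy, is loaded only when accessed.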