In the world of generic programming the notion of refinement is very common. In particular, given a concept C1, we say that a concept C2 refines C1 if it provides all the functionality of C1 and possibly more.
What do you call the inverse relation? So if C2 is a refinement of C1, then C1 is a what of C2?
There are two terms in linguistics which define the relation discussed in the topic.
A hyponym shares a type-of relationship with its hypernym.
Hypernymy is the semantic relation in which one word is the hypernym of another. Hyponymy is the opposite relation.
For example, "bulldog" is a hyponym of the concept "dog", and "dog" is the hypernym of the concept "bulldog".
Since the related transformation of requirements is called "lifting", I suggest the same term for concepts: C1 is a lifting of C2. However, a native English speaker could probably help better here.
Related
Suppose I have two classes A and B. In A the functionality 'extract identification number of type X from string' is implemented to fulfill an overarching task. The same is done in B.
Now this functionality is moved to a separate class C to increase reusability. Classes A and B now use class C.
Doesn't that violate the definition "A module should be responsible to one, and only one, actor" if A and B are not part of the same module?
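The refactoring described above can be sketched as follows. All class and method names here are hypothetical stand-ins for the question's A, B, and C, and the "identification number of type X" format is invented for illustration:

```python
import re

# Class "C" from the question: the shared extraction logic, implemented once.
class IdentificationNumberExtractor:
    """Extracts an identification number of a hypothetical format
    ('ID-' followed by digits) from a string."""
    PATTERN = re.compile(r"ID-(\d+)")

    def extract(self, text):
        match = self.PATTERN.search(text)
        return match.group(1) if match else None

# Classes "A" and "B" each pursue their own overarching task,
# but delegate the shared step to the extractor.
class InvoiceProcessor:  # stands in for class A
    def __init__(self, extractor):
        self._extractor = extractor

    def process(self, line):
        return self._extractor.extract(line)

class ReportGenerator:  # stands in for class B
    def __init__(self, extractor):
        self._extractor = extractor

    def header_id(self, line):
        return self._extractor.extract(line)
```

Whether this violates the SRP then hinges on whether the extraction rule itself answers to a single actor, not on how many classes call it.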
Background
The Single Responsibility Principle (SRP) was introduced by Robert C. Martin in Principles of Object Oriented Design: "There should never be more than one reason for a class to change." A widespread but erroneous assumption is that the SRP states that each class has to fulfill only one clearly defined task.
In his book Clean Architecture: A Craftsman's Guide to Software Structure and Design, Robert C. Martin addresses the misinterpretation of the SRP and proposes the "final version" of the definition.
“A module should be responsible to one, and only one, actor.”
I'm trying to get a better understanding of normalisation so I can use best practices going forward. I've found a question in an old book and I'm a little confused by it. Essentially I'm given this table with the following data:
Name  Sport   Sport Centre
Jim   Tennis  A1
Jim   Golf    A2
Dan   Tennis  A1
Dan   Golf    A3
Ben   Golf    A2
So we're assuming that each sport centre can ONLY host one sport. What I want is to convert this to BCNF. My process (from what I've learned so far) is as follows:
1. I identified all of the functional dependencies here:
Sport Centre->Sport
(Name, Sport Centre)->Sport
2. I identified all candidate keys:
(Name, Sport Centre)
But this is where I get stuck. I thought to be in BCNF that the table must have more than 1 candidate key, and I can only see one. I'm unsure how to get this to BCNF. What I have done is the following splitting up of the table:
Name  Sport Centre
Jim   A1
Jim   A2
Dan   A1
Dan   A3
Ben   A2
Sport Centre  Sport
A1            Tennis
A2            Golf
A3            Golf
But I also understand that to be in 3NF (before BCNF) every attribute must be dependent on the full primary key, yet my splitting up breaks this rule.
How do I normalize properly here?
1. I identified all of the functional dependencies here:
You have not identified all the FDs (functional dependencies) that hold.

First: FDs are between sets of attributes. Although it happens that if we restrict ourselves to FDs from a set of attributes to a set holding a single attribute, then we can infer what other FDs hold. So we can restrict what we mean by "all", but you should know what you are saying.

Next: You have identified some FDs that hold. But all the ones implied by them via Armstrong's axioms also hold. This always means some trivial FDs, e.g. {Sport Centre} -> Sport Centre & {} -> {}. Although it happens that we can infer the trivial FDs just from knowing the attributes. So again we can restrict what we mean by "all", but you should know what you are saying.

It happens that you have identified all the non-trivial FDs with one attribute on the RHS. But you have not justified that the ones you found hold, or that you have found all the ones that hold.
You need to learn algorithms & relevant definitions for generating a description of the set of all FDs that hold. Including Armstrong's axioms, the notion of a FD transitive closure & the notion of a FD canonical cover to concisely characterize a closure.
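As a concrete illustration of one of those algorithms, here is a minimal sketch of the standard attribute-closure computation. The representation of FDs as (lhs, rhs) pairs of sets is my own choice for the example, not something from the answer:

```python
def attribute_closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs.

    fds is an iterable of (lhs, rhs) pairs, each side a set of
    attribute names. We keep adding the RHS of any FD whose LHS is
    already contained in the closure, until nothing changes.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure
```

With the question's single non-trivial FD, the closure of {Sport Centre} picks up Sport, while the closure of {Name} is just {Name}.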
2. I identified all candidate keys:
Assuming that { {Sport Centre} -> Sport } is a canonical cover, the only CK is {Name, Sport Centre}.
You need to learn algorithms & relevant definitions for finding all CKs.
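One such algorithm is a brute-force search for minimal attribute sets whose closure is the whole heading. This is a self-contained sketch (it repeats a small closure helper), fine for a heading of three attributes but not meant for large schemas:

```python
from itertools import combinations

def _closure(attrs, fds):
    """Attribute closure under a set of (lhs, rhs) FD pairs."""
    c = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= c and not set(rhs) <= c:
                c |= set(rhs)
                changed = True
    return c

def candidate_keys(attributes, fds):
    """Enumerate candidate keys: minimal sets determining all attributes.

    We try subsets smallest-first and skip any superset of a key
    already found, so every result is minimal.
    """
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = frozenset(combo)
            if any(k <= combo for k in keys):
                continue  # a subset is already a key, so combo is not minimal
            if _closure(combo, fds) == attributes:
                keys.append(combo)
    return keys
```

On the question's table this confirms that {Name, Sport Centre} is the only CK.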
I thought to be in BCNF that the table must have more than 1 candidate key
That's wrong. You seem to be trying to recall something like "3NF & not BCNF implies more than 1 CK" or "3NF & 1 CK implies BCNF", which are true. But these don't give that BCNF implies more than 1 CK, or equivalently, that 1 CK implies not BCNF.
You need to learn a definition of BCNF & other relevant definitions.
I'm unsure how to get this to BCNF.
We can always decompose to a BCNF design. Most definitions of BCNF say it is when there are no FDs of a certain form. It happens that we can get to BCNF by repeatedly losslessly decomposing to eliminate a problem FD. However, that might needlessly not "preserve" FDs. So we typically decompose with preservation to 3NF/EKNF first, which can always preserve FDs. Although then going to BCNF might fail to preserve a FD even though there was a FD-preserving decomposition directly from the original.
You need to learn algorithms & relevant definitions for decomposing to a given NF. Including the notions of lossless decomposition & FD preservation.
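The repeated lossless splitting mentioned above can be sketched as follows. The FD "projection" here is done naively via closures over the full FD set, which is adequate for small examples like this one but glosses over the subtleties the answer warns about (FD preservation in particular):

```python
def _closure(attrs, fds):
    """Attribute closure under a set of (lhs, rhs) FD pairs."""
    c = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= c and not set(rhs) <= c:
                c |= set(rhs)
                changed = True
    return c

def bcnf_decompose(attributes, fds):
    """Lossless BCNF decomposition by splitting on a violating FD.

    For a non-trivial FD X -> Y within the current heading R whose
    LHS is not a superkey of R, replace R with (X union Y) and
    (R - Y union X), then recurse on both halves.
    """
    attributes = frozenset(attributes)
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if not (lhs <= attributes and rhs <= attributes):
            continue
        if not rhs <= lhs and not attributes <= _closure(lhs, fds):
            return (bcnf_decompose(lhs | rhs, fds)
                    + bcnf_decompose((attributes - rhs) | lhs, fds))
    return [attributes]
```

On the question's table this produces exactly the two headings the asker arrived at: {Sport Centre, Sport} and {Name, Sport Centre}.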
But I also understand that to be in 3NF (before BCNF) every attribute must be dependent on the full primary key, yet my splitting up breaks this rule.
To normalize to a given NF it is not necessary to go through lower NFs. In general that can eliminate good final NF designs from arising.
Also "to be in 3NF [...] every attribute must be dependent on the full primary key" is not correct. You need to memorize definitions: necessary & sufficient conditions. And PKs (primary keys) do not matter to normalization, CKs do. Although we can investigate the special case of just one CK, which we could then refer to as the PK. Also "my splitting up breaks this rule" doesn't make sense. A necessary condition for a table to be in some NF is not a rule about how to decompose to it or any other NF.
You need to find a (good) academic textbook and learn its normalization definitions & algorithms. (Dozens of textbooks are free online, also slides & courses.) When you are stuck following it, reference & quote it, show your work following it, and explain about how you are stuck.
I think I might have answered my own question, but I won't mark it unless an expert on the community can confirm.
So my splitting up is valid, I have incorrectly identified the candidate keys.
There are 2 candidate keys which are:
(Name,Sport Centre)
(Sport Centre, Sport)
If this is correct, then my splitting of the tables is in BCNF and valid. I think this is correct.
I have a site where I want users to be able to identify their ethnicities. What's the best way to model this if there is only one level of hierarchy?
Solution 1 (single table):
Ethnicity
- Id
- Parent Id
- Name
Solution 2 (two tables):
Ethnicity Group
- Id
- Name
Ethnicity
- Id
- Ethnicity Group Id
- Name
I will be using this so that users can search for other users based on ethnicity. Which of the 2 approaches will work better for me? Is there another approach I have not considered? I'm using MySQL.
Well, there is such a thing as an Ethnicity Group in the real world, so you do need two tables, not one. The real world has three levels (the top-most would be Race), but I understand that may not be necessary here. If you squash the three levels into two, you have to be careful and lay them all out properly at the beginning. However, they will be vulnerable to people saying they want the real thing, and you may have to change the data, or change the structure to fit more in (much more work later).
If you do it correctly, as per real world, that problem is eliminated. Let me know if you want Race, and I will change the model.
The tables are far too small, and the keys are too meaningful, to add Id-iot columns to them; leave them as pure Relational keys, otherwise you will lose the power of the Relational engine. If you really want narrow keys, use a CHAR(2) EthnicityCode, rather than a NUMERIC(10,0) or a meaningless number.
Link to Ethnicity Data Model (plus the answer to your other question)
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
If there is nothing like an "ethnicity group" in the real world, I'd suggest you don't introduce one in your data model.
All the queries you can do with the second one you can also do with the first one, because you can just select FROM ethnicity AS e1 JOIN ethnicity AS e2 ON (e2.ethnicity_id = e1.parent_id).
I don't want to be awkward, but what are you going to do with people of mixed descent? I think that the best that you can hope for is a simple single-level enumeration like the kind of thing you get on census forms (e.g. 'Black', 'White', 'Asian', 'Hispanic' etc). It's not ideal, but it allows people to fairly easily self-identify. Concepts like race and ethnicity are wooly enough without trying to create additional (largely meaningless) hierarchies on top of them, so my gut feeling is to keep it simple.
In designing an RDBMS schema, I wonder if there is a formal principle about concrete objects: for example, in a Persons table, each record is very concrete and unique. Each record in fact represents a unique person.
But what about a table such as Courses (as in school). It can have a description, number of units, offered only in Autumn (Fall) or Spring, etc, which are the "general properties" of a course.
And then there is actual CourseSessions, which has information about the time_from and time_to (such as 10 to 11am), whether it is Monday, Wednesday or Tue / Thur, and the instructor teaching it, and also pointing back using a course_id to the Courses table.
So the above 2 tables are both needed.
Are there principles of table design for "concrete" vs "abstract"?
Update: what I mean "abstract" here is that a course is an abstract idea... there can be multiple instances of it... such as the course Physics 10 from 10-11am, and another at 12-1pm.
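The two tables described above can be sketched as a schema like this (a sqlite3 sketch; the column names, types, and sample rows are my assumptions, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The "abstract" course: general properties only.
    CREATE TABLE Courses (
        course_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        units     INTEGER NOT NULL,
        term      TEXT CHECK (term IN ('Autumn', 'Spring'))
    );
    -- The "concrete" offering: one row per scheduled session,
    -- pointing back at its course via course_id.
    CREATE TABLE CourseSessions (
        session_id INTEGER PRIMARY KEY,
        course_id  INTEGER NOT NULL REFERENCES Courses(course_id),
        days       TEXT NOT NULL,   -- e.g. 'MW' or 'TuTh'
        time_from  TEXT NOT NULL,
        time_to    TEXT NOT NULL,
        instructor TEXT
    );
    INSERT INTO Courses VALUES (10, 'Physics 10', 4, 'Autumn');
    INSERT INTO CourseSessions VALUES (1, 10, 'MW',   '10:00', '11:00', 'Smith');
    INSERT INTO CourseSessions VALUES (2, 10, 'TuTh', '12:00', '13:00', 'Jones');
""")

# One course, multiple concrete instances of it.
count = conn.execute(
    "SELECT COUNT(*) FROM CourseSessions WHERE course_id = 10"
).fetchone()[0]
```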
for example, if it is Persons table, then each record is very concrete and unique. Each record in fact represents a unique person.
That is the hope, but not the reality of the situation.
Because of immigration or legal-death status, it is possible for there to be two (or more) records that represent the same person. Uniquely identifying people is difficult: first, middle and surnames can match but actually reflect different people. SSN/SIN are not reliable, because they can change (immigration, legal death). A name doesn't guarantee gender, and gender can be changed.
Are there principles of table design for "concrete" vs "abstract"
The classification of "concrete" vs "abstract" is arbitrary and subject to interpretation. Does a start and end date really make a course session "concrete"? I can book numerous things in [calendaring software of choice]; that doesn't mean the class actually took place, or that the final grades are legitimate values...
Table design is based on business rules, and the logical entities (which can become tables in the physical model) required to support those rules. Normalization helps make these entities more obvious.
The relational data model, based on mathematics, provides a way to design your data model so that certain operations on it are provably correct.
Unfortunately, that kind of data model is not by itself a solution to performance issues in a database. How to organize tables for a certain business domain needs to take into account not only the abstract model of objects and database normalization, but also performance planning on your system. Yes, the abstraction leaks.
For example, there are two design strategies for tree structures: the adjacency model and the materialized path model (The Art of SQL). Which one is better depends on which operations need to be optimized.
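To make the trade-off concrete, here is a small sketch of the same tree under both encodings, using plain Python dicts rather than tables; the node ids and path format are invented for illustration:

```python
# The same four-node tree under the two encodings.

# Adjacency model: each row stores its parent's id (root has None).
adjacency = {
    1: None,
    2: 1,
    3: 1,
    4: 2,
}

# Materialized path model: each row stores its full path from the root.
paths = {
    1: "1/",
    2: "1/2/",
    3: "1/3/",
    4: "1/2/4/",
}

def descendants_by_path(node_id):
    """Under the path model a subtree query is a simple prefix match."""
    prefix = paths[node_id]
    return {n for n, p in paths.items() if p.startswith(prefix) and n != node_id}

def descendants_by_adjacency(node_id):
    """Under the adjacency model the same query needs iteration
    (or a recursive query), one level of the tree per step."""
    result, frontier = set(), {node_id}
    while frontier:
        frontier = {n for n, parent in adjacency.items() if parent in frontier}
        result |= frontier
    return result
```

Subtree reads favor the path model; moving a subtree favors the adjacency model, since only one parent pointer changes instead of many stored paths.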
There is a good and classical article I recommend: The Law of Leaky Abstractions
Abstraction has its price (& it is often higher than expected)
By Keith Cooper
The Art of SQL is, of course, the soul of database design in my opinion.
I hadn't noticed this rule before, but it seems that a binary tree, or any tree (each node can have many children, but children cannot point back to any parent), can be represented as one table in a database, with each row having an ID for itself and a parentID that points back to the parent node.
That is in fact the classical Employee - Manager diagram: one boss can have many people under him, and each of those people can have n people under him, etc. This is a tree structure, and database books commonly present it as a single Employee table.
The answer to your question is 'yes'.
Simon's warning about your trees becoming a cyclic graph is correct too.
All the stuff that has been said along the lines of "You have to ensure by hand that this won't happen, i.e. the DBMS won't do that for you automatically, because you will not break any integrity or reference rules" is WRONG.
That remark and the corresponding comments hold true only as long as you consider SQL systems exclusively.
There exist systems which CAN do this for you in a purely declarative way, that is, without you having to write *any* code whatsoever. One such system is SIRA_PRISE (http://shark.armchair.mb.ca/~erwin).
Yes, you can represent hierarchical structures by self-referencing the table. Just be aware of such situations:
Employee  Supervisor
1         2
2         1
Yes, that is correct. Here's a good reference
Just be aware that you generally need a loop in order to unroll the tree (e.g. find transitive relationships)
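In SQL dialects that support recursive common table expressions, that loop can be expressed directly in the query. A minimal sqlite3 sketch, with invented table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES employee(id),
        name      TEXT NOT NULL
    );
    INSERT INTO employee VALUES (1, NULL, 'Boss');
    INSERT INTO employee VALUES (2, 1, 'Manager');
    INSERT INTO employee VALUES (3, 2, 'Worker');
""")

# A recursive CTE "unrolls" the tree: all transitive reports of id 1.
reports = [row[0] for row in conn.execute("""
    WITH RECURSIVE subordinates(id) AS (
        SELECT id FROM employee WHERE parent_id = 1
        UNION ALL
        SELECT e.id FROM employee e
        JOIN subordinates s ON e.parent_id = s.id
    )
    SELECT id FROM subordinates ORDER BY id
""")]
```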