Relational Table Normalization - relational-database

Relational Table Normalization - relational-database

I would like to know if the relational table in BCNF
Student(StudentNum, NRIC, DateOfBirth, BookTitle)
• Student’s number (StudentNum) uniquely identifies the National Registration Identity Card (NRIC) and the date of birth of the student (DateOfBirth).
• The NRIC determines the student’s date of birth (DateOfBirth).
According to my analysis, the relation is in 2NF. And after changing to BCNF it looks like this
Student(StudentNum, NRIC, BookTitle)
StudentDetails(NRIC, DateOfBirth)
My query;
Before the change 2NF
After the change BCNF
Am i correct.

The idea behind normalization is to eliminate redundancy of rows as much as possible by putting creating additional tables for those columns that may seem to inflict any redundancy on the table.
For example if we start with the following table: Student(StudentNum, NRIC, DateOfBirth, BookTitle) The unique columns in this case are the StudentNum and NRIC, the other 2 fields are not because other students may have the same date of birth and others may have borrowed the book of the same title. From there we see the need to normalize in order for us not to fall into the redundancy of data, for example what if the same student borrowed 100 different books?
If everything is in a single table, we may end up with lots of redundant(repetitive) data.
I suggest you check out this guide to the 5 normal forms http://www.bkent.net/Doc/simple5.htm
Update:
I think the beginning relation better to be considered 1st normal form given that everything is within a single table. Your resulting relation is 2NF I guess because what if different students have borrowed the same book? That may lead to repetition in the Student table.
I think you have to give more info regarding the scenario constituting your relations so we can analyze this better. It highly depends on the business rules.

No. Student is in 1NF, not 2NF.
You're starting with
Student(StudentNum, NRIC, DateOfBirth, BookTitle)
and these dependencies.
StudentNum->NRIC
StudentNum->DateOfBirth
NRIC->DateOfBirth
A relation is in 2NF if and only if
it's in 1NF, and
every non-prime attribute is dependent on the whole of every candidate key, not just on part of any candidate key.
So your first job is to determine the candidate keys of the Student relation. The Student relation has only one candidate key, and that's {StudentNum, BookTitle}.
Your textbook should have at least one algorithm for determining all the candidate keys of a relation.
Since NRIC is dependent on StudentNum, and StudentNum isn't a candidate key (it's just part of a candidate key), the relation Student is not in 2NF. Fix that by changing
Student(StudentNum, NRIC, DateOfBirth, BookTitle)
to this, by eliminating the partial key dependency on StudentNum.1
Student (StudentNum, NRIC, DateOfBirth)
StudentBooks (StudentNum, BookTitle)
StudentBooks has no non-prime attributes at all; it's now in 6NF. Student is in 2NF, but not yet in 3NF or BCNF. Do you know why?
It seems you did know why. There is indeed a transitive dependency: StudentNum->NRIC, and
NRIC->DateOfBirth. Fix the transitive dependency like this.
Student (StudentNum, NRIC)
NRIC (NRIC, DateOfBirth)
StudentBooks (StudentNum, BookTitle)
All three of those relations are in 6NF.
This decomposition might look a little odd. That's because textbook examples usually don't use meaningful names for either relations or attributes. Relations are usually named R{ABCD}, R1{ABC}, R2{AD}, etc. The decomposition above involved
projecting a new relation {StudentNum, NRIC, DateOfBirth} to eliminate the partial key dependency,
observing that the name "Student" no longer identified the relation consisting of {StudentNum, BookTitle},
moving the name "Student" from the original relation to the new, projected relation, and
coming up with a new name, StudentBooks, for the original relation.

Related

Is it antipattern to create a junction table with more than two foreign keys?

I am designing a database for school management.
Here instead of a single school, there are multiple schools with their own students, classes, teachers and subjects.
Here is the requirements.
• A school can have many classes
• A class can have many sub classes
• Many Students can belong to a section of a class
• Many Teacher can belong to a section of a class
• Many Subjects can belongs to a section of a class
• School wants to manage classes
• School wants to manage sections
• School wants to manage subjects in a section of a class
• School wants to assign a teacher for a subject in a section of a class
• School can assign a student a section of a class
Also, some school can have classes as named One, Two, Three and Some may have First, Second and Third. Also same for sections or sub classes
For entities School and Teacher things are straight forward. Below is my approach for them.
School
--------------
id
name
...
Teacher
--------------
id
school_id
name
But I am facing problem while mapping classes and sections for a school.
I tried to make two other entities classes and sections and give them a foreign key of the school
SchoolClass
-----------------
school_id
class_name
...
SchoolSection
----------------
school_id
section_name
...
For mapping a school's class to all its section I created a junction table, as it will be many to many relationship.
SchoolClassSection
---------------------------------------------------------
school_class_id
school_section_id
But as I mentioned above I also need to add subjects and teachers to a section of the class so tried something like below.
SchoolClassSections
---------------------------------------------------------
school_class_id
school_section_id
subject_id
teacher_id
But now I ended up having four foreign keys in the junction table.
Is there no issues in having more than two foreign keys in a junciton table?
Also, I need to add students too, but I can't now figure it out How to move further for students relation for classes, sections?
One thing I could thing of like below
student
-----------
school_id
school_class_section_id // but if SchoolClassSections can have subject_id and teacher_id too it will become confusing

No, it is not an "antipattern". There are many examples where a junction table might have more than two foreign keys:
A doctor's appointment might have a doctor, a patient, and a clinic.
A retail order might have a customer, a location, and a payment type.
An online subscription might have a customer id, product, payment type, and source.
Those are just some examples from other domains (and they do not even include a date dimension). It is not uncommon for a table to have multiple foreign keys.

I didn't analyze your schema, due to time constaints here.
But there's no problem from either the relational point of view or the SQL point of view with a table or junction table having more than two foreign keys. In fact, that's pretty common.

Some concepts, especially OO concepts, do not fit well with SQL.
SQL has "only":
1:1 relationships (generally this should not be used)
1:many (implement via a link in one table to the other)
many:many (requires a separate 'junction' table)
In each case "1" can degenerate to "0" and "many" can degenerate to "1" or "0". Usually "0" is not represented explicitly but by the absence of rows.
To create schemas:
Think of what "entity" tables to have. (Student, Class, School, etc.)
Create relations, thinking in terms of 1:many and many:many.
Optionally, establish FOREIGN KEYs for the links. (For referential integrity; implicitly creates an INDEX)
Make sure the appropriate columns are INDEXed. (For performance)
Later...
Write the SELECTs, UPDATEs, etc.
Use those queries to decide whether other indexes are needed.
Your question is mainly about the many:many tables.
The basic many:many table has exactly 2 column (links to two other tables)
The basic has two composite indexes (PRIMARY KEY(a,b) and INDEX(b,a))
Sure, make it many:many:many (Not common, but reasonable)
Add columns for attributes about the relations (also not common, but OK)
Add composite indexes for the queries that depend on those attributes (for performance)
More discussion of many:many: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

How would transitive functional dependencies occur here?

so im reading up on database normalization, and it seems like for the most part, a lot of us are already following up to 2NF or even 3NF without realizing it. I wonder why our professor 4 years ago told us thats a topic discussed in masters database course because its "too complicated" lol...sounds straight forward to me really...
anyways, part of this article here, talks about 3NF, and to achieve that, you need to have 2NF and no no transitive functional dependencies.
the example given is this image, but i dont understand...how could a value of a non-key column change a value of another non-key column? if anything, it sounds to me like a glitch in the system if that were to happen...
Consider the table 1. Changing the non-key column Full Name may change Salutation.

That article is terrible. So clearly some people find it difficult to understand normalisation. (Many people struggle with the difference between 3NF vs Boyce-Codd NF, which the article ducks out of explaining.) The article says
Normalization helps produce database systems that are cost-effective and have better security models.
That's not the chief reason for normalising a design. Indeed in the early days of the Relational Model (when disk was expensive, so for example dates had two-digit years), normalisation (i.e. vertical partitioning) was the opposite of cost-effective, and a lot of attention was paid to trade-offs in (partially) denormalised schemas.
The chief reason for normalisation is to avoid duplicating information and/or duplicates getting out of step, called 'update anomalies'. Specifically:
A transitive functional dependency is when changing a non-key column, might cause any of the other non-key columns to change
Is a terrible way to put it. But might mean: when you update one column (Full Name) of one row (Membership Id 3) you need to also update other column(s) (Salutation) of the same or other row(s) (Membership Id 2?); or if you don't, you break the consistency of the data.
The article doesn't tell us what FDs are expected to hold. Does Full Name determine Salutation? Is it possible the Membership ID 3 Robert Phil could qualify as a Doctor and therefore change his Salutation without Member 2 also becoming a Doctor? Then there is no FD from Full Name to Salutation, and what looks like duplicated entry is not.
Presumably what the example is trying to show (I'm not sure, because it's wrong) is that there's a dependency between Full Name and Salutation. Introducing a Salutation Id is so ... stupid, I'm very tempted to say "not even wrong". It has not removed the Transitive Functional Dependency at all.
Normalisation would (assuming there is a FD from Full Name to Salutation):
Put Full Name and Salutation in a separate table, keyed by Full Name -- that represents one of the FDs.
Remove Salutation from the Membership table.
Not introduce a Salutation ID field.
You can recover the original Membership table by joining to the Full Name, Salutation table.
The alleged 3NF form has not removed the Transitive Functional Dependency, and so is not in 3NF. All it has done is replace a Transitive FD from Membership ID to Full Name to Salutation with one from Membership ID to Full Name to Salutation ID. So if Member 3 changes their name from Robert Phil to Roberta Phil, under the initial design Salutation would have to change in step from Mr to Ms; under the alleged 3NF design still the Salutation ID has to change from 1 to 2.
There are other reasons to think that alleged 3NF design is not 3NF. I expect a dependency from Person to Full Name and to Address. There's no column Person, with the consequence there are two Mr Robert Phils. Are they the same person? Then what if they flatted together? The article tries to introduce a composite key {Full Name, Address}, but that won't help; and it's quite common for same-named father and son to live at the same address. We'd have two people with same name at same address. (What if one of them then qualified as a Doctor?)
Normalisation would introduce a Person ID, key to a Person table, with columns Full Name, Salutation, Address. The partitioned Membership table would have columns Membership ID (key), Person ID (Foreign Key references Person).

Normalize two tables with same primary key to 3NF

I have two tables currently with the same primary key, can I have these two tables with the same primary key?
Also are all the tables in 3rd normal form
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK

This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of looks ok to me are Ticket and Flight.

Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identifed by the same values in another table. (If always then the one table's candidate key values are also in the other table's. And so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand sometimes one table is used with attributes of both kinds and attributes inapplicable to one kind are not used. (Ie via NULL or a tag indicating kind.)
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Athough you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situtations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.

Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.

Type-based database design

I want to rate (Very Good, Good, Medium, Bad...) 2 objects: Student and Teacher. Which design solution is better?
Solution 1:
Students(StudentID, Rating,...)
-----------------1--------Good-----
-----------------2--------Bad-----
-----------------3-----Very Good-----
Teachers(TeacherID, Rating,...)
-----------------1-----Very Good-----
-----------------2--------Bad-----
-----------------3--------Bad-----
Solution 2:
Students(StudentID, RatingTypeID,...)
-----------------1----------------2----------
-----------------2----------------1----------
-----------------3----------------3----------
Teachers(TeacherID, RatingTypeID,...)
-----------------1----------------1----------
-----------------2----------------1----------
-----------------3----------------3----------
RatingType(RatingID, RatingDescription,...)
-----------------1-------------Very Good---------
-----------------2---------------Good--------------
-----------------3----------------Bad---------------
If both of them are not good enough, can you give me some suggestions? Thanks!

Solution 2 is the best from those two, but if You want to have multiple votes in the course of your application's life, you should have a table where you identify what is the subject of the vote, and another where you store the votes:
Student:
id
name
class
...
Teacher:
id
name
subject
...
voterType
id
description (student or teacher)
contest:
id
description (ex: 1st Semester 2013)
contestVotes:
id
contestId
voterType (Teacher or Student)
voterId
ratingTypeId

Between the two suggested solutions, I would definitely choose the second one, for the following reasons:
Storing a Rating as a number can come handy in cases where ordering or aggregates are involved, in which cases a string is pretty much useless.
Making the Rating column a Foreign Key to the RatingType.RatingID enforces a constraint, that a Rating can only have values from a very specific set, with very specific meaning. Plus, you could potentially add extra columns to the RatingType table in the future, adding value to your ratings.
As for an improvement suggestion, since you asked, consider implementing this using IS-A relations. A teacher, as well as a student, are apparently both Persons and have common attributes to an extent. I would create a Person superclass table containing the common attributes (i.e. first name, last name, address, phone etc) and then make the primary keys of students and teachers also foreign keys to Persons.
Notice that, a rating is a common attribute too. So it's only natural to appear in the Persons table. Unless of course there are different types of ratings for teachers and students, in which case you would have to implement two different RatingType tables.

Normalization in MYSQL

What is normalization in MySQL and in which case and how we need to use it?

I try to attempt to explain normalization in layman terms here. First off, it is something that applies to relational database (Oracle, Access, MySQL) so it is not only for MySQL.
Normalisation is about making sure each table has the only minimal fields and to get rid of dependencies. Imagine you have an employee record, and each employee belongs to a department. If you store the department as a field along with the other data of the employee, you have a problem - what happens if a department is removed? You have to update all the department fields, and there's opportunity for error. And what if some employees does not have a department (newly assigned, perhaps?). Now there will be null values.
So the normalisation, in brief, is to avoid having fields that would be null, and making sure that the all the fields in the table only belong to one domain of data being described. For example, in the employee table, the fields could be id, name, social security number, but those three fields have nothing to do with the department. Only employee id describes which department the employee belongs to. So this implies that which department an employee is in should be in another table.
Here's a simple normalization process.
EMPLOYEE ( < employee_id >, name, social_security, department_name)
This is not normalized, as explained. A normalized form could look like
EMPLOYEE ( < employee_id >, name, social_security)
Here, the Employee table is only responsible for one set of data. So where do we store which department the employee belongs to? In another table
EMPLOYEE_DEPARTMENT ( < employee_id >, department_name )
This is not optimal. What if the department name changes? (it happens in the US government all the time). Hence it is better to do this
EMPLOYEE_DEPARTMENT ( < employee_id >, department_id )
DEPARTMENT ( < department_id >, department_name )
There are first normal form, second normal form and third normal form. But unless you are studying a DB course, I usually just go for the most normalized form I could understand.

Normalization is not for MYSql only. Its a general database concept.
Normalization is the process of
efficiently organizing data in a
database. There are two goals of the
normalization process: eliminating
redundant data (for example, storing
the same data in more than one table)
and ensuring data dependencies make
sense (only storing related data in a
table). Both of these are worthy goals
as they reduce the amount of space a
database consumes and ensure that data
is logically stored.
Normal forms in SQL are given below.
First Normal form (1NF): A relation is
said to be in 1NF if it has only
single valued attributes, neither
repeating nor arrays are permitted.
Second Normal Form (2NF): A relation
is said to be in 2NF if it is in 1NF
and every non key attribute is fully
functional dependent on the primary
key.
Third Normal Form (3NF): We say that a
relation is in 3NF if it is in 2NF and
has no transitive dependencies.
Boyce-Codd Normal Form (BCNF): A
relation is said to be in BCNF if and
only if every determinant in the
relation is a candidate key.
Fourth Normal Form (4NF): A relation
is said to be in 4NF if it is in BCNF
and contains no multivalued dependency.
Fifth Normal Form (5NF): A relation is
said to be in 5NF if and only if every
join dependency in relation is implied
by the candidate keys of relation.
Domain-Key Normal Form (DKNF): We say
that a relation is in DKNF if it is
free of all modification anomalies.
Insertion, Deletion, and update
anomalies come under modification
anomalies
Seel also
Database Normalization Basics

It's a technique for ensuring that your data remains consistent, by eliminating duplication. So a database in which the same information is stored in more than one table is not normalized.
See the Wikipedia article on Database normalization.
(It's a general technique for relational databases, not specific to MySQL.)

While creating a database schema for your application, you need to make sure that you avoid any information being stored in more than one column across different tables.
As every table in your DB, identifies a significant entity in your application, a unique identifier is a must-have columns for them.
Now, while deciding the storage schema, various kinds of relationships are being identified between these entities (tables), viz-a-viz, one-to-one, one-to-many, many-to-many.
For a one-to-one relationship (eg. A
Student has a unique rank in the
class), same table could be used to
store columns (from both tables).
For a one-to-many relationship (eg.
A semester can have multiple
courses), a foreign key is being
created in a parent table.
For a many-to-many relationship (eg.
A Prof. attends to many students and
vice-versa), a third table needs to
be created (with primary key from
both tables as a composite key), and
related data of the both tables will
be stored.
Once you attend to all these scenarios, your db-schema will be normalized to 4NF.

In the field of relational database
design, normalization is a systematic
way of ensuring that a database
structure is suitable for
general-purpose querying and free of
certain undesirable
characteristics—insertion, update, and
deletion anomalies—that could lead to
a loss of data integrity.[1] E.F.
Codd, the inventor of the relational
model, introduced the concept of
normalization and what we now know as
the first normal form in 1970.[2] Codd
went on to define the second and third
normal forms in 1971,[3] and Codd and
Raymond F. Boyce defined the
Boyce-Codd normal form in 1974.[4]
Higher normal forms were defined by
other theorists in subsequent years,
the most recent being the sixth normal
form introduced by Chris Date, Hugh
Darwen, and Nikos Lorentzos in
2002.[5]
Informally, a relational database
table (the computerized representation
of a relation) is often described as
"normalized" if it is in the third
normal form (3NF).[6] Most 3NF tables
are free of insertion, update, and
deletion anomalies, i.e. in most cases
3NF tables adhere to BCNF, 4NF, and
5NF (but typically not 6NF).
A standard piece of database design
guidance is that the designer should
create a fully normalized design;
selective denormalization can
subsequently be performed for
performance reasons.[7] However, some
modeling disciplines, such as the
dimensional modeling approach to data
warehouse design, explicitly recommend
non-normalized designs, i.e. designs
that in large part do not adhere to
3NF.[8]
Edit: Source: http://en.wikipedia.org/wiki/Database_normalization

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008