Normalization in MySQL - mysql

What is normalization in MySQL, and when and how should we use it?

I will try to explain normalization in layman's terms here. First off, it applies to relational databases in general (Oracle, Access, MySQL), so it is not only for MySQL.
Normalisation is about making sure each table has only the minimal set of fields and getting rid of dependencies. Imagine you have an employee record, and each employee belongs to a department. If you store the department as a field along with the other data of the employee, you have a problem: what happens if a department is removed? You have to update the department field of every affected employee, and there's opportunity for error. And what if some employees do not have a department (newly assigned, perhaps)? Now there will be null values.
So normalisation, in brief, is about avoiding fields that would be null and making sure that all the fields in a table belong to the one domain of data being described. For example, in the employee table, the fields could be id, name, and social security number, but those three fields have nothing to do with the department. Only the employee id is needed to identify which department the employee belongs to. So this implies that which department an employee is in should be stored in another table.
Here's a simple normalization process.
EMPLOYEE ( < employee_id >, name, social_security, department_name)
This is not normalized, as explained. A normalized form could look like
EMPLOYEE ( < employee_id >, name, social_security)
Here, the Employee table is only responsible for one set of data. So where do we store which department the employee belongs to? In another table
EMPLOYEE_DEPARTMENT ( < employee_id >, department_name )
This is not optimal. What if the department name changes? (it happens in the US government all the time). Hence it is better to do this
EMPLOYEE_DEPARTMENT ( < employee_id >, department_id )
DEPARTMENT ( < department_id >, department_name )
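As a rough illustration of the final layout above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained). The table and column names follow the relations above; the sample rows are invented. It suggests that renaming a department touches exactly one row, and that an employee without a department needs no stored null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE employee (
    employee_id     INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    social_security TEXT NOT NULL
);
CREATE TABLE department (
    department_id   INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL
);
CREATE TABLE employee_department (
    employee_id   INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
-- invented sample data; Cy has no department yet, so he simply has no link row
INSERT INTO employee VALUES
    (1, 'Ann', '111-11-1111'), (2, 'Bob', '222-22-2222'), (3, 'Cy', '333-33-3333');
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee_department VALUES (1, 10), (2, 10);
""")

# Renaming the department is a single-row update, however many employees it has.
cur = conn.execute(
    "UPDATE department SET department_name = 'Revenue' WHERE department_id = 10"
)
rows_touched = cur.rowcount

staff = conn.execute("""
    SELECT e.name, d.department_name
    FROM employee e
    LEFT JOIN employee_department ed ON ed.employee_id = e.employee_id
    LEFT JOIN department d ON d.department_id = ed.department_id
    ORDER BY e.employee_id
""").fetchall()
```

The null for Cy only appears in the query result of the LEFT JOIN; nothing null is stored in any table.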
There are first, second and third normal forms. But unless you are taking a DB course, I usually just go for the most normalized form I can understand.

Normalization is not for MySQL only. It's a general database concept.
Normalization is the process of
efficiently organizing data in a
database. There are two goals of the
normalization process: eliminating
redundant data (for example, storing
the same data in more than one table)
and ensuring data dependencies make
sense (only storing related data in a
table). Both of these are worthy goals
as they reduce the amount of space a
database consumes and ensure that data
is logically stored.
Normal forms in SQL are given below.
First Normal Form (1NF): A relation is
said to be in 1NF if it has only
single-valued attributes; neither
repeating groups nor arrays are permitted.
Second Normal Form (2NF): A relation
is said to be in 2NF if it is in 1NF
and every non-key attribute is fully
functionally dependent on the primary
key.
Third Normal Form (3NF): We say that a
relation is in 3NF if it is in 2NF and
has no transitive dependencies.
Boyce-Codd Normal Form (BCNF): A
relation is said to be in BCNF if and
only if every determinant in the
relation is a candidate key.
Fourth Normal Form (4NF): A relation
is said to be in 4NF if it is in BCNF
and contains no multivalued dependency.
Fifth Normal Form (5NF): A relation is
said to be in 5NF if and only if every
join dependency in relation is implied
by the candidate keys of relation.
Domain-Key Normal Form (DKNF): We say
that a relation is in DKNF if it is
free of all modification anomalies.
Insertion, Deletion, and update
anomalies come under modification
anomalies
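Most of the definitions above rest on functional dependencies, so a small sketch of what "X determines Y" means may help. The helper name fd_holds and the sample rows are hypothetical, and the check is only against sample data: it can refute a dependency, but a real dependency is a statement about every valid state of the table, so a True here is merely "not contradicted".

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it afterwards
        if seen.setdefault(key, val) != val:
            return False
    return True


# Invented rows for an unnormalized employee table.
rows = [
    {"employee_id": 1, "name": "Ann", "department_id": 10, "department_name": "Sales"},
    {"employee_id": 2, "name": "Bob", "department_id": 10, "department_name": "Sales"},
    {"employee_id": 3, "name": "Cy", "department_id": 20, "department_name": "Support"},
]

key_to_dept_name = fd_holds(rows, ["employee_id"], ["department_name"])
dept_to_dept_name = fd_holds(rows, ["department_id"], ["department_name"])
dept_to_employee = fd_holds(rows, ["department_id"], ["employee_id"])
```

Here department_name depends on department_id, which in turn depends on the key employee_id. That chain is the kind of transitive dependency 3NF rules out, and it is why department_name would move to its own table.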
See also
Database Normalization Basics

It's a technique for ensuring that your data remains consistent, by eliminating duplication. So a database in which the same information is stored in more than one table is not normalized.
See the Wikipedia article on Database normalization.
(It's a general technique for relational databases, not specific to MySQL.)

While creating a database schema for your application, you need to make sure that you avoid storing any piece of information in more than one column across different tables.
As every table in your DB identifies a significant entity in your application, a unique identifier is a must-have column for each of them.
Now, while deciding on the storage schema, various kinds of relationships are identified between these entities (tables), namely one-to-one, one-to-many and many-to-many.
For a one-to-one relationship (e.g. a
student has a unique rank in the
class), the same table can be used to
store the columns from both tables.
For a one-to-many relationship (e.g.
a semester can have multiple
courses), a foreign key referencing the
parent table is created in the child
table (the "many" side).
For a many-to-many relationship (e.g.
a professor attends to many students and
vice versa), a third table needs to
be created (with the primary keys from
both tables as a composite key), and
the data relating the two tables is
stored there.
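Once you attend to all these scenarios, your DB schema will be largely normalized (whether it reaches a specific form such as 4NF also depends on the dependencies inside each table).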
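The one-to-many and many-to-many cases can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL (an assumption for the sake of a self-contained example); the table names, column names and rows are invented to match the semester/course and professor/student examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- one-to-many: each course row (the "many" side) points at its semester
CREATE TABLE semester (semester_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    semester_id INTEGER NOT NULL REFERENCES semester(semester_id)
);

-- many-to-many: a third table keyed by both parent keys
CREATE TABLE professor (professor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE professor_student (
    professor_id INTEGER NOT NULL REFERENCES professor(professor_id),
    student_id   INTEGER NOT NULL REFERENCES student(student_id),
    PRIMARY KEY (professor_id, student_id)
);

INSERT INTO semester VALUES (1, 'Fall');
INSERT INTO course VALUES (1, 'Databases', 1), (2, 'Algorithms', 1);
INSERT INTO professor VALUES (1, 'Prof. A'), (2, 'Prof. B');
INSERT INTO student VALUES (1, 'S1'), (2, 'S2');
INSERT INTO professor_student VALUES (1, 1), (1, 2), (2, 1);
""")

courses_in_fall = [r[0] for r in conn.execute(
    "SELECT title FROM course WHERE semester_id = 1 ORDER BY title")]

students_of_prof_1 = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM professor_student ps
    JOIN student s ON s.student_id = ps.student_id
    WHERE ps.professor_id = 1
    ORDER BY s.name
""")]

# The composite primary key keeps the same pairing from being stored twice.
try:
    conn.execute("INSERT INTO professor_student VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```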

In the field of relational database
design, normalization is a systematic
way of ensuring that a database
structure is suitable for
general-purpose querying and free of
certain undesirable
characteristics—insertion, update, and
deletion anomalies—that could lead to
a loss of data integrity.[1] E.F.
Codd, the inventor of the relational
model, introduced the concept of
normalization and what we now know as
the first normal form in 1970.[2] Codd
went on to define the second and third
normal forms in 1971,[3] and Codd and
Raymond F. Boyce defined the
Boyce-Codd normal form in 1974.[4]
Higher normal forms were defined by
other theorists in subsequent years,
the most recent being the sixth normal
form introduced by Chris Date, Hugh
Darwen, and Nikos Lorentzos in
2002.[5]
Informally, a relational database
table (the computerized representation
of a relation) is often described as
"normalized" if it is in the third
normal form (3NF).[6] Most 3NF tables
are free of insertion, update, and
deletion anomalies, i.e. in most cases
3NF tables adhere to BCNF, 4NF, and
5NF (but typically not 6NF).
A standard piece of database design
guidance is that the designer should
create a fully normalized design;
selective denormalization can
subsequently be performed for
performance reasons.[7] However, some
modeling disciplines, such as the
dimensional modeling approach to data
warehouse design, explicitly recommend
non-normalized designs, i.e. designs
that in large part do not adhere to
3NF.[8]
Edit: Source: http://en.wikipedia.org/wiki/Database_normalization

Related

What should be the DB structure for an application with multiple accounts having similar types of data for each account?

I am working on creating an application with multiple parent accounts, each of which has multiple different users. Each account consists of a set of data of a similar type that needs to be maintained separately, e.g. the inventory of each organization, which its respective users can view.
What is the best practice:
1: Create different database tables for each organization
2: Create a common table and have an extra column for the organization it belongs to.
As mentioned, use one table for organizations, one for equipment, one for persons and so on. That is step 1: a separate table for each separate entity.
After that, connect them with relationships: the primary key in the main entity to a foreign key in the sub-entity. In other words, every row in the equipment table would have a column with the id of the organization it belongs to. And so on.
There are many other considerations, including subdividing entities into so-called normal forms, which you can study if needed to reduce the cost of keeping data consistent. But that could also negatively affect performance.
Anyway: entities of the same class should generally be stored in one table.
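A minimal sketch of the common-table approach (option 2 in the question), again using Python's sqlite3 module as a stand-in for MySQL. The table names, the helper inventory_for and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organization (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- one shared equipment table; the extra column says which organization owns the row
CREATE TABLE equipment (
    id              INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL REFERENCES organization(id),
    name            TEXT NOT NULL
);
CREATE INDEX idx_equipment_org ON equipment(organization_id);

INSERT INTO organization VALUES (1, 'Org A'), (2, 'Org B');
INSERT INTO equipment VALUES (1, 1, 'Forklift'), (2, 1, 'Drill'), (3, 2, 'Crane');
""")


def inventory_for(conn, organization_id):
    """Return the equipment names visible to users of one organization."""
    rows = conn.execute(
        "SELECT name FROM equipment WHERE organization_id = ? ORDER BY name",
        (organization_id,),
    )
    return [r[0] for r in rows]


org_a_inventory = inventory_for(conn, 1)
org_b_inventory = inventory_for(conn, 2)
```

Each organization's data stays separate because every query filters on organization_id, while adding a new organization is one INSERT rather than a new set of tables.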
The best practice in OLTP (transaction processing) is to create a common table and to implement subtyping in some way, for example "have extra tables with columns for the organization subtype". In OLAP (analytical processing) warehousing it is still good practice, but the mapping of subtypes can be implemented differently. In OLAP data marts the "one table per organization" solution can be good practice.
You may want to have a look at the book "Programming with databases", which covers these topics: subtype/subclass mapping, OLTP vs OLAP, denormalization etc.

MySQL Database Layout/Modelling/Design Approach / Relationships

Scenario: Multiple Types to a single type; one to many.
So for example:
parent multiple type: students table, suppliers table, customers table, hotels table
child single type: banking details
So a student may have multiple banking details, as can a supplier, etc etc.
Layout Option 1 students table (id) + students_banking_details (student_id) table with the appropriate id relationship, repeat per parent type.
Layout Option 2 students table (+others) + banking_details table. banking_details would have a parent_id column for linking and a parent_type field for determining what the parent is (student / supplier / customers etc).
Layout Option 3 students table (+others) + banking_details table. Then I would create another association table per parent type (eg: students_banking_details) for the linking of student_id and banking_details_id.
Layout Option 4 students table (+others) + banking_details table. banking_details would have a column for each parent type, ie: student_id, supplier_id, customers_id - etc.
Other? Your input...
My thoughts on each of these:
1: Multiple tables of the same type of information seems wrong. If I want to change what gets stored about banking details, that's also several tables I have to change as opposed to one.
2: Seems like the most viable option. Apparently this doesn't maintain 'referential integrity' though. I don't know how important that is to me if I'm just going to be cleaning up children programmatically when I delete the parents?
3: Same as (2) except with an extra table per type, so my logic tells me this would be slower than (2), with more tables and the same outcome.
4: Seems dirty to me, with a bunch of null fields in the banking_details table.
Before going any further: if you do decide on a design for storing banking details which lacks referential integrity, please tell me who's going to be running it so I can never, ever do business with them. It's that important. Constraints in your application logic may be followed; things happen, exceptions, interruptions, inconsistencies which are later reflected in data because there aren't meaningful safeguards. Constraints in your schema design must be followed. Much safer, and banking data is something to be as safe as possible with.
You're correct in identifying #1 as suboptimal; an account is an account, no matter who owns it. #2 is out because referential integrity is non-negotiable. #3 is, strictly speaking, the most viable approach, although if you know you're never going to need to worry about expanding the number of entities who might have banking details, you could get away with #4 and a CHECK constraint to ensure that each row only has a value for one of the four foreign keys -- but you're using MySQL, which ignores CHECK constraints (versions before 8.0.16 parse them but do not enforce them), so go with #3.
Index your foreign keys and performance will be fine. Views are nice to avoid boilerplate JOINs if you have a need to do that.
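Layout option 3 could look roughly like the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere). Only two of the parent types are shown, and the column names and rows are invented. Making banking_details_id the primary key of each association table is one possible way to keep the relationship one-to-many within that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # the schema, not the application, guards integrity
conn.executescript("""
CREATE TABLE students  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE banking_details (
    id             INTEGER PRIMARY KEY,
    account_number TEXT NOT NULL
);
-- one association table per parent type, every column a real foreign key
CREATE TABLE students_banking_details (
    banking_details_id INTEGER PRIMARY KEY REFERENCES banking_details(id),
    student_id         INTEGER NOT NULL REFERENCES students(id)
);
CREATE TABLE suppliers_banking_details (
    banking_details_id INTEGER PRIMARY KEY REFERENCES banking_details(id),
    supplier_id        INTEGER NOT NULL REFERENCES suppliers(id)
);
CREATE INDEX idx_sbd_student  ON students_banking_details(student_id);
CREATE INDEX idx_spbd_supplier ON suppliers_banking_details(supplier_id);

INSERT INTO students VALUES (1, 'Ann');
INSERT INTO suppliers VALUES (1, 'Acme');
INSERT INTO banking_details VALUES (1, 'ACC-001'), (2, 'ACC-002'), (3, 'ACC-003');
INSERT INTO students_banking_details VALUES (1, 1), (2, 1);
INSERT INTO suppliers_banking_details VALUES (3, 1);
""")

ann_accounts = [r[0] for r in conn.execute("""
    SELECT b.account_number
    FROM students_banking_details sb
    JOIN banking_details b ON b.id = sb.banking_details_id
    WHERE sb.student_id = 1
    ORDER BY b.account_number
""")]

# Linking a banking detail to a student that does not exist is refused by the schema.
try:
    conn.execute("INSERT INTO students_banking_details VALUES (3, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Note that this sketch does not stop the same banking detail from being linked to both a student and a supplier; that would need an additional rule.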

Does data redundancy in different tables not follow Third Normal Form (3NF)?

I have 4 tables. Each of them contain the following attributes:
Table 1 :
Person (Id (Primary key), Name, Occupation, Location, SecondJob, PerHour, HoursWorked, Phone, Workphone)
Table 2 :
Job (Id (Foreign key that refers to Person), Title, Name, Location, Salary)
Table 3 :
SecondJob (Id (Foreign key that refers to Person), Title, Name)
Table 4:
PhoneNumber (Id (Foreign key that refers to Person), Name, Phone, Workphone)
I can obtain the values of each attribute like Name, Title, Phone and Workphone from the Person table with the following pseudo SQL statement:
Select (ATTRIBUTE NAME) FROM Person WHERE Id IN (PERSONS ID)
Does the fact that some of the information is being repeated in DIFFERENT TABLES (Data Redundancy), break (ie, not follow) the Third Normal Form (3NF)?
Or should the values be put into the other Tables separately and reason what attribute is identifying with the Primary Key of the Table?
I calculate Salary in Job by getting PerHour and HoursWorked from Person, then multiplying them. I have also heard that this is redundant data, due to the fact that it is data that you could extrapolate from existing data within the tables.
But, does this break the Third Normal Form??
Does the fact that information is repeated in DIFFERENT TABLES (Data Redundancy) break 3NF Normalization?
No. A table value or variable is or isn't in a given NF. This is independent of any other table. (We do also talk about a database being in NF when all of its tables are in that NF.)
Normalization can be reasonably said to remove redundancy. But there is lots of redundancy not addressed by normalization. And there is lots of redundancy that is not bad. And duplication is not necessarily redundancy. Just because data is repeated doesn't mean "information" is repeated. What data says by being or not being in a table depends on the meaning of the table.
But you seem to think that just because duplicating data in a different table doesn't violate 3NF that it doesn't violate other principles of good design. That's wrong. Also, it's 5NF that matters. The only reason lower NFs are used is that SQL DBMSs don't support 5NF well.
Or should I just put the values into the other Tables separately and reason what attribute is identifying with the Primary Key of the Table?
I guess you are trying to say, Should I only put the values in one table each and reconstruct the second table via queries involving shared keys? Ie, if you can get the values in a column by querying the rest of the database then should you avoid having that column? Generally speaking, yes.
Your question assumes a misconception. It's not a matter of "(exclusive) or" here. You should do both.
I calculate Salary in Job by getting PerHour and HoursWorked from Person, then multiply them. I heard that this is also redundant Data, due to it being data that you could extrapolate from existing Data in the Tables.
It is redundant given the rest of the database, because you could use a query instead. And if you don't constrain salary values appropriately then that is bad redundancy. Even if you do, the column and constraint complicate the schema.
But does it break 3NF Normalization?
No, because the NF of a table is independent of other tables. But that doesn't mean it's ok.
(If you added Salary to Person, the new table would not be in 3NF. But then, SQL DBMSs have computed columns that make that ok, by making the non-3NF table with Salary a view of the 3NF table without it.)
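One way to picture the view idea in the parenthetical above is the sketch below, using Python's sqlite3 module as a stand-in for the asker's DBMS (an assumption). The names person, per_hour, hours_worked and person_with_salary, as well as the figures, are hypothetical. The base table stores only the two inputs and the view derives the salary on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    per_hour     INTEGER NOT NULL,
    hours_worked INTEGER NOT NULL
);
-- salary is never stored; it is recomputed from the two stored columns
CREATE VIEW person_with_salary AS
    SELECT id, name, per_hour, hours_worked,
           per_hour * hours_worked AS salary
    FROM person;

INSERT INTO person VALUES (1, 'Ann', 20, 35);
""")

salary_before = conn.execute(
    "SELECT salary FROM person_with_salary WHERE id = 1").fetchone()[0]

# Changing an input cannot leave a stale salary behind, because none is stored.
conn.execute("UPDATE person SET hours_worked = 40 WHERE id = 1")
salary_after = conn.execute(
    "SELECT salary FROM person_with_salary WHERE id = 1").fetchone()[0]
```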
Learn some database design method(s) and how they apply principles of good design. Your tables needlessly address overlapping aspects of the application. Also learn about JOIN in writing queries.

Functional dependency in another table

Lets say there are warehouses each storing items of a specific type.
So there are tables with fields
Warehouse - ID,Name,Type
Item - ID,Name,Type
WarehouseItem - Warehouse, Item
Type - ID, Name
The question is: given that a Warehouse only holds Items of a specific Type, what database normalization rule is this breaking?
Is this database normalized?
(The problem's example is made up, but I basically have this problem in real life.)
I'm making some assumptions from just looking at your metadata without any data examples, but on first glance it appears that your schema for the most part is normalized. Technically speaking your table is 3NF (which should be your target) if it meets all of the following standards:
It is also 1NF - Each entry only contains atomic data (a single piece of info)
It is also 2NF - No partial dependency, meaning that when you have a composite primary key (a key made up of more than one column), all non-key data is dependent on the entire key
It is 3NF - No transitive dependency, meaning all non-key data is dependent only on the primary key and not on some other column in the table
Note that there are also higher normalized forms but they are mostly academic as you begin experiencing performance degradation the more you normalize
Given this definition:
Warehouse appears 3NF assuming that each warehouse can only have one Type. If not then you would be failing the transitive dependency and would need to move Type information to a new table.
Item too appears 3NF assuming only one Type can be assigned
Type appears to contain redundant data and should be removed unless of course you have a many-to-many relationship between Type and Warehouse and/or Item. In that case, you would want to introduce a bridge-entity (aka composite entry) between Type and Warehouse or Item to create two 1-to-many relationships.
Lastly, if I'm reading this correctly, WarehouseItem appears to be a bridge-entity between Warehouse and Item to break up the many-to-many relationship between them. If this is correct, you should be able to argue that this table is 3NF assuming the combination of Warehouse and Item represent a composite key.
So assuming I interpreted your schema correctly, once you eliminate the redundant Type table, then yes I would say this setup technically meets 3NF. Note that your requirement that
given that a Warehouse only holds Items of a specific Type
may require you introduce a new type field which will mean you need to reevaluate your normalization of that table. If you have two distinct types (a WarehouseType and an ItemType) then you may need to keep that Type table after all and turn it into a mapping table between those two new fields. But I'd need to see data examples to better evaluate.
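For what it's worth, one possible way to have the schema itself enforce "a Warehouse only holds Items of its own Type" is to carry the type in the bridge table and use composite foreign keys. This is not something the answer above proposes; it is a hedged sketch, written with Python's sqlite3 module as a stand-in for the asker's DBMS, and all table names, column names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE item_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE warehouse (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES item_type(id),
    UNIQUE (id, type_id)          -- target for the composite foreign key
);
CREATE TABLE item (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES item_type(id),
    UNIQUE (id, type_id)
);
-- the single type_id column must match both the warehouse's and the item's type
CREATE TABLE warehouse_item (
    warehouse_id INTEGER NOT NULL,
    item_id      INTEGER NOT NULL,
    type_id      INTEGER NOT NULL,
    PRIMARY KEY (warehouse_id, item_id),
    FOREIGN KEY (warehouse_id, type_id) REFERENCES warehouse(id, type_id),
    FOREIGN KEY (item_id, type_id)      REFERENCES item(id, type_id)
);

INSERT INTO item_type VALUES (1, 'Frozen'), (2, 'Dry');
INSERT INTO warehouse VALUES (1, 'North', 1);
INSERT INTO item VALUES (1, 'Ice cream', 1), (2, 'Rice', 2);
INSERT INTO warehouse_item VALUES (1, 1, 1);
""")


def try_store(warehouse_id, item_id, type_id):
    """Attempt to place an item in a warehouse; report whether the schema allowed it."""
    try:
        conn.execute(
            "INSERT INTO warehouse_item VALUES (?, ?, ?)",
            (warehouse_id, item_id, type_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Rice is 'Dry' and the North warehouse is 'Frozen': no type_id satisfies both keys.
rice_as_dry = try_store(1, 2, 2)
rice_as_frozen = try_store(1, 2, 1)
stored_rows = conn.execute("SELECT COUNT(*) FROM warehouse_item").fetchone()[0]
```

The price is a deliberately repeated type_id column in the bridge table, which is a design trade-off rather than a normalization requirement.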

Normalize two tables with same primary key to 3NF

I currently have two tables with the same primary key. Can I have two tables with the same primary key?
Also, are all the tables in third normal form?
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK
This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of look ok to me are Ticket and Flight.
Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identified by the same values in another table. (If always then the one table's candidate key values are also in the other table's. And so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand sometimes one table is used with attributes of both kinds and attributes inapplicable to one kind are not used. (Ie via NULL or a tag indicating kind.)
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
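The subtyping case described above can be sketched like this, using Python's sqlite3 module as a stand-in for the asker's DBMS (an assumption). The tables are cut down to one invented attribute each: travel_class shares ticket's primary key and declares it as a foreign key, so every travel_class row is also a ticket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ticket (
    ticket_id INTEGER PRIMARY KEY,
    price     INTEGER NOT NULL
);
-- same primary key as ticket, and also a foreign key to it
CREATE TABLE travel_class (
    ticket_id INTEGER PRIMARY KEY REFERENCES ticket(ticket_id),
    airmiles  INTEGER NOT NULL
);

INSERT INTO ticket VALUES (1, 200), (2, 350);
INSERT INTO travel_class VALUES (1, 500);
""")

# Joining on the shared key gives at most one travel_class row per ticket.
joined = conn.execute("""
    SELECT t.ticket_id, t.price, c.airmiles
    FROM ticket t
    LEFT JOIN travel_class c ON c.ticket_id = t.ticket_id
    ORDER BY t.ticket_id
""").fetchall()

# A travel_class row for a ticket that does not exist is refused.
try:
    conn.execute("INSERT INTO travel_class VALUES (3, 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```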
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Although you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.
Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.