Normalization suggestion - mysql

I am going to create a table with fields like ID, Name, Email ID, Phone, Designation, Department and Institute. In this regard, I believe making the fields like Designation, Department and Institute as foreign key is an efficient one (as it have redundant values). Am I correct? or suggest some better ideas.

Yes you are correct, redundancy is one of the things you should avoid. And i encourage you to read more about normalization because you will find more things that will not only make your database more efficient, but easier to use / understand when it grows and gets complicated.
And as the example in the reply above, you should also pay attention to the tables / columns names.
So instead of
Designation, Department and Institute as foreign key
You should follow a pattern and name them like table name + id, so you know just by looking at it that it is a foreign key and it's referencing that table.

Related

Does MySQL database require a unique identifier?

I'm really new to databases in general but could use some advice. I have a database that has about 6000 records (and growing but not crazy amounts). I'd like to build an API so that I can retrieve a property's price history, but have been advised I need a unique ID but I'm not so sure. Can anyone advise?
DB looks like this:
address
price
date_created
status
Address one, main street
£150,000
13/10/2022
new data
Address one, main street
£140,000
16/10/2022
update data
Address two, side road
£350,000
13/10/2022
new data
Maybe you will need to add/edit your data using a CRUD, maybe reference other tables, make foreign keys, etc. I recommend that you add a primary key. I've never worked on tables without a primary key.
I can hear my database teacher in 2001: you should always have a primary key and another unique key apart from the primary one (compound or on a single column) if you don't want your table to be just another excel sheet. And I remember that it was a mathematical demonstration of this rule with injective functions.
Of course, you can ignore these rules, but they are best practice advice for your future self :)
You might find an integer primary key useful, but it's not mandatory.
If the address is sufficient to be the unique column by which you can reference any row, then that's fine. It's called a "natural key." In practice, most of us have experienced that any column you think should be unique ends up not being 100% unique eventually. So a lot of developers recommend adding a "pseudokey" which has no reason not to be unique.
I wrote a chapter called "ID Required" describing the pros and cons of pseudokeys and natural keys in my book SQL Antipatterns, Volume 1: Avoiding the Pitfalls of Database Programming.

Does data redundancy in different tables not follow Third Normal Form (3NF)?

I have 4 tables. Each of them contain the following attributes:
Table 1 :
Person (Id (Primary key), Name, Occupation, Location, SecondJob, PerHour, HoursWorked, Phone, Workphone)
Table 2 :
Job (Id (Foreign key that refers to Person), Title, Name, Location, Salary)
Table 3 :
SecondJob (Id (Foreign key that refers to Person), Title, Name)
Table 4:
PhoneNumber (Id (Foreign key that refers to Person), Name, Phone, Workphone)
I can obtain the values of each attribute like Name, Title, Phone and Workphone from the Person table with the following psuedo SQL statement:
Select (ATTRIBUTE NAME) FROM Person WHERE Id IN (PERSONS ID)
Does the fact that some of the information is being repeated in DIFFERENT TABLES (Data Redundancy), break (ie, not follow) the Third Normal Form (3NF)?
Or should the values be put into the other Tables separately and reason what attribute is identifying with the Primary Key of the Table?
I calculate Salary in Job by getting PerHour and HoursWorked from Person, then multiply them. I have also heard that this is redundant Data, due to the fact that is is data that you could extrapolate from existing Data within the Tables.
But, does this break the Third Normal Form??
Does the fact that information is repeated in DIFFERENT TABLES (Data Redundancy), break against 3NF Normalization?
No. A table value or variable is or isn't in a given NF. This is independent of any other table. (We do also talk about a database being in NF when all of its tables are in that NF.)
Normalization can be reasonably said to remove redundancy. But there is lots of redundancy not addressed by normalization. And there is lots of redundancy that is not bad. And duplication is not necessarily redundancy. Just because data is repeated doesn't mean "information" is repeated. What data says by being or not being in a table depends on the meaning of the table.
But you seem to think that just because duplicating data in a different table doesn't violate 3NF that it doesn't violate other principles of good design. That's wrong. Also, it's 5NF that matters. The only reason lower NFs are used is that SQL DBMSs don't support 5NF well.
Or should i just put in the values into the other Tables seperately and reason what attribute is identifying with the Primary Key of the Table?
I guess you are trying to say, Should I only put the values in one table each and reconstruct the second table via queries involving shared keys? Ie, if you can get the values in a column by querying the rest of the database then should you avoid having that column? Generally speaking, yes.
Your question assumes a misconception. It's not a matter of "(exclusive) or" here. You should do both.
I calculate Salary in Job by getting PerHour and HoursWorked from Person, then multiply them. I heard that this is also redundant Data, due to it being data that you could extrapulate from existing Data in the Tables.
It is redundant given the rest of the database, because you could use a query instead. And if you don't constrain salary values appropriately then that is bad redundancy. Even if you do the column and constraint complicate the schema.
But does it break 3NF Normalization?
No, because the NF of a table is independent of other tables. But that doesn't mean it's ok.
(If you added Salary to Person, the new table would not be in 3NF. But then, SQL DBMSs have computed columns that make that ok, by making the non-3NF table with Salary a view of the 3NF table without it.)
Learn some database design method(s) and how they apply principles of good design. Your tables needlessly address overlapping aspects of the application. Also learn about JOIN in writing queries.

How can I join two tables with two different primary keys into another table?

I have two tables: students and courses, assuming that each student can be in more than one course and that each course can have more than one student.
[Table Students] [Table Courses]
id(PK) id(PK)
name name
age duration
etc... etc...
and what I want to do it is to relate both tables into another table, for example, studying, in which I will store the course or courses that is doing each student. Like this:
[Table studying]
idStudent
idCourse
What I have deduced
I think that idStudent and idCourse should be foreign keys because the information it is stored in students and courses respectively with an unique primary key and to respect the consistency of the database. It cannot exist a relation without information neither of the student nor the course or just without the information of one of them.
I also know that some tables has two primary keys to allow that in the table could exist more than one repeated value of a primary key, but not of both primary keys at the same time.
My questions
These ids (idStudent, idCourse). Have to be primary keys or foreign keys?
Should the table studying has another column with an ID?
Is my deduction in the good way?
P.S: I do not need sql statements, I just need help to clarify my confusion.
Thanks in advance!
These ids (idStudent, idCourse). Have to be primary keys or foreign keys?
You want them to be foreign keys, because the existence of each record on your third table depends on the availability of the first, that is, there cannot be a "Student Course" or a "Course with Students" without either the course or the student. It could (if you don't make those keys) but you would break referential integrity
On the other hand, having FK's is usually a good thing because you make sure that you don't remove dependable records by mistake (which is what the constraint is for on the first place) unless you did something like cascade deleting
Should the table studying has another column with an ID?
No, it does not have to but again, sometimes it is a good practice because some software like Object Relational Mappers, Diagram Software, etc. may rely on the fact that they always needs a by-convention primary key. Some others don't even support composite keys so while it is not mandatory it can help in the future and it does not hurt. Of course this all depends on what you are using the database for and how (pure SQL, which engine you use, if you use it with a framework etc.)
Is my deduction in the good way?
All is relative. But I think your logic is good. My advice is that you always design your data schemas as flexible as you can because if a project grows its harder (and more costly) to do those changes down the road. Invest time on thinking how you may expand your application functionality and think if the schema will adapt to it.
Your deduction is correct.
In fact, you should have a composite primary key consisting of both (idStudent, idCourse) columns, because this tuple is the identifier of row in the table, you do not need additional ID column (of course, you can also take that approach to add additional ID column that would be your primary key, but you do not need it if one student can have one course assigned only once)
To respect the integrity, both columns (separately) should be foreign keys - idStudent should be referencing id column of Students table and idCourse should reference id column of Courses table.
If you like you can make them primary keys on studying table. But this is unnecesary, because relation (role of studying table) is many to many and this kind of table dont need primary keys. You need to know that also when you make them pk (pair of student id and course id) , thats mean that theee could be only one pair of each, thats equivalent to constrain unique - student can take a course only ones. In the future you maybe would like to add to this table start_date and this kind of pk could be a problem, you will need to modify them.

Normalize two tables with same primary key to 3NF

I have two tables currently with the same primary key, can I have these two tables with the same primary key?
Also are all the tables in 3rd normal form
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK
This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of looks ok to me are Ticket and Flight.
Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identifed by the same values in another table. (If always then the one table's candidate key values are also in the other table's. And so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand sometimes one table is used with attributes of both kinds and attributes inapplicable to one kind are not used. (Ie via NULL or a tag indicating kind.)
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Athough you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situtations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.
Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.

Would a new table really be needed?

I'm making a sql database for a small company.. Pretty much the other tables don't relate to the question so ill list the two that do...
There is a table:
NextofKin:
fname
lname
street
no
houseno
city
AND
Patient:
ID[pk]
fname
lname
houseno
city
Pretty much would I need a seperate table for street, house and city?
also any idea what i could use as a primary key for NextOfKin?
Your questions are starting to get into database normalization.
What you should be doing is never duplicating data between tables unless that data relates the tables, and that data should be indexed. Something like this comes to mind ( there are different ways you might construct it based on business logic )
PersonalData: id, fname, lname, address1, address2, city, state, zip
Patient: id PK, personal_data_id FK, next_of_kin_id FK
Granted most of the tables already exist so this may be impossible. But to answer your question directly, since the database is not normalized already, there's no good place to put further address records ( don't want them under Patient right? ) and so you're stuck duplicating the data. Even so, there has to be some relationship between Patient and NextOfKin, so either Patient holds a reference to NextOfKin, or NextOfKin hods reference to Patient. Either way, you might consider using a foreign key between them to enforce, and explicitly state, this relationship.
Yes, use a pk for next of kin.
Use a joining table between patient and next of kin. Multiple patients could list the same person as next of kin, and while your app may not today require someone to designate multiple people as next of kin, they may change their mind in the future and your application will support it.
Myself, I always use a separate address table. Since usually more than one person lives in a house, and a person can have more than one home, you would again use a joining table.