What is 1NF truly? - relational-database

While studying relational databases, I ran into this confusing page where the following snapshot summarizes my confusion:
Why does the author say that the Subject table is in 1NF when student_id 401 and subject Math are repeated in the same way the blue highlighting depicts? This seems like a contradiction.

Chris Date gives a correct and concise definition of 1NF in his book An Introduction to Database Systems, 7th ed, p 357.
A relvar is in 1NF if and only if, in every legal value of that
relvar, every tuple contains exactly one value for each attribute.
"One value" appearing in more than one tuple (more than one row) doesn't violate 1NF. In the "Student table", each tuple (each row) contains exactly one value for each attribute. As far as we can tell from the sample data, it's in 1NF.
In the relational model, that "one value" can be arbitrarily complex--audio recordings, video, engineering drawings, etc. (Ibid, p 114)

The key concept this page fails to mention is that the data contained in the row must not be duplicated, meaning that a single key value cannot have multiple values for a single column. The real problem is that for student id of 401, the first table specifies the name twice.
The text following the last table should read "In Subject table concatenation of subject_id and student_id is the primary key". The reason the new subject table is okay is because the key is actually both of these values, so while 401 is repeated, the key is 10, 401 and 11, 401, which are two distinct values. Likewise, math is repeated, but it is the data for two separate keys; it isn't repeated within a row. Because these key values are different, it is okay.
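As a rough illustration of the composite-key point, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the column names mentioned above; the sample rows are invented for the example, since the original page's data is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE subject (
        subject_id INTEGER NOT NULL,
        student_id INTEGER NOT NULL,
        subject    TEXT    NOT NULL,
        PRIMARY KEY (subject_id, student_id)
    )
""")

# Assumed sample data: 401 and 'Math' both repeat across rows,
# but every (subject_id, student_id) pair is distinct.
rows = [(10, 401, "Math"), (11, 401, "Java"), (10, 402, "Math")]
conn.executemany("INSERT INTO subject VALUES (?, ?, ?)", rows)

def try_insert(row):
    """Return True if the row is accepted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO subject VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Re-using the key (10, 401) is what the database actually rejects.
duplicate_accepted = try_insert((10, 401, "Physics"))
row_count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
```

Repeating 401 or Math in different rows is accepted; only a repeated (subject_id, student_id) pair is refused.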
The page you referenced doesn't give a very precise definition. I hope this explanation helps. Keep checking other sites to get a clearer understanding. Wikipedia has a good example and a precise definition, though it is a bit abstract and hard to follow.
http://en.wikipedia.org/wiki/First_normal_form

Related

Normalize two tables with same primary key to 3NF

I currently have two tables with the same primary key. Can two tables share the same primary key?
Also, are all the tables in third normal form?
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
The rest of the tables in the database are below:
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK
This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of look OK to me are Ticket and Flight.
Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identified by the same values in another table. (If always, then the one table's candidate key values are also in the other table's, so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand, sometimes one table is used with attributes of both kinds, and attributes inapplicable to one kind are not used (ie via NULL or a tag indicating kind).
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
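To make the shared-primary-key case concrete, here is a minimal sketch with Python's sqlite3 module. The table and column names are borrowed loosely from the question (only two columns are kept), and the sample values are invented. It assumes every travel_class row belongs to exactly one ticket, which the question does not actually confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE ticket (
        ticket_id INTEGER PRIMARY KEY,
        price     REAL NOT NULL
    );
    -- Same primary key as ticket, and also a foreign key to it.
    CREATE TABLE travel_class (
        ticket_id INTEGER PRIMARY KEY REFERENCES ticket(ticket_id),
        airmiles  INTEGER NOT NULL
    );
    INSERT INTO ticket VALUES (1, 250.0), (2, 900.0);
    INSERT INTO travel_class VALUES (1, 500), (2, 3000);
""")

# Joining on the shared key reassembles one row per ticket.
joined = conn.execute("""
    SELECT t.ticket_id, t.price, c.airmiles
    FROM ticket AS t
    JOIN travel_class AS c ON c.ticket_id = t.ticket_id
    ORDER BY t.ticket_id
""").fetchall()

def orphan_accepted():
    """Try to add a travel_class row for a ticket that does not exist."""
    try:
        conn.execute("INSERT INTO travel_class VALUES (99, 10)")
        return True
    except sqlite3.IntegrityError:
        return False

orphan = orphan_accepted()
```

Whether to keep the two tables separate or merge them is still the design decision discussed above; the sketch only shows that the arrangement is legal and enforceable.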
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Although you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.
Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.
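A minimal sketch of that suggestion, again with sqlite3. The table name ticket_name and the sample names are hypothetical; the question never says how Names is actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE ticket (ticket_id INTEGER PRIMARY KEY, price REAL);
    -- One row per (ticket, name) instead of several names in one column.
    CREATE TABLE ticket_name (
        ticket_id INTEGER NOT NULL REFERENCES ticket(ticket_id),
        name      TEXT    NOT NULL,
        PRIMARY KEY (ticket_id, name)
    );
    INSERT INTO ticket VALUES (1, 250.0);
""")

def add_names(ticket_id, names):
    """Store each passenger name on the ticket as its own row."""
    conn.executemany(
        "INSERT INTO ticket_name VALUES (?, ?)",
        [(ticket_id, name) for name in names],
    )

add_names(1, ["Ann Lee", "Bob Lee"])
names_on_ticket = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM ticket_name WHERE ticket_id = 1 ORDER BY name"
    )
]
```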

normalization and normal forms: database

Sorry for asking such a question, but I got it wrong when I wrote these definitions (conditions) of first, second, and third normal form in my exam:
1 NF :
Data in each column should be atomic. No multiple values separated by commas
Table should not contain repeating column groups
Identify each record using primary key.
2 NF :
must be in 1 NF
must not contain redundant data, if yes, move it to separate table
create table using foreign keys
3 NF :
Must be in 2NF
Does not contain columns that are not fully dependent upon the primary key
Have I written something wrong? My teacher does not agree.
Source: this video.
1NF
A row of data cannot contain a repeating group of data, i.e. each column must hold a single value. Each row of data must have a unique identifier.
2NF
A table to be normalized to Second Normal Form should meet all the needs of First Normal Form and there must not be any partial dependency of any column on primary key. It means that for a table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second normal form
3NF
Third Normal Form requires that every non-prime attribute of a table depend directly on the primary key. Any transitive functional dependency should be removed from the table. The table must also be in Second Normal Form.
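The 2NF condition above can be checked mechanically once the functional dependencies are written down. The following is a rough Python sketch, not a full normalizer: it assumes a single candidate key (strictly, 2NF is defined over all candidate keys), and the attribute names and dependencies are made up for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, non_key, fds):
    """List (key_subset, attribute) pairs where a proper subset of the key
    already determines a non-key attribute (a 2NF violation)."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for attr in sorted(non_key):
                if attr in determined:
                    found.append((subset, attr))
    return found

# Hypothetical table: key (student_id, subject_id).
# grade needs the whole key; subject_name depends on subject_id alone.
fds = [
    (("student_id", "subject_id"), ("grade",)),
    (("subject_id",), ("subject_name",)),
]
violations = partial_dependencies(
    {"student_id", "subject_id"}, {"grade", "subject_name"}, fds
)
```

Here `violations` reports that subject_name depends on subject_id alone, so this hypothetical table would fail 2NF until subject_name is moved to a table keyed by subject_id.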
More references:
http://www.studytonight.com/dbms/database-normalization.php
http://holowczak.com/database-normalization
I believe the answer is wrong. You are not using terms that are associated with normalization when you should. An example of this can be found in your answer for 2NF
must not contain redundant data, if yes, move it to separate table
create table using foreign keys
When is data redundant? Which data do you move to a separate table? Is creating a table always a step you take to get a table in 2NF?
If you would have said:
All attributes which are not part of the primary identifier should be completely dependent on the entire primary identifier.
You are still saying the exact same thing, no redundant data is allowed, but the way you say it shows that you know what normalization is all about.
According to your answers in exam:
1 NF :
a.Data in each column should be atomic. No multiple values separated by commas
(TRUE, because 1NF does not allow composite or multivalued attributes and, more importantly, this property is handled by default during the conversion from the ER model to the relational model.) This property alone is enough for 1NF.
b.Table should not contain repeating column groups.
(Not required)
c.Identify each record using primary key.
(Not required)
2 NF :
a.must be in 1 NF
(TRUE)
b.must not contain redundant data, if yes, move it to separate table.
(TRUE, but here we only focus on partial dependencies. Removing partial dependencies is enough for 2NF, and if some redundant data still exists after their removal, that is acceptable for 2NF.)
c.create table using foreign keys
(FALSE. Break the table into two parts in such a way that the attribute common to both acts as a candidate key of one of the decomposed tables.)
Example: R(A,B,C,D). Suppose we want to decompose this table for 2NF. The decomposition is done as (AB) and (BCD), where the common attribute (here 'B') acts as a candidate key for either (AB) or (BCD).
3 NF :
a.Must be in 2NF
(Not necessarily a separate step: you can decompose straight to 3NF without passing through 2NF first. A table that is in 3NF automatically satisfies 2NF.)
b.Does not contain columns that are not fully dependent upon the primary key
(The wording is wrong. You should write: "In 3NF, transitive dependencies (a non-prime attribute determining another non-prime attribute) are not allowed.")
*Remember: going from 1NF to 2NF, 2NF to 3NF, and 3NF to BCNF step by step is a convention, not a rule. You can go directly to BCNF (which removes all redundancy caused by functional dependencies).
Hope this helps. For more detail, you can also refer to: Detailed explanation of Normal forms
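The R(A,B,C,D) example can be tried on data. This is a small Python sketch under the assumption that B determines C and D; the sample rows are invented. It splits the rows into (AB) and (BCD), joins them back on B, and compares the result with the original.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples on their shared attribute names."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = list(left_attrs) + [a for a in right_attrs if a not in left_attrs]
    joined = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                joined.add(tuple(merged[a] for a in out_attrs))
    return joined, out_attrs

# Assumed sample data in which B -> C, D holds.
rows = [
    {"A": 1, "B": "x", "C": 10, "D": 100},
    {"A": 2, "B": "x", "C": 10, "D": 100},
    {"A": 3, "B": "y", "C": 20, "D": 200},
]

ab = project(rows, ["A", "B"])
bcd = project(rows, ["B", "C", "D"])   # the repeated (x, 10, 100) collapses to one row
rejoined, attrs = natural_join(ab, ["A", "B"], bcd, ["B", "C", "D"])
lossless = rejoined == project(rows, ["A", "B", "C", "D"])
```

Because B is a key of (BCD), the join neither loses nor invents rows, and the C and D values for each B are stored once instead of once per A.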

Is it good practice to use null values as a placeholder for future data?

I'm working on a database that users enter in various data at different times.
Currently I have a many-to-many relationship between three tables.
tblDog: id, Name
tblOwner: id, Name
tblVet: id, name
tblDog_Owner_Vet: id, Dog_id, Owner_id, Vet_id
In a perfect world one would have all the information that connects these three entities at one time, but the user might have a bit of information now and more later. Therefore I let them enter it as they get the information, so entries may look like the ones below. I am aware that usually a dog would have only one owner/vet, but for the sake of this question please consider it possible that a dog can have more than one owner and vet:
1, 1, 1, 1
2, 2, 2, null
3, 2, null, 3
They then can later go back and either add missing info or merge two rows that turn out to be associated.
Is this ok to do or are all these null values a problem? Is there another solution I may be missing?
As documented under Working with NULL Values:
Conceptually, NULL means “a missing unknown value”
Therefore, it is exactly what you want in this case.
You may however like to read up on the criticisms surrounding the use of NULL, which is an age-old debate in the database world.
I just use 0 (zero) for unset foreign key columns.
It might not make a difference in most cases but I like that it is an int type and always simpler to test against in an application.
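For comparison, here is a minimal sqlite3 sketch of how NULL and 0 behave when the foreign keys are actually enforced. The table names and the three sample rows come from the question; the dog, owner, and vet names are invented, and the surrogate id primary key is kept as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tblDog   (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblOwner (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblVet   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblDog_Owner_Vet (
        id       INTEGER PRIMARY KEY,
        Dog_id   INTEGER NOT NULL REFERENCES tblDog(id),
        Owner_id INTEGER REFERENCES tblOwner(id),  -- nullable: not known yet
        Vet_id   INTEGER REFERENCES tblVet(id)     -- nullable: not known yet
    );
    INSERT INTO tblDog   VALUES (1, 'Rex'), (2, 'Fido');
    INSERT INTO tblOwner VALUES (1, 'Ann'), (2, 'John');
    INSERT INTO tblVet   VALUES (1, 'Dr A'), (3, 'Dr Bloggins');
    INSERT INTO tblDog_Owner_Vet VALUES
        (1, 1, 1, 1), (2, 2, 2, NULL), (3, 2, NULL, 3);
""")

# Rows that still need information are easy to find with IS NULL.
incomplete = conn.execute("""
    SELECT id FROM tblDog_Owner_Vet
    WHERE Owner_id IS NULL OR Vet_id IS NULL
    ORDER BY id
""").fetchall()

def zero_accepted():
    """Try to use 0 as an 'unset' vet id while foreign keys are enforced."""
    try:
        conn.execute("INSERT INTO tblDog_Owner_Vet VALUES (4, 1, 1, 0)")
        return True
    except sqlite3.IntegrityError:
        return False

zero_ok = zero_accepted()

# Later: rows 2 and 3 turn out to describe the same association, so merge them.
conn.execute("UPDATE tblDog_Owner_Vet SET Vet_id = 3 WHERE id = 2")
conn.execute("DELETE FROM tblDog_Owner_Vet WHERE id = 3")
merged = conn.execute("SELECT * FROM tblDog_Owner_Vet ORDER BY id").fetchall()
```

A NULL foreign key passes the constraint, while 0 is rejected unless a parent row with id 0 exists, so a 0 sentinel only works with unenforced keys or a placeholder row.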
I would have just added this as a comment but I don't have enough rep yet.
There is a huge disadvantage to using nulls as you suggest. Many to many relationships, or in your case, many to many to many use composite primary keys to uniquely identify records. As you know, primary key columns must be declared not null.
The workaround is to have records in the owner and vet tables that indicate "Not Applicable" or "none" or something like that. Also, you would need a field also part of the primary key, in your many to many table that indicates whether the record is currently true. For example, if you had a vetless dog in your table, you could assign the "not applicable" value to the vet field. Then, when doggie gets a vet, you add a new record and update this record showing that it is currently false.
Edit starts here
From the comment, "Couldn't I just over write the "none"_id in that record with the new Vet's id?". There is always more than one way to accomplish something. Updating the record is one way to do it. Another is deleting the "none id" record and adding a new one. To help decide, ask yourself how you intend to handle the situation where owner John decides that Dr Bloggins is no longer welcome as Fido's vet.
Also, you are not getting rid of the null to protect the foreign key constraint. You are doing it because vet_id will be part of the primary key so it can't be null.
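A minimal sketch of the placeholder approach, simplified to a two-column dog_vet table (the owner column and the "currently true" flag are left out to keep it short). The names and ids are hypothetical. Note that SQLite in particular tolerates NULL in an ordinary composite primary key unless the columns are also declared NOT NULL, so the sketch declares them explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE vet (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE dog_vet (
        dog_id INTEGER NOT NULL,
        vet_id INTEGER NOT NULL REFERENCES vet(id),
        PRIMARY KEY (dog_id, vet_id)
    );
    -- id 0 is a placeholder row meaning "no vet yet" (an assumption of this sketch).
    INSERT INTO vet VALUES (0, 'none'), (3, 'Dr Bloggins');
    INSERT INTO dog_vet VALUES (2, 0);
""")

def null_accepted():
    """Try to store NULL in a column that is part of the primary key."""
    try:
        conn.execute("INSERT INTO dog_vet VALUES (2, NULL)")
        return True
    except sqlite3.IntegrityError:
        return False

null_ok = null_accepted()

# When the dog gets a vet, overwrite the placeholder (one of the options above).
conn.execute("UPDATE dog_vet SET vet_id = 3 WHERE dog_id = 2 AND vet_id = 0")
current = conn.execute("SELECT dog_id, vet_id FROM dog_vet").fetchall()
```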

Polymorphic database design : does this approach have a name?

I have a base entity (items) that will host a vast range of item types (>200) with totally different properties. I want a clean, portable, and fast solution and have come up with an idea that maybe has a name I'm unaware of.
Here it goes:
items-entity holds base class fields + additional fields for subclass fields but with dummy names: ItemID,ItemNo,ItemTypeID,int1,int2,dec1,dec2,dec3,str1,str2
referenced itemtype-record holds name of type and child entity (1:n):
itemtypefields [itemtypeid,name,type,realfield]
example in [53,MaxPressure,dec,dec3]
Its limitations:
hard to estimate field requirements in baseclass
harder to add domains/checkconstraints based on child type
need application layer to translate tagged sql to real query
Only possible to query one type at a time since shared attributes may be defined to different "real-fields".
3rd bullet explained:
select ItemNo,_MaxPressure_ from items where ItemTypeID=10 and _MaxPressure_>42
should translate to:
select ItemNo,dec3 as MaxPressure from items where ItemTypeID=10 and dec3>42
(can't do that with sp's or udf's right - or would it be possible?)
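The application-layer translation described in the third bullet could look roughly like this in Python. This is only a sketch: `FIELD_MAP` stands in for the itemtypefields rows, `translate` is a hypothetical helper, and the regex approach ignores string literals and anything beyond a single SELECT ... FROM statement.

```python
import re

# Hypothetical in-memory copy of itemtypefields: (itemtypeid, name) -> realfield
FIELD_MAP = {(10, "MaxPressure"): "dec3"}

# A tag is a name wrapped in underscores that is not part of a longer identifier.
TAG = re.compile(r"(?<![A-Za-z0-9])_([A-Za-z][A-Za-z0-9]*)_(?![A-Za-z0-9])")

def translate(sql, item_type_id, field_map=FIELD_MAP):
    """Replace _Name_ tags with the real column for the given item type.

    Tags in the select list become "realfield as Name"; tags after FROM
    become just "realfield". An unknown tag raises KeyError.
    """
    match = re.search(r"\bfrom\b", sql, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected a SELECT ... FROM ... statement")
    select_part, rest = sql[:match.start()], sql[match.start():]

    def real(m):
        return field_map[(item_type_id, m.group(1))]

    def aliased(m):
        return f"{real(m)} as {m.group(1)}"

    return TAG.sub(aliased, select_part) + TAG.sub(real, rest)

tagged = "select ItemNo,_MaxPressure_ from items where ItemTypeID=10 and _MaxPressure_>42"
translated = translate(tagged, 10)
```

Only column names are substituted; values should still be passed as bound parameters rather than spliced into the string.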
But benefits of:
Performance
Ease of CRUD-operations
Easier to sort/filter at application level.
Now - does it have a name?
This antipattern is called One True Lookup Table.
In a relational database, each column needs to be defined as one logical type. I don't mean one SQL data type like INT or VARCHAR, I mean everything in that column from start to finish must be from the same set of values, and you should be able to tell one value apart from another value.
You can't put shoe size and average temperature and threads per inch into the same column of a given table, and still call it a relation.
Basically, your database would not be a database at all -- it would be a spreadsheet.
Read almost any book by C. J. Date, such as SQL and Relational Theory for a proper explanation of relations and types.
Re your comment:
Read the Q again before lecturing about elementary books and mocking about semi structured data.
Okay, I have re-read your post.
The classic use of One True Lookup Table isn't exactly what you're doing, but what you're doing shares the same problems with OTLT.
Suppose you have "MaxPressure" stored in column dec3 for ItemType 10. Suppose there are a fixed set of valid choices for the value of MaxPressure, and you want to put those in another lookup table, so that no one can enter an invalid MaxPressure value.
Now: declare a foreign key constraint on dec3 referencing your MaxPressures lookup table. You can't -- the problem is that the foreign key constraint applies to the dec3 column in all rows, not just those rows where ItemType is 10.
The reason is that you're storing more than one set of values in a single column. The same problem arises for any other kind of constraint -- unique constraints, check constraints, even NOT NULL. And you can't declare a DEFAULT value for the column either, because you probably have a different correct default for each ItemType (and some ItemTypes have no default for that attribute).
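The foreign key problem can be reproduced in a few lines with sqlite3. The items columns follow the question; the MaxPressures lookup table, item type 11, and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE MaxPressures (value REAL PRIMARY KEY);
    CREATE TABLE items (
        ItemID     INTEGER PRIMARY KEY,
        ItemTypeID INTEGER NOT NULL,
        -- Intended only for ItemTypeID = 10, but it applies to every row.
        dec3       REAL REFERENCES MaxPressures(value)
    );
    INSERT INTO MaxPressures VALUES (42.0), (60.0);
    INSERT INTO items VALUES (1, 10, 42.0);  -- type 10: dec3 is MaxPressure
""")

def other_type_accepted():
    """Insert an item of another type whose dec3 means something else entirely."""
    try:
        # Assumed type 11, where dec3 holds threads per inch.
        conn.execute("INSERT INTO items VALUES (2, 11, 18.0)")
        return True
    except sqlite3.IntegrityError:
        return False

accepted = other_type_accepted()
```

The second insert is refused even though 18.0 is a perfectly good value for its own item type, because the constraint cannot be scoped to the rows where ItemTypeID is 10.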
The reason that I referred to the C. J. Date book is that he gives a crisp definition for a type: it's a named finite set, over which the equality operation is defined. That is, you can tell if the value "42" on one row is the same as the value "42" on another row. In a relational column, that must be true because they must come from the same original set of values. In your table, dec3 could have the value "42" when it's MaxPressure, but "42" for another ItemType when it's threads per inch. Therefore they aren't the same value "42". If you had a unique constraint, these two 42's would not be considered duplicates. If you had a foreign key, each of the different 42's would reference a different lookup table, etc.
What you're doing is not a valid relational database design.
Don't bristle at my referring you to a resource on relational database design unless you understand that.

First Normal Form and First and Last Names

I'm trying to grasp 1NF and am wondering if the following table is in 1NF or not. I'm going to assume no, because the values in the columns first_name, last_name, and full_name can be repeated and thus need to be shifted to a new table with the columns user_id, first_name, last_name, and full_name. The picture below is a screenshot of the database in question.
http://imgur.com/kerlB
The 1NF is about atomicity, not redundancy (that's what higher normal forms are about). Essentially, if all attributes are atomic, your table is in 1NF.
Obviously, whether a table is in 1NF depends on what you define as "atomic". What "atomicity" actually means is a matter of some controversy, but I'd take a pragmatic case-by-case approach here and simply ask:
In the context of the problem I'm trying to solve, does it ever make sense to access [1] any part of the value, or do I always access the whole value?
If I always access the whole, it is atomic in that particular context.
In your example, it is likely you'll want to access first_name and last_name separately, so full_name would be non-atomic and that would be the reason for violating the 1NF. If, however, you know you'll never need to access the first and last name separately, then you could have just the full_name and still not violate the 1NF.
[1] "Accessing" the value should be understood fairly widely here. It might obviously mean reading it from the database, but could also mean using it in a constraint, or indexing it etc...
It certainly could be.
What you are saying with your current design, if it is in first normal form, is that a single "entity" (let's just call it a person) is associated with one and only one user record.
If you move the name fields into a separate table, what you are basically saying is that a single "person" could be associated with one or more user records, and that the person, when updating their name, should make the same change to all of their "users"
If you needed a structure like this, the table would look more like:
user_id|username|password|email|person_id
and you would have a separate table for each "person"
person_id|first_name|last_name|full_name
First normal form is not about duplicated data. Just looking at the first name, you very well might have many people with the name "Bob" or "Alice". That data being repeated over and over again is not the same as saying that the table has duplicate data. The point is that each record should be atomic.
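The two-table layout sketched above could be tried out like this with sqlite3. The column names follow the answer, except that full_name is left out and derived in the query instead, and the table is called users; both choices, like the sample data, are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY,
        username  TEXT NOT NULL UNIQUE,
        password  TEXT NOT NULL,
        email     TEXT NOT NULL,
        person_id INTEGER NOT NULL REFERENCES person(person_id)
    );
    INSERT INTO person VALUES (1, 'Alice', 'Smith');
    INSERT INTO users VALUES
        (1, 'alice',  'x', 'a@example.com',  1),
        (2, 'asmith', 'y', 'a2@example.com', 1);
""")

# One person, two user records: a name change is made once.
conn.execute("UPDATE person SET last_name = 'Jones' WHERE person_id = 1")

names = conn.execute("""
    SELECT u.username, p.first_name || ' ' || p.last_name
    FROM users AS u
    JOIN person AS p ON p.person_id = u.person_id
    ORDER BY u.username
""").fetchall()
```

This split is only worth it if one person really can own several user records; if not, keeping the name columns in the user table is still 1NF.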