normalization and normal forms: database - mysql

Sorry for asking such a question, but I got it wrong when I wrote these definitions (conditions) of first, second and third normal form in my exam:
1 NF :
Data in each column should be atomic. No multiple values separated by commas
Table should not contain repeating column groups
Identify each record using primary key.
2 NF :
must be in 1 NF
must not contain redundant data, if yes, move it to separate table
create table using foreign keys
3 NF :
Must be in 2NF
Does not contain columns that are not fully dependent upon the primary key
Have I written something wrong? My teacher does not agree.
Source: this video.

1NF
A row of data cannot contain a repeating group of data, i.e., each column must hold a single value. Each row of data must have a unique identifier.
2NF
A table to be normalized to Second Normal Form should meet all the requirements of First Normal Form, and there must not be any partial dependency of any column on the primary key. This means that for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, then the table fails Second Normal Form.
3NF
Third Normal Form requires that every non-prime attribute of the table depend directly on the primary key, so any transitive functional dependency should be removed from the table. The table must also be in Second Normal Form.
More references:
http://www.studytonight.com/dbms/database-normalization.php
http://holowczak.com/database-normalization
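To make the idea of a partial dependency concrete, here is a minimal sketch that checks whether a functional dependency holds in some sample rows. The `enrollment` table, its columns and the helper `fd_holds` are hypothetical, made up for illustration. Sample data can only refute a dependency, never prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical table whose primary key is (student_id, course_id).
enrollment = [
    {"student_id": 1, "course_id": 10, "course_name": "Math", "grade": "A"},
    {"student_id": 2, "course_id": 10, "course_name": "Math", "grade": "B"},
    {"student_id": 1, "course_id": 11, "course_name": "Physics", "grade": "C"},
]

# course_name depends on course_id alone: a partial dependency, so not 2NF.
partial = fd_holds(enrollment, ["course_id"], ["course_name"])

# grade is not determined by either half of the key on its own.
grade_needs_whole_key = (
    not fd_holds(enrollment, ["course_id"], ["grade"])
    and not fd_holds(enrollment, ["student_id"], ["grade"])
)
```

In this sample, `course_name` would move to a separate table keyed by `course_id`, while `grade` stays with the full composite key.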

I believe the answer is wrong. You are not using the terms associated with normalization where you should. An example of this can be found in your answer for 2NF:
must not contain redundant data, if yes, move it to separate table
create table using foreign keys
When is data redundant? Which data do you move to a separate table? Is creating a table always a step you take to get a table into 2NF?
If you had said:
All attributes which are not part of the primary identifier should be completely dependent on the entire primary identifier.
You are still saying the exact same thing, no redundant data is allowed, but the way you say it shows that you know what normalization is all about.

According to your answers in exam:
1 NF :
a.Data in each column should be atomic. No multiple values separated by commas
(TRUE, because 1NF does not support composite and multivalued attributes and, more importantly, this property is handled by default during the ER model to relational model conversion.) This property alone is enough for 1NF.
b.Table should not contain repeating column groups.
(Not required)
c.Identify each record using primary key.
(Not required)
2 NF :
a.must be in 1 NF
(TRUE)
b.must not contain redundant data, if yes, move it to separate table.
(TRUE, but here we focus only on partial dependency. Removing partial dependencies is enough for 2NF, and if some redundant data still exists after that, it is OK for 2NF.)
c.create table using foreign keys
(FALSE. Break the table into two parts such that the attribute they share is a candidate key of at least one of the decomposed tables.)
Example: R(A,B,C,D). Suppose we want to decompose this table for 2NF; the decomposition is done as (AB) and (BCD), where the common attribute (here 'B') is a candidate key of either (AB) or (BCD).
3 NF :
a.Must be in 2NF
(Not necessarily needed as a stated condition: even if the table is not in 2NF you can go directly to 3NF. Once it is in 3NF, it automatically satisfies the 2NF properties.)
b.Does not contain columns that are not fully dependent upon the primary key
(The wording is wrong. You should write "In 3NF, transitive dependency (a non-prime attribute determining another non-prime attribute) is not allowed".)
*Remember: going from 1NF to 2NF, 2NF to 3NF, and 3NF to BCNF is not a rule, it's a convention. This means you can go directly to BCNF (no redundancy arising from functional dependencies).
Hope this helps. For more detail, you can also refer to: Detailed explanation of Normal forms
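The R(A,B,C,D) example can be checked mechanically with an attribute closure. This is a minimal sketch that assumes the only non-trivial dependency is B -> CD (the answer does not list its dependencies, so that is my assumption); `closure` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Assumed dependency for R(A, B, C, D): B alone determines C and D.
fds = [("B", "CD")]

key_closure = closure("AB", fds)   # AB reaches every attribute, so AB is a key
part_closure = closure("B", fds)   # B alone reaches C and D: partial dependency
```

Because B on its own determines C and D, R is not in 2NF under this assumption. Splitting it into (AB) and (BCD) leaves B as the key of (BCD), which is the condition the answer describes.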

Related

Can I create a composite key with an extra character?

I'm building a new DB using MySQL to store lessons learned across a variety of projects. When we talk about this in the office, we refer to lessons by the Project Number and Lesson Number, i.e. PR12-81, where PR12 refers to the project and 81 refers to the specific lesson within that project. I want the primary key in my DB to have a hyphen in it as well.
When defining a composite key in SQL, I can make it reference the project and lesson but without the hyphen, i.e. PR1281. I've also considered creating a separate column of data type CHAR(1), putting a hyphen in every row and declaring that the PK is made of 3 columns.
Is there another way that I can specify the primary key to be formatted in the preferred way?
Let your table's primary key be a nonsensical auto-increment number with no "meaning" whatsoever. Then, within that table, define two columns: project_number and lesson_number. If the two need to be unique, define a UNIQUE index encompassing the two fields.
Don't(!) create database keys which embed information into them, even if the business does so. If the business needs to refer to strings like PR12, so be it ... create a column to store the appropriate value, or use a one-to-many table. Use indexes as needed to enforce uniqueness.
Notice(!) that I've now described four columns:
The auto-increment based "actual" primary key, which contains no information.
The project_number column, probably a foreign key to a projects table.
Ditto the lesson_number. (With a UNIQUE composite index if needed.)
The column (or table) which contains "the string that the business uses."
Over time, business practices do change. And someday you just might... no, you will... encounter a "business-used string" that was incorrectly assigned by the human beings who do such things! Your database design needs to handle this gracefully. The schema I've described is so-called third normal form. Do a Google search on "normal forms" if you haven't already.
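A minimal sketch of the four-column layout described above. It uses SQLite through Python's `sqlite3` module instead of MySQL so that it runs anywhere. The table name `lessons` and the column name `business_ref` are assumptions for illustration, and the foreign key to a projects table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lessons (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no meaning
    project_number TEXT    NOT NULL,
    lesson_number  INTEGER NOT NULL,
    business_ref   TEXT    NOT NULL,                   -- string the business uses
    UNIQUE (project_number, lesson_number)
);
""")

insert = (
    "INSERT INTO lessons (project_number, lesson_number, business_ref) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert, ("PR12", 81, "PR12-81"))

# A second lesson 81 in project PR12 violates the UNIQUE constraint.
try:
    conn.execute(insert, ("PR12", 81, "PR12-81"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

ref = conn.execute("SELECT business_ref FROM lessons WHERE id = 1").fetchone()[0]
```

The hyphenated string lives in its own column, so a mis-assigned reference can be corrected without touching the primary key or any rows that point at it.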

Functional dependency in another table

Let's say there are warehouses, each storing items of a specific type.
So there are tables with fields
Warehouse - ID,Name,Type
Item - ID,Name,Type
WarehouseItem - Warehouse, Item
Type - ID, Name
The question is - given that a Warehouse only holds Items of a specific Type, what database normalization rule is this breaking?
Is this database normalized?
(The problem's example is made up, but I basically have this problem in real life.)
I'm making some assumptions from just looking at your metadata without any data examples, but on first glance it appears that your schema for the most part is normalized. Technically speaking your table is 3NF (which should be your target) if it meets all of the following standards:
It is also 1NF - Each entry only contains atomic data (or a single piece of info)
It is also 2NF - No partial dependency, meaning that when you have a composite primary key (a key made up of more than one column), all data is dependent on the entire key
It is 3NF - No transitive dependency meaning all data is only dependent on the primary key and not some other column in the table
Note that there are also higher normal forms, but they are mostly academic, as you begin experiencing performance degradation the more you normalize.
Given this definition:
Warehouse appears 3NF assuming that each warehouse can only have one Type. If not then you would be failing the transitive dependency and would need to move Type information to a new table.
Item too appears 3NF assuming only one Type can be assigned
Type appears to contain redundant data and should be removed unless of course you have a many-to-many relationship between Type and Warehouse and/or Item. In that case, you would want to introduce a bridge entity (aka composite entity) between Type and Warehouse or Item to create two 1-to-many relationships.
Lastly, if I'm reading this correctly, WarehouseItem appears to be a bridge-entity between Warehouse and Item to break up the many-to-many relationship between them. If this is correct, you should be able to argue that this table is 3NF assuming the combination of Warehouse and Item represent a composite key.
So assuming I interpreted your schema correctly, once you eliminate the redundant Type table, then yes I would say this setup technically meets 3NF. Note that your requirement that
given that a Warehouse only holds Items of a specific Type
may require you introduce a new type field which will mean you need to reevaluate your normalization of that table. If you have two distinct types (a WarehouseType and an ItemType) then you may need to keep that Type table after all and turn it into a mapping table between those two new fields. But I'd need to see data examples to better evaluate.
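One possible way to enforce "a Warehouse only holds Items of its own Type" in the schema is to carry the type into the bridge table and use composite foreign keys. This is a sketch of that technique, not something the answer above prescribes. It uses SQLite via Python's `sqlite3`, and the extra `type` column in `warehouse_item` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE warehouse (
    id   INTEGER PRIMARY KEY,
    name TEXT    NOT NULL,
    type INTEGER NOT NULL REFERENCES type(id),
    UNIQUE (id, type)               -- target for the composite foreign key
);
CREATE TABLE item (
    id   INTEGER PRIMARY KEY,
    name TEXT    NOT NULL,
    type INTEGER NOT NULL REFERENCES type(id),
    UNIQUE (id, type)
);
CREATE TABLE warehouse_item (
    warehouse INTEGER NOT NULL,
    item      INTEGER NOT NULL,
    type      INTEGER NOT NULL,     -- must match both parents
    PRIMARY KEY (warehouse, item),
    FOREIGN KEY (warehouse, type) REFERENCES warehouse(id, type),
    FOREIGN KEY (item, type)      REFERENCES item(id, type)
);
INSERT INTO type VALUES (1, 'Frozen'), (2, 'Dry');
INSERT INTO warehouse VALUES (1, 'North', 1);
INSERT INTO item VALUES (1, 'Ice cream', 1), (2, 'Flour', 2);
""")

conn.execute("INSERT INTO warehouse_item VALUES (1, 1, 1)")  # types agree

# A 'Dry' item cannot be placed in the 'Frozen' warehouse.
try:
    conn.execute("INSERT INTO warehouse_item VALUES (1, 2, 2)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True
```

The repeated `type` column is deliberate redundancy: it is what lets the database check the rule declaratively, instead of leaving it to application code.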

What is 1NF truly?

While studying relational databases, I ran into this confusing page where the following snapshot summarizes my confusion:
Why does the author say that Subject Table is in 1NF when student_id 401 and subject Math is repeated in the same way the blue depicts? This seems like a contradiction.
Chris Date gives a correct and concise definition of 1NF in his book An Introduction to Database Systems, 7th ed, p 357.
A relvar is in 1NF if and only if, in every legal value of that
relvar, every tuple contains exactly one value for each attribute.
"One value" appearing in more than one tuple (more than one row) doesn't violate 1NF. In the "Student table", each tuple (each row) contains exactly one value for each attribute. As far as we can tell from the sample data, it's in 1NF.
In the relational model, that "one value" can be arbitrarily complex--audio recordings, video, engineering drawings, etc. (Ibid, p 114)
The key concept this page fails to mention is that the data contained in the row must not be duplicated, meaning that a single key value cannot have multiple values for a single column. The real problem is that for student id of 401, the first table specifies the name twice.
The text following the last table should read "In Subject table concatenation of subject_id and student_id is the primary key". The reason the new subject table is okay is because the key is actually both of these values, so while 401 is repeated, the key is 10, 401 and 11, 401, which are two distinct values. Likewise, math is repeated, but it is the data for two separate keys; it isn't repeated within a row. Because these key values are different, it is okay.
The page you referenced doesn't give a very precise definition. I hope this explanation helps. Keep checking other sites to get a clearer understanding. Wikipedia has a good example and a precise definition, though it is a bit abstract and hard to follow.
http://en.wikipedia.org/wiki/First_normal_form
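The point about the composite key can be sketched with a small table. The rows here are hypothetical and are not the data from the page in question. SQLite stands in for whatever database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id INTEGER NOT NULL,
    student_id INTEGER NOT NULL,
    subject    TEXT    NOT NULL,
    PRIMARY KEY (subject_id, student_id)
);
-- 401 and 'Math' each appear in more than one row, but every key is distinct.
INSERT INTO subject VALUES (10, 401, 'Math'), (11, 401, 'Physics'), (10, 402, 'Math');
""")

# Only a repeat of the whole key is rejected.
try:
    conn.execute("INSERT INTO subject VALUES (10, 401, 'Math')")
    duplicate_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_key_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
```

Every row holds exactly one value per column, so the table satisfies the definition of 1NF quoted above even though individual values recur across rows.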

Polymorphic database design : does this approach have a name?

I have a base entity (items) that will host a vast range of item types (>200) with totally different properties. I want a clean, portable and fast solution and have come up with an idea that maybe has a name I'm unaware of.
Here it goes:
items-entity holds base class fields + additional fields for subclass fields but with dummy names: ItemID,ItemNo,ItemTypeID,int1,int2,dec1,dec2,dec3,str1,str2
referenced itemtype-record holds name of type and child entity (1:n):
itemtypefields [itemtypeid,name,type,realfield]
example: [53,MaxPressure,dec,dec3]
Its limitations:
hard to estimate field requirements in baseclass
harder to add domains/checkconstraints based on child type
need application layer to translate tagged sql to real query
Only possible to query one type at a time since shared attributes may be defined to different "real-fields".
3rd bullet explained:
select ItemNo,_MaxPressure_ from items where ItemTypeID=10 and _MaxPressure_>42
should translate to:
select ItemNo,dec3 as MaxPressure from items where ItemTypeID=10 and dec3>42
(can't do that with sp's or udf's, right - or would it be possible?)
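The application-layer translation described above could look roughly like this. It is a naive text-substitution sketch, not a SQL parser. `FIELD_MAP` stands in for rows of the itemtypefields table, and the mapping of MaxPressure to dec3 for item type 10 is taken from the query example; the function and variable names are hypothetical.

```python
import re

# Hypothetical contents of itemtypefields: (itemtypeid, name) -> realfield
FIELD_MAP = {(10, "MaxPressure"): "dec3"}

# A tag is a name wrapped in underscores, e.g. _MaxPressure_
TAG = re.compile(r"(?<!\w)_(\w+?)_(?!\w)")


def translate(sql, item_type_id, field_map=FIELD_MAP):
    """Replace _Tag_ placeholders with the real column for one item type."""

    def real(match):
        # Raises KeyError for a tag that is not defined for this item type.
        return field_map[(item_type_id, match.group(1))]

    def aliased(match):
        return f"{real(match)} as {match.group(1)}"

    # Alias tags in the select list; use the bare column after WHERE.
    parts = re.split(r"(\bwhere\b)", sql, maxsplit=1, flags=re.IGNORECASE)
    parts[0] = TAG.sub(aliased, parts[0])
    parts[1:] = [TAG.sub(real, part) for part in parts[1:]]
    return "".join(parts)


tagged = "select ItemNo,_MaxPressure_ from items where ItemTypeID=10 and _MaxPressure_>42"
translated = translate(tagged, 10)
```

Here `translated` is `select ItemNo,dec3 as MaxPressure from items where ItemTypeID=10 and dec3>42`. Anything beyond simple queries (subqueries, string literals containing tags, ORDER BY) would need a real parser.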
But benefits of:
Performance
Ease of CRUD-operations
Easier to sort/filter at application level.
Now - does it have a name?
This antipattern is called One True Lookup Table.
In a relational database, each column needs to be defined as one logical type. I don't mean one SQL data type like INT or VARCHAR, I mean everything in that column from start to finish must be from the same set of values, and you should be able to tell one value apart from another value.
You can't put shoe size and average temperature and threads per inch into the same column of a given table, and still call it a relation.
Basically, your database would not be a database at all -- it would be a spreadsheet.
Read almost any book by C. J. Date, such as SQL and Relational Theory for a proper explanation of relations and types.
Re your comment:
Read the Q again before lecturing about elementary books and mocking about semi structured data.
Okay, I have re-read your post.
The classic use of One True Lookup Table isn't exactly what you're doing, but what you're doing shares the same problems with OTLT.
Suppose you have "MaxPressure" stored in column dec3 for ItemType 10. Suppose there are a fixed set of valid choices for the value of MaxPressure, and you want to put those in another lookup table, so that no one can enter an invalid MaxPressure value.
Now: declare a foreign key constraint on dec3 referencing your MaxPressures lookup table. You can't -- the problem is that the foreign key constraint applies to the dec3 column in all rows, not just those rows where ItemType is 10.
The reason is that you're storing more than one set of values in a single column. The same problem arises for any other kind of constraint -- unique constraints, check constraints, even NOT NULL. And you can't declare a DEFAULT value for the column either, because you probably have a different correct default for each ItemType (and some ItemTypes have no default for that attribute).
The reason that I referred to the C. J. Date book is that he gives a crisp definition for a type: it's a named finite set, over which the equality operation is defined. That is, you can tell if the value "42" on one row is the same as the value "42" on another row. In a relational column, that must be true because they must come from the same original set of values. In your table, dec3 could have the value "42" when it's MaxPressure, but "42" for another ItemType when it's threads per inch. Therefore they aren't the same value "42". If you had a unique constraint, these two 42's would not be considered duplicates. If you had a foreign key, each of the different 42's would reference a different lookup table, etc.
What you're doing is not a valid relational database design.
Don't bristle at my referring you to a resource on relational database design unless you understand that.
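The foreign key problem described above can be reproduced in a few lines. This sketch uses SQLite via Python's `sqlite3`. The `max_pressures` lookup table, the cut-down `items` table and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE max_pressures (pressure NUMERIC PRIMARY KEY);
CREATE TABLE items (
    ItemID     INTEGER PRIMARY KEY,
    ItemTypeID INTEGER NOT NULL,
    dec3       NUMERIC REFERENCES max_pressures(pressure)
);
INSERT INTO max_pressures VALUES (42), (60);
INSERT INTO items VALUES (1, 10, 42);   -- type 10: dec3 means MaxPressure
""")

# Type 11 uses dec3 for something else (say, threads per inch = 18),
# but the constraint still checks it against the MaxPressure lookup.
try:
    conn.execute("INSERT INTO items VALUES (2, 11, 18)")
    other_type_rejected = False
except sqlite3.IntegrityError:
    other_type_rejected = True
```

The constraint cannot be limited to rows where ItemTypeID is 10, so a perfectly valid value for another item type is refused.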

unable to enforce referential integrity in Access

I've checked everything for errors: primary key, uniqueness, and type. Access just doesn't seem to be able to link the two fields I have in my database. Can someone please take a look?
http://www.jpegtown.com/pictures/jf5WKxKRqehz.jpg
Thanks.
Your relationship diagram shows that you've made the ID fields your primary key in all your tables, but you're not using them for your joins. Thus, they serve absolutely no purpose. If you're not going to use "surrogate keys" (i.e., a meaningless ID number that is generated by the database and is unique to each record, but has absolutely no meaning in regard to the data in your table), then eliminate them. But if you're going to use "natural keys" (i.e., a primary key constructed from a set of real data fields that together are going to be unique for each record), you must have a unique compound index on those fields.
However, there are issues with both approaches:
Surrogate Keys: a surrogate PK makes each record unique. That is you could have a record for David Fenton with ID 1 and a record for David Fenton with ID 2. If it's the same David Fenton, you've got duplicate data, but as far as your database knows, they are unique.
Natural Keys: some types of entities work very well with natural keys. The best such are where there's a single field that identifies the record uniquely. An example would be "employee type," where values might be "associate, manager, etc." In that case, it's a very good candidate for using the natural key instead of adding a surrogate key. The only argument against the natural key in that case is if the data in the candidate natural key is highly volatile (i.e., it changes frequently). While every modern database engine provides "CASCADE UPDATE" functionality (i.e., if the value in the PK field changes, all the tables where that field is a Foreign Key are automatically updated), this imposes a certain amount of overhead and can be problematic. For single-column keys, it's unlikely to be an issue. Now, except for lookup tables, there are very few entities for which a natural key will be a single column. Instead, you have to create a compound index, i.e., an index that spans multiple data fields. In the index dialog in Access table design, you create a compound key by giving it a name in the first column, and then adding multiple rows in the second column (from the dropdown list of fields in your table). The drawback of this is that if any of the fields in your compound unique index are unknown, you won't get uniqueness. That is, if a field has a Null in two records, and the rest of the fields are identical, this won't be counted as a conflict of uniqueness because Null never equals Null. This is because Null doesn't mean "empty" -- it means "Unknown."
Allen Browne has explained everything you need to know about Nulls:
Nulls: Do I Need Them?
Common Errors with Null
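The Null caveat for compound unique indexes can be seen in a small experiment. This sketch runs against SQLite, which also treats Nulls as distinct in a unique index; it is a stand-in and not Access itself, and the `person` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    first_name  TEXT,
    last_name   TEXT,
    middle_name TEXT
);
CREATE UNIQUE INDEX person_natural_key
    ON person (first_name, last_name, middle_name);
""")

insert = "INSERT INTO person (first_name, last_name, middle_name) VALUES (?, ?, ?)"

# middle_name is unknown (Null) in both rows, so the index sees no conflict.
conn.execute(insert, ("David", "Fenton", None))
conn.execute(insert, ("David", "Fenton", None))
rows_with_null = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]

# With every field known, the second identical row is rejected.
conn.execute(insert, ("David", "Fenton", "W"))
try:
    conn.execute(insert, ("David", "Fenton", "W"))
    complete_duplicate_rejected = False
except sqlite3.IntegrityError:
    complete_duplicate_rejected = True
```

Both Null rows are accepted, which is exactly the duplicate-record risk described for compound natural keys with unknown fields.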
In your graphic, you show that you are trying to link the Company table with the PManager table. The latter table has a CompanyID field, and your Company table has a unique index on its ID field, so all you need is a link from the ID field of the Company table to the CompanyID field of the PManager table. For your example to work (which would be useless, since you already have a unique index on the ID field), you'd need to create a unique compound key spanning both ID and ShortName in the Company table.
Additionally, if ShortName is a field that you want to be unique (i.e., you don't want two company records to have the same ShortName), you should add a unique index to it, whether or not you still use the ID field as your primary key. This brings me back to item #1 above, where I described a situation where a surrogate key could lead you to enter duplicate records, because uniqueness is established by the surrogate key alone. Any time you choose to use a surrogate key, you must also add a unique compound index on any combination of data fields that needs to be unique (with the caveat about Null fields as outlined in item #2).
If you're thinking "surrogate keys mean more indexes" you're correct, in that you have two unique indexes on the same table (assuming you don't have the Null problem). But you do get substantial ease of use in joining tables in SQL, as well as substantially less duplication of data. Likewise, you avoid the overhead of CASCADE UPDATE. On the other hand, if you're viewing a child table with a natural foreign key, you don't need to join to the parent table to be able to identify the parent record, because the data that identifies that record is right there in the foreign-key fields. That lack of a need for a join can be a major performance gain in certain scenarios (especially for the case where you'd need an outer join because the foreign key can be Null).
This is actually quite a huge topic, and it's something of a religious argument. I'm firmly in the surrogate key camp, but I use natural keys for lookup tables where the key is a single column. I don't use natural keys for any other purpose. That said, where possible (i.e., no Null problems) I also have a unique index on the natural key.
Hope this helps.
Actually, you need an index on the name fields, on both sides.
However, may I suggest that you have way too many joins? In general there should only be one join from one table to the next. It is rare to have more than one join between tables, and exceedingly rare to have more than two.
Have a look at this link:
http://weblogs.asp.net/scottgu/archive/2006/07/12/Tip_2F00_Trick_3A00_-Online-Database-Schema-Samples-Library.aspx
Notice how all of the tables are joined together by a single relationship?
Each of the fields labeled PK are primary keys. These are AUTONUMBER fields. Each of the fields labeled FK are foreign keys. These are indexed Number fields of type Integer. The Primary Keys are connected to the Foreign Keys in a 1 to many relationship (in most cases).
99% of the time, you won't need any other kind of joins. The trick is to create tables with unique information. There is a lot of repeated information in your database.
A database that is reorganized in this manner is called a "normalized" database. There are lots of good examples of these at http://www.databaseanswers.org/data_models/
Just join on the CompanyID. You could also get rid of the Company field in PManager.
I did the following and the problem was solved (I faced the same referential integrity problem in Access).
I exported data from both tables in Access to Excel. Table1 contained the Cust Code and basic information about the company, with Cust Code as the primary key.
Table2 contained all the information about the customers associated with that company.
I removed all duplicates from the Table2 data exported to Excel.
Using VLOOKUP, I checked and found that 11 customer codes were not present in Table1.
I added those codes to the Access table, linked the tables with referential integrity, and the problem was solved.
Also look for foreign key if it does not work.
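The Excel/VLOOKUP check above amounts to finding child rows whose code has no parent. As a sketch, the same check can be written as a single outer-join query. SQLite is used here in place of Access, and the column names and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (CustCode TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Table2 (ContactID INTEGER PRIMARY KEY, CustCode TEXT, ContactName TEXT);
INSERT INTO Table1 VALUES ('C001', 'Acme'), ('C002', 'Globex');
INSERT INTO Table2 VALUES (1, 'C001', 'Ann'), (2, 'C003', 'Bob'), (3, 'C003', 'Cy');
""")

# Codes used in Table2 that have no matching row in Table1.
orphans = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT t2.CustCode
        FROM Table2 AS t2
        LEFT JOIN Table1 AS t1 ON t1.CustCode = t2.CustCode
        WHERE t1.CustCode IS NULL
        ORDER BY t2.CustCode
    """)
]
```

Any code returned here has to be added to the parent table (or the child rows corrected) before referential integrity can be enforced on the relationship.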
You need to create an INDEX. Perhaps look for some kind of create index button and create an index on CompanyID