I was told to create an ERD given the following description:
The college keeps track of each student’s name, student number, social security
number, address, phone, date-of-birth, and gender.
– Each programme is described by a programme code, name, description, duration
(number of years), level, and the cost.
– Each module has a module code, name, description, duration (number of weeks),
and level (introductory, intermediate, advanced).
– Grade stores student number, module code, and a letter grade (A, B, C, D, E, F).
Each programme enrols students. Students then register for modules. At the end of the
study duration of a module, students receive their grades.
I have defined attributes for each entity:
Student
(sNumber, SSN, sName, address, phone, DOB, gender)
Programme
(pCode, name, description, duration, level, cost)
Module
(mCode, name, description, duration, level)
Grade
(sNumber, mCode, grade)
My final diagram includes the entity relationships as well. I know I will have to break each M:N relationship down into two 1:M relationships, e.g.:
Contains: Programme_Modules (pCode, mCode)
But my diagram seems off where it connects Modules and Students to Grade.
I am very new to this, so I would really appreciate some pointers.
First, that "schema" is not definitive enough to be called a schema. Although the Keys may be obvious to you, they are not identified. At best it is a description, and an incomplete one. E.g. if it were a schema, the M:N issue would be resolved, the tables would be named, and all Keys would be identified.
In the ERD, you have not shown the attributes; they are required. Next, you need to show which attributes are Keys.
ERD
Here is an ERD, using somewhat improved symbols, constructed from your description.
Note the Relational Keys:
They are composites
They are required to:
prevent duplicate rows
enforce Relational Integrity, which is logical (as distinct from Referential Integrity, which is physical)
For example, we want to constrain a Student to register only for a Module that is in the Programme in which he is enrolled.
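To make that constraint concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes two associative tables named Enrolment (Student in Programme) and Registration (Student in Module); those names, the sample rows, and the extra pCode column carried into Grade are illustrative assumptions, not necessarily what the ERD uses. Because Registration carries pCode in both of its composite foreign keys, the database itself rejects a registration for a module outside the student's programme.

```python
import sqlite3

# Sketch only: Enrolment, Registration and the sample data are hypothetical,
# and only a few attributes are kept per table.
DDL = """
CREATE TABLE Programme (pCode TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Module    (mCode TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Student   (sNumber INTEGER PRIMARY KEY, sName TEXT NOT NULL);

-- Resolves Programme M:N Module; the composite key prevents duplicate rows
CREATE TABLE ProgrammeModule (
    pCode TEXT NOT NULL REFERENCES Programme (pCode),
    mCode TEXT NOT NULL REFERENCES Module (mCode),
    PRIMARY KEY (pCode, mCode)
);
-- Student enrolled in Programme
CREATE TABLE Enrolment (
    sNumber INTEGER NOT NULL REFERENCES Student (sNumber),
    pCode   TEXT    NOT NULL REFERENCES Programme (pCode),
    PRIMARY KEY (sNumber, pCode)
);
-- Both composite FKs share pCode, so the Module must belong to
-- the Programme that the Student is enrolled in
CREATE TABLE Registration (
    sNumber INTEGER NOT NULL,
    pCode   TEXT    NOT NULL,
    mCode   TEXT    NOT NULL,
    PRIMARY KEY (sNumber, pCode, mCode),
    FOREIGN KEY (sNumber, pCode) REFERENCES Enrolment (sNumber, pCode),
    FOREIGN KEY (pCode, mCode)   REFERENCES ProgrammeModule (pCode, mCode)
);
-- A Grade can only exist for an existing Registration
CREATE TABLE Grade (
    sNumber INTEGER NOT NULL,
    pCode   TEXT    NOT NULL,
    mCode   TEXT    NOT NULL,
    grade   TEXT    NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D', 'E', 'F')),
    PRIMARY KEY (sNumber, pCode, mCode),
    FOREIGN KEY (sNumber, pCode, mCode)
        REFERENCES Registration (sNumber, pCode, mCode)
);
"""


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def try_register(conn: sqlite3.Connection, s_number: int, p_code: str, m_code: str) -> bool:
    """Return True if the registration is accepted, False if a key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Registration VALUES (?, ?, ?)",
                         (s_number, p_code, m_code))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
conn.executescript("""
INSERT INTO Programme VALUES ('CS', 'Computing'), ('BUS', 'Business');
INSERT INTO Module VALUES ('DB1', 'Databases'), ('ACC1', 'Accounting');
INSERT INTO ProgrammeModule VALUES ('CS', 'DB1'), ('BUS', 'ACC1');
INSERT INTO Student VALUES (1, 'Ann');
INSERT INTO Enrolment VALUES (1, 'CS');
""")

print(try_register(conn, 1, "CS", "DB1"))    # True: DB1 is in CS
print(try_register(conn, 1, "CS", "ACC1"))   # False: ACC1 is not in CS
print(try_register(conn, 1, "BUS", "ACC1"))  # False: Ann is not enrolled in BUS
```

The same composite keys also prevent duplicate rows: registering Ann for DB1 a second time fails on the primary key.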
Relational Data Model
Obviously, the ERD is very limited: it cannot show all the definitions that are possible, and desired, in a Relational data model. Further, as evidenced, it gets crowded very quickly.
It is quite ridiculous to teach ERD for Relational databases. In the real world, we do not use ERD for modelling data; we use IDEF1X, the Standard for Relational data modelling since 1993. This is what it looks like.
Note • Notation
All my data models are rendered in IDEF1X
My IDEF1X Introduction is essential reading for beginners
Note • Content
The Primary Keys, as well as Alternate Keys, are explicit.
The Level in Module is understood. The Level in Programme is not clear. It may or may not have some relation to the former.
The Predicates can be read directly from the model. If you need them in text form, please ask.
I have to draw the conceptual diagram for a DBMS.
I have a very simple, maybe banal question, but I found no answers on the internet.
This DBMS will be used by the secretaries of a school. In this schema there are entities like students, courses, exams and so on.
Can I also add the entity "secretary", even though the secretaries are the ones who will use the DBMS?
Sure you can. You can, and in fact you should, throw everything that has any relevance to your system onto your conceptual design whiteboard, and then contemplate where the chips might fall.
Usually you would have a "User" in your system, and that "User" might be a "Secretary", but very quickly you have other "Users", which would include the "Students" and possibly "Managers" and "Advisors".
The whole point about a word as general as "Entity" in an Entity-Relationship Model is that it is general and anything whatsoever of interest to your problem can be an "Entity" in that sense.
An entity is a type of thing to store in your database system.
Each different entity type has an identifying key, and a distinct set of attributes. For example, a user entity might have some identification number, and perhaps attributes for login name, password, creation date, email, real name, etc.
Then you have to ask: is a secretary just one of the users, or a distinct entity?
The answer is driven by whether a secretary has unique attributes. Is there some fact about each secretary that needs to be stored in the database, that no other user has?
If not, then perhaps secretary is just an example of a user entity. Maybe it would be helpful to make the user entity have an attribute column to note the type of user (secretary, administrator, faculty, parent, etc.), but not create a separate entity unless the category of secretary needs their own attributes.
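As a rough illustration of that advice, here is a sketch in Python's sqlite3 of a single user entity with a type column, plus a secretary table that exists only because secretaries are assumed to have a fact of their own. The names app_user, user_type and office_number are hypothetical, chosen for the example (password handling is left out).

```python
import sqlite3

# Hypothetical schema: one app_user entity distinguished by user_type.
DDL = """
CREATE TABLE app_user (
    user_id    INTEGER PRIMARY KEY,
    login_name TEXT NOT NULL UNIQUE,
    real_name  TEXT NOT NULL,
    email      TEXT,
    created_on TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    user_type  TEXT NOT NULL CHECK (user_type IN
        ('secretary', 'administrator', 'faculty', 'parent', 'student'))
);
-- Separate entity only because (by assumption) every secretary has an
-- office_number that no other kind of user has
CREATE TABLE secretary (
    user_id       INTEGER PRIMARY KEY REFERENCES app_user (user_id),
    office_number TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)


def add_user(login: str, name: str, user_type: str) -> bool:
    """Insert a user; return False if a constraint (type CHECK, UNIQUE login) rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO app_user (login_name, real_name, user_type) VALUES (?, ?, ?)",
                (login, name, user_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


add_user("msmith", "Mary Smith", "secretary")
add_user("jdoe", "John Doe", "student")
with conn:
    # Attach the secretary-only attribute to the existing user row
    conn.execute(
        "INSERT INTO secretary SELECT user_id, 'B-101' FROM app_user WHERE login_name = 'msmith'"
    )
```

If secretaries turn out to have no facts of their own, the secretary table simply disappears and user_type alone carries the distinction.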
I am trying to implement an ER model where I have the entities teacher, student and paper, and the relationships publishes and advises. Both a teacher and a student can publish a paper, but only a teacher can advise a paper. Should I duplicate the publishes relationship for both student and teacher, or can I make it a three-way relationship with no relationship between teacher and student?
It sounds like you could model it like:
student(student_id, name, etc)
teacher(teacher_id, name, etc)
paper(paper_id, title, text, etc)
contributor(contributor_id, paper_id, contribution_type, contributor_type)
where contribution_type is an enum(publisher, adviser) and, similarly, contributor_type is an enum(teacher, student)... or booleans is_publisher, is_adviser.
The drawback is that this doesn't permit foreign keys from contributor to student/teacher, and you don't have a rigid constraint from advisers to teachers. A table adviser(teacher_id, paper_id) allows a constraint on the advisers, but still doesn't allow constraints or foreign keys on student ids.
Another option might be to break it up as:
teacher_contribution(teacher_id, paper_id, is_adviser)
student_contribution(student_id, paper_id)
which would allow completely constraining the database to the intended model, but could be more difficult to query in some situations.
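A small sketch of that second option in Python's sqlite3, using the column names from the answer (the sample rows are made up). The foreign keys reject a contribution from a non-existent student, and only teacher_contribution has an adviser flag. The query for "all contributors to a paper" shows the extra querying cost: it needs a UNION ALL across both tables.

```python
import sqlite3

DDL = """
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE paper   (paper_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Only teachers can be advisers, so only this table has is_adviser
CREATE TABLE teacher_contribution (
    teacher_id INTEGER NOT NULL REFERENCES teacher (teacher_id),
    paper_id   INTEGER NOT NULL REFERENCES paper (paper_id),
    is_adviser INTEGER NOT NULL DEFAULT 0 CHECK (is_adviser IN (0, 1)),
    PRIMARY KEY (teacher_id, paper_id)
);
-- Students can only publish; the FK guarantees the student exists
CREATE TABLE student_contribution (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    paper_id   INTEGER NOT NULL REFERENCES paper (paper_id),
    PRIMARY KEY (student_id, paper_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)
conn.executescript("""
INSERT INTO teacher VALUES (1, 'Dr Smith');
INSERT INTO student VALUES (1, 'Ann');
INSERT INTO paper VALUES (10, 'Paper A');
INSERT INTO teacher_contribution VALUES (1, 10, 1);
INSERT INTO student_contribution VALUES (1, 10);
""")


def contributors(paper_id: int) -> list:
    """All contributors to a paper, combined from both tables with UNION ALL."""
    return conn.execute(
        """
        SELECT t.name, CASE tc.is_adviser WHEN 1 THEN 'adviser' ELSE 'publisher' END
          FROM teacher_contribution tc JOIN teacher t USING (teacher_id)
         WHERE tc.paper_id = ?
        UNION ALL
        SELECT s.name, 'publisher'
          FROM student_contribution sc JOIN student s USING (student_id)
         WHERE sc.paper_id = ?
        ORDER BY 1
        """,
        (paper_id, paper_id),
    ).fetchall()


def add_student_contribution(student_id: int, paper_id: int) -> bool:
    """Return False when a foreign key or the primary key rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO student_contribution VALUES (?, ?)",
                         (student_id, paper_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(contributors(10))  # [('Ann', 'publisher'), ('Dr Smith', 'adviser')]
```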
Any of these is acceptable. It depends to some extent on your particular application and how you intend to query the data.
I would like to know whether this relation is in BCNF:
Student(StudentNum, NRIC, DateOfBirth, BookTitle)
• Student’s number (StudentNum) uniquely identifies the National Registration Identity Card (NRIC) and the date of birth of the student (DateOfBirth).
• The NRIC determines the student’s date of birth (DateOfBirth).
According to my analysis, the relation is in 2NF. After changing it to BCNF, it looks like this:
Student(StudentNum, NRIC, BookTitle)
StudentDetails(NRIC, DateOfBirth)
My question:
Before the change: 2NF
After the change: BCNF
Am I correct?
The idea behind normalization is to eliminate redundancy as much as possible by creating additional tables for the columns that introduce redundancy into the table.
For example, suppose we start with the following table: Student(StudentNum, NRIC, DateOfBirth, BookTitle). The unique columns in this case are StudentNum and NRIC; the other two fields are not, because other students may have the same date of birth and others may have borrowed a book with the same title. From there we see the need to normalize so that we do not fall into data redundancy: for example, what if the same student borrowed 100 different books?
If everything is in a single table, we may end up with lots of redundant (repetitive) data.
I suggest you check out this guide to the 5 normal forms http://www.bkent.net/Doc/simple5.htm
Update:
I think the original relation is better considered to be in first normal form, given that everything is within a single table. Your resulting relation is in 2NF, I guess, because what if different students have borrowed the same book? That may lead to repetition in the Student table.
I think you have to give more information about the scenario behind your relations so we can analyze this better. It depends heavily on the business rules.
No. Student is in 1NF, not 2NF.
You're starting with
Student(StudentNum, NRIC, DateOfBirth, BookTitle)
and these dependencies.
StudentNum->NRIC
StudentNum->DateOfBirth
NRIC->DateOfBirth
A relation is in 2NF if and only if
it's in 1NF, and
every non-prime attribute is dependent on the whole of every candidate key, not just on part of any candidate key.
So your first job is to determine the candidate keys of the Student relation. The Student relation has only one candidate key, and that's {StudentNum, BookTitle}.
Your textbook should have at least one algorithm for determining all the candidate keys of a relation.
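For illustration, here is one simple way to find the candidate keys: a brute-force sketch in plain Python, not necessarily the algorithm your textbook uses. It computes the closure of each attribute set under the three FDs listed above and keeps the minimal sets whose closure is the whole relation (exponential in the number of attributes, so only for small examples).

```python
from itertools import combinations

# The three FDs listed above, as (determinant, dependents) pairs
FDS = [
    ({"StudentNum"}, {"NRIC"}),
    ({"StudentNum"}, {"DateOfBirth"}),
    ({"NRIC"}, {"DateOfBirth"}),
]
STUDENT = {"StudentNum", "NRIC", "DateOfBirth", "BookTitle"}


def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: the minimal attribute sets whose closure covers the relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if relation <= closure(candidate, fds):
                keys.append(candidate)
    return keys


print(candidate_keys(STUDENT, FDS))  # a single key: {StudentNum, BookTitle}
```

BookTitle appears on no right-hand side, so nothing determines it and it must be part of every key; that is why StudentNum on its own is only part of the candidate key.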
Since NRIC is dependent on StudentNum, and StudentNum isn't a candidate key (it's just part of a candidate key), the relation Student is not in 2NF. Fix that by changing
Student(StudentNum, NRIC, DateOfBirth, BookTitle)
to this, by eliminating the partial key dependency on StudentNum:
Student (StudentNum, NRIC, DateOfBirth)
StudentBooks (StudentNum, BookTitle)
StudentBooks has no non-prime attributes at all; it's now in 6NF. Student is in 2NF, but not yet in 3NF or BCNF. Do you know why?
It seems you did know why. There is indeed a transitive dependency: StudentNum->NRIC and NRIC->DateOfBirth. Fix the transitive dependency like this:
Student (StudentNum, NRIC)
NRIC (NRIC, DateOfBirth)
StudentBooks (StudentNum, BookTitle)
All three of those relations are in 6NF.
This decomposition might look a little odd. That's because textbook examples usually don't use meaningful names for either relations or attributes. Relations are usually named R{ABCD}, R1{ABC}, R2{AD}, etc. The decomposition above involved
projecting a new relation {StudentNum, NRIC, DateOfBirth} to eliminate the partial key dependency,
observing that the name "Student" no longer identified the relation consisting of {StudentNum, BookTitle},
moving the name "Student" from the original relation to the new, projected relation, and
coming up with a new name, StudentBooks, for the original relation.
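A mechanical way to double-check the decomposition is the BCNF test: within each relation, any attribute set that determines other attributes of that relation must be a superkey of it. The plain-Python sketch below uses the FDs from the question and tries every attribute subset, which implicitly projects the FDs onto each relation (fine for tiny examples, exponential in general). It checks the original relation, the intermediate Student(StudentNum, NRIC, DateOfBirth), and the final three relations.

```python
from itertools import combinations

# FDs given in the question
FDS = [
    ({"StudentNum"}, {"NRIC", "DateOfBirth"}),
    ({"NRIC"}, {"DateOfBirth"}),
]


def closure(attrs, fds):
    """Attribute closure X+: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """(X, implied) pairs where X determines other attributes of the relation
    without being a superkey of it."""
    violations = []
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            x_plus = closure(x, fds)
            implied = (x_plus & relation) - x
            if implied and not relation <= x_plus:
                violations.append((x, implied))
    return violations


original = {"StudentNum", "NRIC", "DateOfBirth", "BookTitle"}
student_2nf = {"StudentNum", "NRIC", "DateOfBirth"}
final = [{"StudentNum", "NRIC"}, {"NRIC", "DateOfBirth"}, {"StudentNum", "BookTitle"}]

print(bool(bcnf_violations(original, FDS)))      # True: not in BCNF
print(bcnf_violations(student_2nf, FDS))         # only NRIC -> DateOfBirth
print([bcnf_violations(r, FDS) for r in final])  # [[], [], []]
```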
We have a J2EE content management and e-commerce system, and in this system – for the sake of a simple example – let’s say that we have 100 objects. All of these objects extend the same base class, and all share many of the same fields.
Let’s take two objects as an example: a news item that would be posted on a website, and a product that would be sold on a website. Both of these share common properties:
IDs: id, client ID, parent ID (long)
Flags: deleted, archived, inactive (boolean)
Dates: created, modified, deleted (datetime)
Content: name, description
And of course they have some properties that are different:
News item: author, posting date
Product: price, tax
So (finally) here is my question. Let’s say we have 100 objects in our system, and they all follow this pattern. They have many fields that overlap, and some unique fields. In terms of a relational database, would we be better off with:
Option One: Less Tables, Common Tables
table_id: id, client ID, parent ID (long) (id is the primary key, a GUID for all objects)
table_flag: id, deleted, archived, inactive (boolean)
table_date: id, created, modified, deleted (datetime)
table_content: id, name, description
table_news: id, author, posting date
table_product: id, price, tax
Option Two: More Tables, Common Fields Repeated
table_news: id, client ID, parent ID, deleted, archived, inactive, name, description, author, posting date
table_product: id, client ID, parent ID, deleted, archived, inactive, name, description, price, tax
For full disclosure – I am a developer and not a DBA, and because of that I prefer option one. But there is another team member that prefers option two, and I think he makes valid points.
Option One: Pros and Cons
Pro: Encapsulates common fields into common tables.
Pro: Need to change a common field? Change it in one place.
Pro: Only creates new fields/tables when they are needed.
Pro: Easier to create the queries dynamically, less repetitive code
Con: More joining to create objects (not sure of DB impact on that)
Con: More complex queries to store objects (not sure of DB impact on that)
Con: Common tables will become huge over time
Option Two: Pros and Cons
Pro: Perhaps it is better to distribute the load of all objects across tables?
Pro: Could index the news table on the client ID, and index the product table on the parent ID.
Pro: More readable to human eye: easy to see all the fields for an object in one table.
My Two Cents
For me, I much prefer the elegance of the first option – but maybe that is me trying to force object oriented patterns on a relational database. If all things were equal, I would go with option one UNLESS a DB expert told me that when we have millions of objects in the system, option one is going to create a performance problem.
Apologies for the long-winded question. I am not great with DB lingo, so I probably could have summarized this more succinctly if I better understood terms like normalization. I tried to search for answers on this topic, and while I found many that were close (I suspect this is a common DB issue), I could not find any that answered all my questions. I read through an article on normalization.
But I did not totally understand it. On the one hand it was saying that you should remove any redundancies. But on the other hand, it was saying that each attribute should define only one object.
Thanks,
John
You should read Patterns of Enterprise Application Architecture by Martin Fowler. He writes about several options for the scenario you describe:
Single Table Inheritance: One table for all object subtypes. Stores all attributes, setting them NULL where they are inapplicable to the row's object subtype.
Class Table Inheritance: One table for columns common to all subtypes, then one table for each subtype to store subtype-specific columns.
Concrete Table Inheritance: One table for each subtype, storing both subtype-specific columns and columns common to all subtypes.
Serialized LOB: One table for all object subtypes. Store common attributes as conventional columns, but combine optional or subtype-specific columns as fields in a BLOB that stores XML or JSON or whatever format you want.
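To compare two of these patterns side by side, here is a sketch in Python's sqlite3 of Class Table Inheritance and Concrete Table Inheritance for the news item from the question. Only a few of the common fields are kept for brevity, and the table names and data are illustrative. With Class Table Inheritance, storing one news item takes two inserts and reading it takes a join; with Concrete Table Inheritance, both are single-table operations.

```python
import sqlite3

CLASS_TABLE_DDL = """
CREATE TABLE item (                 -- columns common to all subtypes
    id INTEGER PRIMARY KEY, client_id INTEGER, parent_id INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0, name TEXT, description TEXT
);
CREATE TABLE news (                 -- subtype-specific columns only
    id INTEGER PRIMARY KEY REFERENCES item (id),
    author TEXT, posting_date TEXT
);
CREATE TABLE product (
    id INTEGER PRIMARY KEY REFERENCES item (id),
    price NUMERIC, tax NUMERIC
);
"""

CONCRETE_TABLE_DDL = """
CREATE TABLE news (                 -- common columns repeated in each table
    id INTEGER PRIMARY KEY, client_id INTEGER, parent_id INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0, name TEXT, description TEXT,
    author TEXT, posting_date TEXT
);
CREATE TABLE product (
    id INTEGER PRIMARY KEY, client_id INTEGER, parent_id INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0, name TEXT, description TEXT,
    price NUMERIC, tax NUMERIC
);
"""


def connect(ddl: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(ddl)
    return conn


# Class Table Inheritance: two inserts, and a join to read the object back
class_tbl = connect(CLASS_TABLE_DDL)
with class_tbl:
    class_tbl.execute("INSERT INTO item (id, client_id, name) VALUES (1, 7, 'Launch')")
    class_tbl.execute("INSERT INTO news (id, author) VALUES (1, 'Jo')")
class_row = class_tbl.execute(
    "SELECT i.id, i.client_id, i.name, n.author FROM item i JOIN news n ON n.id = i.id"
).fetchone()

# Concrete Table Inheritance: one insert, one single-table read
concrete = connect(CONCRETE_TABLE_DDL)
with concrete:
    concrete.execute(
        "INSERT INTO news (id, client_id, name, author) VALUES (1, 7, 'Launch', 'Jo')"
    )
concrete_row = concrete.execute("SELECT id, client_id, name, author FROM news").fetchone()

print(class_row == concrete_row)  # True: same object, different storage
```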
Each one of these designs has pros and cons, so choose a solution depending on the most common way you access your data.
However, notice I use the word subtype above. I would use these designs only if the different object types are subtypes of a common base class. I'm assuming that News item and Product don't actually share a logical base class (besides Object); they are not subtypes of a common superclass.
So for the sake of OO design, I would choose Concrete Table Inheritance. This avoids any inappropriate coupling between these subtypes. There are columns the two tables have in common, but they basically amount to bookkeeping, not anything to do with the function of the class and hence the table.
What is normalization in MySQL, and when and how do we need to use it?
I'll attempt to explain normalization in layman's terms here. First off, it applies to relational databases in general (Oracle, Access, MySQL), so it is not only for MySQL.
Normalisation is about making sure each table has only the minimal fields, and about getting rid of dependencies. Imagine you have an employee record, and each employee belongs to a department. If you store the department as a field along with the other data of the employee, you have a problem: what happens if a department is removed? You have to update all the department fields, and there's opportunity for error. And what if some employees do not have a department (newly assigned, perhaps)? Now there will be null values.
So normalisation, in brief, is about avoiding fields that would be null, and making sure that all the fields in a table belong to only one domain of the data being described. For example, in the employee table, the fields could be id, name and social security number, but those three fields have nothing to do with the department. Only the employee id describes which department the employee belongs to. So which department an employee is in should be stored in another table.
Here's a simple normalization process.
EMPLOYEE ( < employee_id >, name, social_security, department_name)
This is not normalized, as explained. A normalized form could look like
EMPLOYEE ( < employee_id >, name, social_security)
Here, the Employee table is only responsible for one set of data. So where do we store which department the employee belongs to? In another table
EMPLOYEE_DEPARTMENT ( < employee_id >, department_name )
This is not optimal. What if the department name changes? (It happens in the US government all the time.) Hence it is better to do this:
EMPLOYEE_DEPARTMENT ( < employee_id >, department_id )
DEPARTMENT ( < department_id >, department_name )
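Here is a sketch of that final design in Python's sqlite3, with made-up sample rows. It shows the two benefits described above: renaming a department updates exactly one row however many employees it has, and an employee with no department yet simply has no row in employee_department instead of a NULL column.

```python
import sqlite3

# Final normalized design from the answer; the data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id     INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    social_security TEXT UNIQUE
);
CREATE TABLE department (
    department_id   INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL
);
-- employee_id is the key, so each employee has at most one department
CREATE TABLE employee_department (
    employee_id   INTEGER PRIMARY KEY REFERENCES employee (employee_id),
    department_id INTEGER NOT NULL REFERENCES department (department_id)
);
INSERT INTO department VALUES (1, 'Records');
INSERT INTO employee VALUES (1, 'Ann', '111-11-1111'), (2, 'Bob', '222-22-2222'),
                            (3, 'Cy', '333-33-3333');
-- Cy is newly assigned: no row here, rather than a NULL department column
INSERT INTO employee_department VALUES (1, 1), (2, 1);
""")


def department_of(employee_id: int):
    """Department name for an employee, or None if not yet assigned."""
    row = conn.execute(
        """SELECT d.department_name
             FROM employee_department ed JOIN department d USING (department_id)
            WHERE ed.employee_id = ?""",
        (employee_id,),
    ).fetchone()
    return row[0] if row else None


# A department rename touches exactly one row
with conn:
    renamed = conn.execute(
        "UPDATE department SET department_name = 'Archives' WHERE department_id = 1"
    ).rowcount

print(renamed, department_of(1), department_of(2), department_of(3))
# 1 Archives Archives None
```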
There are first, second and third normal forms. But unless you are taking a DB course, I usually just go for the most normalized form I can understand.
Normalization is not specific to MySQL. It's a general database concept.
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.
The normal forms are given below.
First Normal Form (1NF): A relation is said to be in 1NF if it has only single-valued attributes; neither repeating groups nor arrays are permitted.
Second Normal Form (2NF): A relation is said to be in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF): We say that a relation is in 3NF if it is in 2NF and has no transitive dependencies.
Boyce-Codd Normal Form (BCNF): A relation is said to be in BCNF if and only if every determinant in the relation is a candidate key.
Fourth Normal Form (4NF): A relation is said to be in 4NF if it is in BCNF and contains no nontrivial multivalued dependencies.
Fifth Normal Form (5NF): A relation is said to be in 5NF if and only if every join dependency in the relation is implied by the candidate keys of the relation.
Domain-Key Normal Form (DKNF): We say that a relation is in DKNF if it is free of all modification anomalies. Insertion, deletion, and update anomalies all count as modification anomalies.
See also: Database Normalization Basics
It's a technique for ensuring that your data remains consistent, by eliminating duplication. So a database in which the same information is stored in more than one table is not normalized.
See the Wikipedia article on Database normalization.
(It's a general technique for relational databases, not specific to MySQL.)
While creating a database schema for your application, you need to make sure that no piece of information is stored in more than one column across different tables.
As every table in your DB identifies a significant entity in your application, a unique identifier column is a must-have for each of them.
Now, while deciding the storage schema, you identify various kinds of relationships between these entities (tables), viz. one-to-one, one-to-many and many-to-many.
For a one-to-one relationship (e.g. a student has a unique rank in the class), the same table could be used to store the columns (from both tables).
For a one-to-many relationship (e.g. a semester can have multiple courses), a foreign key is created in the child table (each course references its semester).
For a many-to-many relationship (e.g. a professor attends to many students and vice versa), a third table needs to be created, with the primary keys of both tables as a composite key, in which the related data of both tables is stored.
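The three cases can be sketched in Python's sqlite3 as follows; table names, columns and data are illustrative assumptions. Note where each key lives: the rank is just a column of student, semester_id sits in the child table course, and professor_student is keyed by both parents' primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One-to-one: the rank is simply a column of the student row
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    class_rank INTEGER UNIQUE
);
-- One-to-many: the foreign key lives in the child (course)
CREATE TABLE semester (semester_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    semester_id INTEGER NOT NULL REFERENCES semester (semester_id)
);
-- Many-to-many: a third table keyed by both parents' primary keys
CREATE TABLE professor (professor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE professor_student (
    professor_id INTEGER NOT NULL REFERENCES professor (professor_id),
    student_id   INTEGER NOT NULL REFERENCES student (student_id),
    PRIMARY KEY (professor_id, student_id)
);

INSERT INTO student VALUES (1, 'Ann', 1), (2, 'Bob', 2);
INSERT INTO semester VALUES (1, 'Fall');
INSERT INTO course VALUES (1, 'Databases', 1), (2, 'Networks', 1);
INSERT INTO professor VALUES (1, 'Prof. Lee'), (2, 'Prof. Kim');
INSERT INTO professor_student VALUES (1, 1), (1, 2), (2, 1);
""")


def courses_in(semester_id: int) -> list:
    rows = conn.execute(
        "SELECT title FROM course WHERE semester_id = ? ORDER BY title", (semester_id,)
    )
    return [title for (title,) in rows]


def students_of(professor_id: int) -> list:
    rows = conn.execute(
        """SELECT s.name FROM professor_student ps JOIN student s USING (student_id)
            WHERE ps.professor_id = ? ORDER BY s.name""",
        (professor_id,),
    )
    return [name for (name,) in rows]


print(courses_in(1), students_of(1), students_of(2))
# ['Databases', 'Networks'] ['Ann', 'Bob'] ['Ann']
```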
Once you attend to all these scenarios, your db-schema will be normalized to 4NF.
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity.[1] E.F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the first normal form in 1970.[2] Codd went on to define the second and third normal forms in 1971,[3] and Codd and Raymond F. Boyce defined the Boyce-Codd normal form in 1974.[4] Higher normal forms were defined by other theorists in subsequent years, the most recent being the sixth normal form introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[5]
Informally, a relational database table (the computerized representation of a relation) is often described as "normalized" if it is in the third normal form (3NF).[6] Most 3NF tables are free of insertion, update, and deletion anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF).
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed for performance reasons.[7] However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.[8]
Edit: Source: http://en.wikipedia.org/wiki/Database_normalization