MySQL unique constraint fails with NULL values? [duplicate]
This question requires some hypothetical background. Let's consider an employee table that has columns name, date_of_birth, title, salary, using MySQL as the RDBMS. Since if any given person has the same name and birth date as another person, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:
id  name         date_of_birth  title           salary
1   John Smith   1960-10-02     President       500,000
2   Jane Doe     1982-05-05     Accountant      80,000
3   Jim Johnson  NULL           Office Manager  40,000
4   Tim Smith    1899-04-11     Janitor         95,000
If I now try to run the following statement, it should and will fail:
INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')
If I try this one, it will succeed:
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000')
And now my data will look like this:
id  name         date_of_birth  title           salary
1   John Smith   1960-10-02     President       500,000
2   Jane Doe     1982-05-05     Accountant      80,000
3   Jim Johnson  NULL           Office Manager  40,000
4   Tim Smith    1899-04-11     Janitor         95,000
5   Jim Johnson  NULL           Office Manager  40,000
This is not what I want but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,
{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN
My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."
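The comparison rules behind that guess can be checked directly. What follows is a minimal sketch, not the asker's actual schema: the question never shows the table definition, so the column types and the key name uq_name_dob are assumptions.

```sql
-- Assumed definition of the table described in the question.
CREATE TABLE employee (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    date_of_birth DATE NULL,
    title         VARCHAR(100),
    salary        VARCHAR(20),  -- the question inserts strings such as '40,000'
    UNIQUE KEY uq_name_dob (name, date_of_birth)
);

-- Ordinary equality against NULL yields NULL (unknown), never TRUE.
SELECT NULL = NULL;

-- The NULL-safe operator <=> treats two NULLs as equal and yields 1.
SELECT NULL <=> NULL;

-- Both inserts are accepted: a unique index never treats two keys
-- that contain NULL as duplicates of each other.
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000');
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000');
```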
My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.
A fundamental property of a unique key is that
it must be unique. Making part of that key Nullable destroys this property.
There are two possible solutions to your problem:
One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past
the DBMS "problem" but does not solve the problem in a logical sense.
Expect problems with two "John Smith" entries having unknown dates
of birth. Are these guys one and the same or are they unique individuals?
If you know they are different then you are back to the same old problem -
your Unique Key just isn't unique. Don't even think about assigning a whole range of magic dates
to represent "unknown" - this is truly the road to hell.
A better way is to create an EmployeeId attribute as a surrogate key. This is just an
arbitrary identifier that you assign to individuals that you know are unique. This
identifier is often just an integer value.
Then create an Employee table to relate the EmployeeId (unique, non-nullable
key) to what you believe are the dependent attributes, in this case
Name and Date of Birth (any of which may be nullable). Use the EmployeeId surrogate key everywhere that you
previously used the Name/Date-of-Birth. This adds a new table to your system but
solves the problem of unknown values in a robust manner.
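The surrogate-key layout described above might look roughly like this. It is a sketch under assumptions: the answer names only EmployeeId, Name and Date of Birth, so the position_held table, its columns, and all data types are hypothetical.

```sql
-- The surrogate key is the only identity; name and birth date are
-- plain attributes and may both be unknown.
CREATE TABLE employee (
    employee_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name          VARCHAR(100) NULL,
    date_of_birth DATE NULL,
    PRIMARY KEY (employee_id)
);

-- Hypothetical table that previously would have referred to a person
-- by name and date of birth; it now uses the surrogate key.
CREATE TABLE position_held (
    employee_id INT UNSIGNED NOT NULL,
    title       VARCHAR(100) NOT NULL,
    salary      DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (employee_id, title),
    FOREIGN KEY (employee_id) REFERENCES employee (employee_id)
);

-- Register a person with an unknown birth date, then attach a position.
INSERT INTO employee (name, date_of_birth) VALUES ('Jim Johnson', NULL);
INSERT INTO position_held (employee_id, title, salary)
VALUES (LAST_INSERT_ID(), 'Office Manager', 40000.00);
```

Note that this design does not detect duplicates by itself: whoever assigns the identifier decides whether two rows describe the same person.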
I think MySQL does it right here. Some other databases (for example Microsoft SQL Server) treat NULL as a value that can only be inserted once into a UNIQUE column, but personally I find this to be strange and unexpected behaviour.
However, since this is what you want, you can use some "magic" value instead of NULL, such as a date a long time in the past.
I recommend creating an additional column, checksum, that contains the MD5 hash of name and date_of_birth. Drop the unique key on (name, date_of_birth), because it doesn't solve the problem, and create a unique key on checksum instead.
ALTER TABLE employee
ADD COLUMN checksum CHAR(32) NOT NULL;
UPDATE employee
SET checksum = MD5(CONCAT(name, IFNULL(date_of_birth, '')));
ALTER TABLE employee
ADD UNIQUE (checksum);
This solution adds a small technical overhead, because you need to generate the hash for every inserted pair (and for every search query). As a further improvement, you can add a trigger that generates the hash for you on every insert:
CREATE TRIGGER before_insert_employee
BEFORE INSERT ON employee
FOR EACH ROW
IF new.checksum IS NULL THEN
SET new.checksum = MD5(CONCAT(new.name, IFNULL(new.date_of_birth, '')));
END IF;
Your problem of not having duplicates based on name is not solvable because you do not have a natural key. Putting a fake date in for people whose date of birth is unknown will not solve your problem. John Smith born 1900/01/01 is still going to be a different person than John Smith born 1960/03/09.
I work with name data from large and small organizations every day and I can assure you they have two different people with the same name all the time, sometimes with the same job title. Birthdate is no guarantee of uniqueness either; there are plenty of John Smiths born on the same date. Heck, when we work with physicians' office data we often have two doctors with the same name, address and phone number (father and son combinations).
Your best bet is to have an employee ID, if you are inserting employee data, to identify each employee uniquely. Then check whether the name is unique in the user interface and, if there are one or more matches, ask the user if he meant one of them; if he says no, insert the record. Then build a deduping process to fix problems if someone gets assigned two ids by accident.
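The two checks described above can be sketched as plain queries. This assumes the employee table from the question with an id column; the answer does not give any queries, so these are illustrations rather than the answerer's own process.

```sql
-- Candidate matches to show the user before inserting 'Jim Johnson'.
SELECT id, name, date_of_birth, title
FROM employee
WHERE name = 'Jim Johnson';

-- Dedup report: people stored more than once. GROUP BY places all
-- NULL birth dates for the same name into a single group, so
-- "unknown" duplicates are reported too.
SELECT name,
       date_of_birth,
       COUNT(*) AS copies,
       GROUP_CONCAT(id ORDER BY id) AS ids
FROM employee
GROUP BY name, date_of_birth
HAVING COUNT(*) > 1;
```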
There is another way to do it: add a non-nullable column that holds the string value of the date_of_birth column. The new column's value would be "" (empty string) if date_of_birth is null.
We name the column date_of_birth_str and create a unique constraint on employee(name, date_of_birth_str). So when two records arrive with the same name and a null date_of_birth value, the unique constraint still works.
But the effort of maintaining two columns with the same meaning, and the performance cost of the new column, should be considered carefully.
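A minimal sketch of that approach, assuming the employee table from the question; the VARCHAR(10) type, the '%Y-%m-%d' format and the key name uq_name_dob_str are assumptions, not part of the answer.

```sql
-- Shadow column: '' stands for "birth date unknown".
ALTER TABLE employee
    ADD COLUMN date_of_birth_str VARCHAR(10) NOT NULL DEFAULT '';

-- Backfill existing rows from the real date column.
UPDATE employee
SET date_of_birth_str = IFNULL(DATE_FORMAT(date_of_birth, '%Y-%m-%d'), '');

-- This step fails if duplicates (such as the two Jim Johnson rows)
-- are still present; they have to be cleaned up first.
ALTER TABLE employee
    ADD UNIQUE KEY uq_name_dob_str (name, date_of_birth_str);

-- Every writer must now keep both columns in sync.
INSERT INTO employee (name, date_of_birth, date_of_birth_str, title, salary)
VALUES ('Jim Johnson', NULL, '', 'Office Manager', '40,000');
```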
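You can add a generated column where the NULL value is replaced by an unused constant, e.g. zero. Then you can apply the unique constraint to name together with this column (a zero date only works if the SQL mode permits it; otherwise pick another unused date):
CREATE TABLE employee (
name VARCHAR(50) NOT NULL,
date_of_birth DATE,
uq_date_of_birth DATE AS (IFNULL(date_of_birth, '0000-00-00')),
UNIQUE KEY uq_name_dob (name, uq_date_of_birth)
);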
The perfect solution would be support for function-based unique keys, but that becomes more complex, as MySQL would then also need to support function-based indexes. This would remove the need to use "fake" values in place of NULL, while also letting developers decide how to treat NULL values in unique keys. Unfortunately, MySQL doesn't currently support such functionality as far as I am aware, so we're left with workarounds. Ideally, the definition would look like this:
CREATE TABLE employee(
name CHAR(50) NOT NULL,
date_of_birth DATE,
title CHAR(50),
UNIQUE KEY idx_name_dob (name, IFNULL(date_of_birth,'0000-00-00 00:00:00'))
);
(Note the use of the IFNULL() function in the unique key definition)
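For readers on newer servers: MySQL 8.0.13 and later accept functional key parts, provided the expression is wrapped in its own pair of parentheses. The sketch below is an adaptation, not the answerer's code; the sentinel date 1000-01-01 is an assumption chosen because zero dates are rejected under common SQL modes.

```sql
-- Requires MySQL 8.0.13 or later (functional key parts).
CREATE TABLE employee (
    name          VARCHAR(50) NOT NULL,
    date_of_birth DATE,
    title         VARCHAR(50),
    -- The extra parentheses mark the key part as an expression.
    -- Rows with a NULL birth date are indexed under the sentinel date,
    -- so a second ('Jim Johnson', NULL) row violates the key.
    UNIQUE KEY idx_name_dob (name, (IFNULL(date_of_birth, DATE '1000-01-01')))
);
```

The stored date_of_birth stays NULL; only the index sees the sentinel, so a real person born on the sentinel date would collide with an "unknown" entry of the same name.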
I had a similar problem to this, but with a twist. In your case, every employee has a birthday, although it may be unknown. In that case, it makes logical sense for the system to assign two values for employees with unknown birthdays but otherwise identical information. NealB's accepted answer is very accurate.
However, the problem I encountered was one in which the data field did not necessarily have a value. For example, if you added a 'name_of_spouse' field to your table, there wouldn't necessarily be a value for each row of the table. In that case, NealB's first bullet point (the 'wrong way') actually makes sense. In this case, a string 'None' should be inserted in the column name_of_spouse for each row in which there was no known spouse.
The situation where I ran into this problem was in writing a program with database to classify IP traffic. The goal was to create a graph of IP traffic on a private network. Each packet was put into a database table with a unique connection index based on its ip source and dest, port source and dest, transport protocol, and application protocol. However, many packets simply don't have an application protocol. For example, all TCP packets without an application protocol should be classed together, and should occupy one unique entry in the connections index. This is because I want those packets to form a single edge of my graph. In this situation, I took my own advice from above, and stored a string 'None' in the application protocol field to ensure that these packets formed a unique group.
I was looking for a solution, and what Alexander Yancharuk suggested was a good idea for me. But in my case the columns are foreign keys and employee_id can be NULL.
I have this structure:
+----+---------+-------------+
| id | room_id | employee_id |
+----+---------+-------------+
| 1 | 1 | NULL |
| 2 | 2 | 1 |
+----+---------+-------------+
A room_id with a NULL employee_id must not be duplicated.
I solved it by adding a BEFORE INSERT trigger, like this:
DELIMITER $$
USE `db`$$
CREATE DEFINER=`root`@`%` TRIGGER `db`.`room_employee` BEFORE INSERT ON `room_employee` FOR EACH ROW
BEGIN
IF EXISTS (
SELECT room_id, employee_id
FROM room_employee
WHERE (NEW.room_id = room_employee.room_id AND NEW.employee_id IS NULL AND room_employee.employee_id IS NULL)
) THEN
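-- Raise an error explicitly instead of calling a non-existent procedure
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'The room can not be duplicated on room employee table';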
END IF;
END$$
DELIMITER ;
I also added a unique constraint on room_id and employee_id.
I think the fundamental question here is what you actually mean with
INSERT INTO employee (name, title, salary) VALUES ('Jim Johnson', 'Office Manager', '40,000')
Your own definition of a person is name AND birth date, so what does this statement mean in that context? I'd say that the solution to your problem is to prohibit inserting half identities, like the one above, by adding NOT NULL on both your name and date_of_birth columns. That way, the statement will fail and force you to enter complete identities and the unique key will do its job to prevent you from entering the same person twice.
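A sketch of that change, assuming the employee table from the question. MODIFY has to restate the column type, and VARCHAR(100) for name is an assumption, since the question does not show the types.

```sql
-- Rows with an unknown birth date have to be completed or removed
-- first; otherwise the ALTER below is typically rejected.
SELECT id, name
FROM employee
WHERE date_of_birth IS NULL;

ALTER TABLE employee
    MODIFY name VARCHAR(100) NOT NULL,
    MODIFY date_of_birth DATE NOT NULL;

-- With strict SQL mode enabled, this "half identity" is now refused,
-- because date_of_birth has no default value.
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000');
```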
In simple words, the role of a unique constraint is to make the field or column unique.
NULL destroys this property because the database treats NULL as unknown.
In order to avoid duplicates and allow NULL:
Make the unique key the primary key.
Related
Sql two multique unique ignore when first case ocours
My table has four columns: city, zipcode, number and extra. I created a unique group for city, zipcode and number called unique1, and another group for city, zipcode, number and extra called unique2. Those groups need to be unique, but the problem is that I can have non-unique values when extra is different or is null. For example:
city | zipcode | number | extra
A    | 123     | 123    | null
A    | 123     | 123    | 10    (I can't add this row because of the unique groups)
How can I solve this problem? (I'm using MySQL.) In other words, what I need is a way to ensure that:
1) The grouping of city, zipcode and number must be unique if extra is null.
2) If extra isn't null, I'd like to insert that information even if the new row collides with the unique rule in 1).
In MySQL, using unique indexes to handle data constraints beyond simple ones is not a great idea. Other, more expensive, table servers have more elaborate ways to describe constraints. Your first unique index (you called it a "group") -- unique1 -- prevents the second row in your example from being INSERTed to your table. Edit: Your example shows that you require non-unique values for your first three columns. I'm guessing a bit, but I think you should drop unique1 and just use unique2.
Drop unique1, unique2 should take care of it.
don't repeat entry row from two different table
I created two tables (PHP using XAMPP), one for employee (id, name) and another for administrator (id, name). The id in each of the two tables is the primary key. I need to build a relation between the two tables so that an id doesn't repeat. For example, admin (1, a) uses id = 1, which should then not be used in the employee table. Please help.
The normative approach to this problem is to use a single table. That makes it very easy to keep the id values distinct. You can include a discriminator column that indicates whether a row represents an "employee" or an "administrator". In your example, there are two possible values.
CREATE TABLE employee
( id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT COMMENT 'pk'
, ename VARCHAR(50) NOT NULL
, admin TINYINT(1) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'boolean'
)
Some example data, to illustrate:
id  ename            admin
--- ---------------- -------
42  Barney Rubble    0
43  Fred Flintstone  0
17  Mr. Slate        1
Sample queries:
-- select "employee" rows
SELECT id, ename FROM employee WHERE admin=0
-- select "administrator" rows
SELECT id, ename FROM employee WHERE admin
If you need two separate tables, as you asked about, the bottom line is that there is no declarative constraint available in MySQL that will enforce the id values between the two tables to be "distinct" from one another. To do that, you would have to "roll your own" solution, and that solution is not trivial; it can be rather involved. There are some solutions to simpler problems, such as automatically generating unique id values. But to actually enforce uniqueness, there is no simple way to do that. If your goal is just to enforce a constraint, such that INSERT and UPDATE statements will throw an error if they attempt to violate the constraint, you are going to need to write triggers.
Database many-to-many intermediate tables: extra fields
I have created a 'shops' and a 'customers' table and an intermediate table customers_shops. Every shop has a site_url web address, except that some customers use an alternative url to access the shop's site (this url is unique to a particular customer). In the intermediate table below, I have added an additional field, shop_site_url. My understanding is that this is in 2nd normal form, as the shop_site_url field is unique to a particular customer and shop (therefore won't be duplicated for different customers/shops). Also, since it depends on customer and shop, I think this is in 3rd normal form. I'm just not used to using the 'mapping' table (customers_shops) to contain additional fields. Does the design below make sense, or should I reserve the intermediate tables purely as a way to convert many-to-many relationships to one-to-one?
###### customers ######
id INT(11) NOT NULL PRIMARY KEY
name VARCHAR(80) NOT NULL
###### shops ######
id INT(11) NOT NULL PRIMARY KEY
site_url TEXT
###### customers_shops ######
id INT(11) NOT NULL PRIMARY KEY
customer_id INT(11) NOT NULL
shop_id INT(11) NOT NULL
shop_site_url TEXT //added for a specific url for customer
Thanks
What you are calling an "intermediate" table is not a special type of table. There is only one kind of table and the same design principles ought to be applicable to all.
Well, let's create the table, insert some sample data, and look at the results.
id  cust_id  shop_id  shop_site_url
--
1   1000     2000     NULL
2   1000     2000     http://here-an-url.com
3   1000     2000     http://there-an-url.com
4   1000     2000     http://everywhere-an-url-url.com
5   1001     2000     NULL
6   1001     2000     http://here-an-url.com
7   1001     2000     http://there-an-url.com
8   1001     2000     http://everywhere-an-url-url.com
Hmm. That doesn't look good. Let's ignore the alternative URL for a minute. To create a table that resolves a m:n relationship, you need a constraint on the columns that make up the m:n relationship.
create table customers_shops (
customer_id integer not null references customers (customer_id),
shop_id integer not null references shops (shop_id),
primary key (customer_id, shop_id)
);
(I dropped the "id" column, because it tends to obscure what's going on. You can add it later, if you like.) Insert some sample data . . . then
select customer_id as cust_id, shop_id from customers_shops;
cust_id  shop_id
--
1000     2000
1001     2000
1000     2001
1001     2001
That's closer. You should have only one row for each combination of customer and shop in this kind of table. (This is useful data even without the url.) Now what do we do about the alternative URLs? That depends on a couple of things. Do customers access the sites through only one URL, or might they use more than one? If the answer is "only one", then you can add a column to this table for the URL, and make that column unique. It's a candidate key for this table.
If the answer is "more than one--at the very least the site url and the alternative url", then you need to make more decisions about constraints, because altering this table to allow multiple urls for each combination of customer and shop cuts across the grain of this requirement:
"the shop_site_url field is unique to a particular customer and shop (therefore won't be duplicated for different customers/shops)"
Essentially, I'm asking you to decide what this table means--to define the table's predicate. For example, these two different predicates lead to different table structures.
customer 'n' has visited the web site for shop 'm' using url 's'
customer 'n' is allowed to visit the web site for shop 'm' using alternate url 's'
Your schema does indeed make sense, as shop_site_url is an attribute of the relationship itself. You might want to give it a more meaningful name in order to distinguish it from shops.site_url.
Where else would you put this information? It's not an attribute of a shop, and it's not an attribute of a customer. You could put this in a separate table, if you wanted to avoid having a NULLable column, but you'd end up having to have a reference to your intermediate table from this new table, which probably would look even weirder to you.
Relationships can have attributes, just like entities can have attributes. Entity attributes go into columns in entity tables. Relationship attributes, at least for many-to-many relationships, go in relationship tables. It sounds as though, in general, URL is determined by the combination of shop and customer. So I would put it in the shop-customer table. The fact that many shops have only one URL suggests that there is a fifth normal form that is more subtle than this. But I'm too lazy to work it out.
DB schema for a booking system of fitness class
I need a schema for a fitness class. The booking system needs to store the maximum number of students a class can take, the number of students who have booked to join the class, the student ids, the datetime, etc. A student table needs to store the classes which he/she booked, but this may not be needed if I store student ids in the class tables. I am hoping to get some good ideas. Thanks in advance.
Student: ID, Name, ...
Class: ID, Name, MaxStudents, ...
Student_in_Class: STUDENT_ID, CLASS_ID, DATE_ENROLL
*Not a MySQL guru, I typically deal w/ MS SQL, but I think you'll get the idea. You might need to dig a little in the MySQL docs to find appropriate data types that match the ones I've suggested. Also, I only gave a brief explanation for some types to clarify what they're for, since this is MySQL and not MS SQL.
Class_Enrollment - stores the classes each student is registered for
Class_Enrollment_ID INT IDENTITY PK ("identity is made specifically to serve as an id and it's a field that the system will manage on its own. It automatically gets updated when a new record is created. I would try to find something similar in MySQL")
Class_ID INT FK
Student_ID INT FK
Date_Time smalldatetime ("smalldatetime just stores the date as a smaller range of years than datetime + time up to minutes")
put a unique constraint index on class_id and student_id to prevent duplicates
Class - stores your classes
Class_ID INT IDENTITY PK
Name VARCHAR('size') UNIQUE CONSTRAINT INDEX ("UNIQUE CONSTRAINT INDEX is like a PK, but you can have more than one in a table")
Max_Enrollment INT ("unless you have a different max for different sessions of the same class, then you only need to define max enrollment once per class, so it belongs in the class table, not the Class_Enrollment table")
Student - stores your students
Student_ID INT IDENTITY PK
First_Name VARCHAR('size')
Last_Name VARCHAR('size')
Date_of_Birth smalldatetime ("smalldatetime can store just the date, will automatically put 0's for the time, works fine")
put a unique constraint index on fname, lname, and date of birth to eliminate duplicates (you may have two John Smiths, but two John Smiths w/ exact same birth date in same database is unlikely unless it's a very large database. Otherwise, consider using first name, last name, and phone as a unique constraint)