This is a question about database design. Say I have several tables, some of which share a common expiry field.
CREATE TABLE item (
id INT PRIMARY KEY
);
CREATE TABLE coupon (
id INT PRIMARY KEY,
expiry DATE NOT NULL,
FOREIGN KEY (id) REFERENCES item (id)
);
CREATE TABLE subscription (
id INT PRIMARY KEY,
expiry DATE NOT NULL,
FOREIGN KEY (id) REFERENCES item (id)
);
CREATE TABLE product (
id INT PRIMARY KEY,
name VARCHAR(32),
FOREIGN KEY (id) REFERENCES item (id)
);
The expiry column does need to be indexed so I can easily query by expiry.
My question is, should I pull the expiry column into another table like so?
CREATE TABLE item (
id INT PRIMARY KEY
);
CREATE TABLE expiry (
id INT PRIMARY KEY,
expiry DATE NOT NULL
);
CREATE TABLE coupon (
id INT PRIMARY KEY,
expiry_id INT NOT NULL,
FOREIGN KEY (id) REFERENCES item (id),
FOREIGN KEY (expiry_id) REFERENCES expiry (id)
);
CREATE TABLE subscription (
id INT PRIMARY KEY,
expiry_id INT NOT NULL,
FOREIGN KEY (id) REFERENCES item (id),
FOREIGN KEY (expiry_id) REFERENCES expiry (id)
);
CREATE TABLE product (
id INT PRIMARY KEY,
name VARCHAR(32),
FOREIGN KEY (id) REFERENCES item (id)
);
Another possible solution is to pull the expiry into another base "class" table.
CREATE TABLE item (
id INT PRIMARY KEY
);
CREATE TABLE expiring_item (
id INT PRIMARY KEY,
expiry DATE NOT NULL,
FOREIGN KEY (id) REFERENCES item (id)
);
CREATE TABLE coupon (
id INT PRIMARY KEY,
FOREIGN KEY (id) REFERENCES expiring_item (id)
);
CREATE TABLE subscription (
id INT PRIMARY KEY,
FOREIGN KEY (id) REFERENCES expiring_item (id)
);
CREATE TABLE product (
id INT PRIMARY KEY,
name VARCHAR(32),
FOREIGN KEY (id) REFERENCES item (id)
);
Because refactoring the table structure is difficult once a database is in use, I am having trouble weighing the pros and cons of each approach.
From what I see, the first approach uses the fewest table joins; however, I will have redundant data for each expiring item. The second approach seems good, in that any time I need to add an expiry to an item, I simply add a foreign key to that table. But if I discover that expiring items (or a subset of them) actually share another attribute, I need to add another table for that. I like the third approach best, because it brings me closest to an OOP-like hierarchy. However, I worry that this is my personal bias towards OOP, and that database tables do not compose the way OOP class inheritance does.
Apologies in advance for the poor SQL syntax.
I would stick with the first design, as 'redundant' data is still valid data, if only as a record of what was valid at a point in time, and it also allows for renewal with minimal impact. The second option makes little sense, because the expiry is an arbitrary value that has no real context outside the referencing table; in other words, unless it is associated with a coupon or a subscription, it is an orphan value. Finally, the third option makes no more sense: at what point does an item become expiring? As soon as it is defined? At a set period before expiry? At the end of the day, the expiry is a distinct attribute that happens to have the same name and purpose for both the coupon and the subscription, but the two are not related to each other or, as such, to the item.
Do not normalize "continuous" values such as datetime, float, int, etc. It makes it very inefficient to do any kind of range test on expiry.
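A minimal sketch of why the inline column suits range tests, assuming SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL: each table keeps its own indexed expiry, so "what expires before this date" is a plain range condition with no join. The index names and sample dates are hypothetical.

```python
import sqlite3

# First (inline expiry) design in SQLite; idx_coupon_expiry and
# idx_subscription_expiry are hypothetical index names, dates are sample data.
SCHEMA = """
CREATE TABLE item (id INTEGER PRIMARY KEY);
CREATE TABLE coupon (
    id INTEGER PRIMARY KEY REFERENCES item (id),
    expiry DATE NOT NULL
);
CREATE TABLE subscription (
    id INTEGER PRIMARY KEY REFERENCES item (id),
    expiry DATE NOT NULL
);
CREATE INDEX idx_coupon_expiry ON coupon (expiry);
CREATE INDEX idx_subscription_expiry ON subscription (expiry);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO item (id) VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO coupon VALUES (?, ?)",
                     [(1, "2024-01-15"), (2, "2024-06-30")])
    conn.executemany("INSERT INTO subscription VALUES (?, ?)",
                     [(3, "2024-02-01"), (4, "2025-01-01")])
    return conn


def expiring_before(conn: sqlite3.Connection, cutoff: str) -> list:
    # A plain range test on each table's own indexed expiry column; no join needed.
    # ISO-8601 date strings compare correctly as text.
    return conn.execute(
        """
        SELECT 'coupon', id FROM coupon WHERE expiry < ?
        UNION ALL
        SELECT 'subscription', id FROM subscription WHERE expiry < ?
        ORDER BY 2
        """,
        (cutoff, cutoff),
    ).fetchall()


def coupon_range_uses_index(conn: sqlite3.Connection) -> bool:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM coupon WHERE expiry < ?", ("2024-03-01",)
    ).fetchall()
    return any("idx_coupon_expiry" in row[-1] for row in plan)
```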
Anyway, a DATE takes 3 bytes; an INT takes 4, so the change would increase the disk footprint for no good reason.
So, use the first, not the second. But...
As for the third, you say "expirations are independent", yet you propose having a single expiry?? Which is it??
If they are not independent, then another principle comes into play. "Don't have redundant data in a database." So, if the same expiry really applies to multiple connected tables, it should be in only one of the tables. Then the third schema is the best. (Exception: There may be a performance issue, but I doubt it.)
If there are different dates for coupon/subscription/etc, then you must not use the third.
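If the expiry really is shared (the case where the answer favors the third schema), the date lives in exactly one place and a single range test covers every expiring item. A sketch, again using SQLite in place of MySQL; the index name and sample rows are hypothetical.

```python
import sqlite3

# Third design, assuming the expiry is genuinely shared: the date is stored once,
# in expiring_item, and coupon/subscription only point at it.
SCHEMA = """
CREATE TABLE item (id INTEGER PRIMARY KEY);
CREATE TABLE expiring_item (
    id INTEGER PRIMARY KEY REFERENCES item (id),
    expiry DATE NOT NULL
);
CREATE INDEX idx_expiring_item_expiry ON expiring_item (expiry);
CREATE TABLE coupon (id INTEGER PRIMARY KEY REFERENCES expiring_item (id));
CREATE TABLE subscription (id INTEGER PRIMARY KEY REFERENCES expiring_item (id));
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO item (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO expiring_item VALUES (?, ?)",
                     [(1, "2024-01-15"), (2, "2024-06-30"), (3, "2024-02-01")])
    conn.executemany("INSERT INTO coupon (id) VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO subscription (id) VALUES (3)")
    return conn


def expiring_before(conn: sqlite3.Connection, cutoff: str) -> list:
    # One range test on the single expiry column; the LEFT JOINs only label the subtype.
    return conn.execute(
        """
        SELECT e.id,
               CASE WHEN c.id IS NOT NULL THEN 'coupon'
                    WHEN s.id IS NOT NULL THEN 'subscription' END
        FROM expiring_item AS e
        LEFT JOIN coupon AS c ON c.id = e.id
        LEFT JOIN subscription AS s ON s.id = e.id
        WHERE e.expiry < ?
        ORDER BY e.id
        """,
        (cutoff,),
    ).fetchall()
```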
I am learning SQL and going through some lab exercises, and I got to a question that asks me to create a table from a physical schema. No problem there, it is simple enough to create a table, but I got it wrong because I didn't use the NOT NULL, NULL, and FK constraints. So what in this schema tells me which constraints to use? Here is the correct answer according to the exercise (the auto increment was provided in the question):
CREATE TABLE customerorder (DonutOrderID INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
CustomerID INT(11) NOT NULL,
DonutOrderTimestamp TIMESTAMP DEFAULT NOW(),
SpecialNotes VARCHAR(500) NULL,
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID));
A customerorder without a customer makes no sense, so its customer ID column should not be nullable.
A customerorder must refer to an existing customer, so you should make the customer ID a foreign key to the customer table.
In my opinion, special notes are by nature optional, so SpecialNotes should be nullable.
If the table is called customerorder, its ID should not be called DonutOrderID, as this name looks somewhat unrelated and I'd expect some additional DonutOrder table in the database. The customerorder's ID should be called id or customerorder_id or the like instead.
As a customer order seems to be a donut order in that database, the DonutOrderTimestamp should probably be obligatory (i.e. not nullable), as every order is placed at some point in time.
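The points above can be checked with a small sketch in SQLite, standing in for MySQL (AUTOINCREMENT and CURRENT_TIMESTAMP replace AUTO_INCREMENT and NOW()). Following the last point, DonutOrderTimestamp is declared NOT NULL here, which differs from the exercise's answer. The minimal Customer table and the sample values are hypothetical.

```python
import sqlite3

# SQLite port of the exercise's table; Customer is a hypothetical minimal parent table.
SCHEMA = """
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE customerorder (
    DonutOrderID INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),      -- every order needs a real customer
    DonutOrderTimestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- NOT NULL as the answer suggests
    SpecialNotes VARCHAR(500) NULL                                     -- optional by nature
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Customer (CustomerID) VALUES (1)")
    return conn


def try_insert(conn: sqlite3.Connection, customer_id, notes=None) -> bool:
    """Return True if the order was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO customerorder (CustomerID, SpecialNotes) VALUES (?, ?)",
            (customer_id, notes),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order with no notes is accepted, while an order with a NULL or unknown customer is rejected by the NOT NULL and foreign key constraints respectively.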
I have three objects, per se: Clients, Products and Orders.
Clients and Products are each set up with their own values.
The problem arises when I need to set up a table for the orders. Each order has only one client, so that one-way relationship is easy, but I can't think how to store the list of products within the order (which is of variable size).
Example case:
The Client table has the following fields: ID, Name
The Product table has the following fields: ID, Name, Price
Now, in order to create a table for orders, I have this problem:
Order:
Id = 001
Client_ID = 002
(linked to the Client table)
Products = array? e.g. ["milk","tomatoes","Thin_Crust Ham & Cheese Pizza no_Gluten"] (I would use their IDs; this is just to visualize it)
When I first searched for this the most common answer was to create another table.
From what I have seen, creating another table is not really possible here, because in those examples the values are unique within the newly created table (e.g. someone wanted a field to store multiple phone numbers for one person within the "person" table, so they created a table of phone numbers, which are unique, and linked each one back to the "person" in question).
The other option I have seen is just using a large varchar field with commas in between values.
If this is the only other way, wouldn't there be a problem if we reach the character limit of the field?
This is a very common scenario in database design: you are looking to create an n:m (many-to-many) relationship between the order and the product. This can be achieved with a linking table.
You could use a comma-delimited string, JSON, XML or another serialization method to store this data in a single string column, but that complicates querying your data, and you lose some of the power that using an RDBMS gives you.
Some RDBMSs (SQL Server, for example) allow VARCHAR(MAX), which alleviates the field-length issue when storing serialized data like this. In MySQL, a VARCHAR is limited by the 65,535-byte maximum row size, which is shared among all columns, so a TEXT-family type such as MEDIUMTEXT (or a JSON column in MySQL 5.7 and later) is usually a better fit than a very long VARCHAR. See this topic for more help if you go down this route.
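To illustrate how a serialized list complicates querying, here is a sketch in SQLite with two hypothetical tables, order_csv (comma-delimited list) and order_item (linking table). A naive LIKE search for product 3 also matches 13 and 30, while the linking table answers the same question with an exact comparison.

```python
import sqlite3


def build() -> sqlite3.Connection:
    # Hypothetical tables: order_csv stores a comma-delimited product list,
    # order_item stores one row per (order, product) pair.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE order_csv (id INTEGER PRIMARY KEY, products TEXT);
        CREATE TABLE order_item (order_id INTEGER NOT NULL, product_id INTEGER NOT NULL);
        """
    )
    # Order 1 holds products 3 and 7; order 2 holds products 13 and 30.
    conn.executemany("INSERT INTO order_csv VALUES (?, ?)", [(1, "3,7"), (2, "13,30")])
    conn.executemany("INSERT INTO order_item VALUES (?, ?)",
                     [(1, 3), (1, 7), (2, 13), (2, 30)])
    return conn


def orders_with_product_csv(conn: sqlite3.Connection, product_id: int) -> list:
    # Naive substring match: '%3%' also matches '13' and '30', a false positive.
    rows = conn.execute("SELECT id FROM order_csv WHERE products LIKE ? ORDER BY id",
                        (f"%{product_id}%",))
    return [order_id for (order_id,) in rows]


def orders_with_product_linked(conn: sqlite3.Connection, product_id: int) -> list:
    # With a linking table the same question is an exact comparison,
    # which an index on product_id could serve.
    rows = conn.execute(
        "SELECT order_id FROM order_item WHERE product_id = ? ORDER BY order_id",
        (product_id,))
    return [order_id for (order_id,) in rows]
```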
In the conceptual case of an Order, this is generally solved by adding a child table OrderItem (or OrderLine). If you see this data on a receipt, each of these items is a line on the receipt, so you might see this referred to as a Line or Line Items approach. The minimum fields you need for this in your model are:
ID
Order_ID
Product_ID
Other common fields you might consider for a table like this include:
Qty: for scenarios where the user might select Extra Tomatoes; alternatively, you can simply allow multiple rows with the same Product_ID, or perhaps you want both?
Cost_TaxEx: total cost of the Line Item excluding tax
Cost: total cost including tax.
This can be minimally represented in SQL like this:
CREATE TABLE Client (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
Name VARCHAR(100)
);
CREATE TABLE Product (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
Name VARCHAR(100),
Price DECIMAL(13,2) /* Just an assumption on length */
);
/* ORDER is a reserved word in MySQL, so the table name must be quoted with backticks */
CREATE TABLE `Order` (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
Client_ID INT NOT NULL,
/* ... Insert other fields here ... */
FOREIGN KEY (Client_ID)
REFERENCES Client (ID)
);
CREATE TABLE OrderItem (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
Order_ID INT NOT NULL,
Product_ID INT NOT NULL,
/* ... Insert other fields here ... */
FOREIGN KEY (Order_ID)
REFERENCES `Order` (ID)
ON UPDATE RESTRICT ON DELETE CASCADE, /* the cascade on order:orderitem is up to you */
FOREIGN KEY (Product_ID)
REFERENCES Product (ID) /* DO NOT cascade this relationship! */
);
The above solution allows any number of Product entries in an Order but also allows duplicate Products. If you need to enforce only one of each product per Order, you can add a unique constraint to the OrderItem table:
CREATE TABLE OrderItem (
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
Order_ID INT NOT NULL,
Product_ID INT NOT NULL,
/* ... Insert other fields here ... */
UNIQUE(Order_ID,Product_ID),
FOREIGN KEY (Order_ID)
REFERENCES `Order` (ID)
ON UPDATE RESTRICT ON DELETE CASCADE, /* the cascade on order:orderitem is up to you */
FOREIGN KEY (Product_ID)
REFERENCES Product (ID) /* DO NOT cascade this relationship! */
);
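The OrderItem design can be exercised end to end with a sketch that ports the schema (including the unique constraint) to SQLite. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and ORDER is reserved there too, so the table name is double-quoted. The client name is hypothetical; the products come from the question's example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Client (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name VARCHAR(100));
CREATE TABLE Product (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name VARCHAR(100), Price DECIMAL(13,2));
CREATE TABLE "Order" (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Client_ID INTEGER NOT NULL REFERENCES Client (ID)
);
CREATE TABLE OrderItem (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Order_ID INTEGER NOT NULL REFERENCES "Order" (ID) ON UPDATE RESTRICT ON DELETE CASCADE,
    Product_ID INTEGER NOT NULL REFERENCES Product (ID),
    UNIQUE (Order_ID, Product_ID)
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for REFERENCES and CASCADE
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Client (ID, Name) VALUES (2, 'Example client')")
    conn.executemany(
        "INSERT INTO Product (ID, Name, Price) VALUES (?, ?, ?)",
        [(1, "milk", 1.20), (2, "tomatoes", 2.50),
         (3, "Thin_Crust Ham & Cheese Pizza no_Gluten", 9.99)],
    )
    conn.execute('INSERT INTO "Order" (ID, Client_ID) VALUES (1, 2)')
    return conn


def add_item(conn: sqlite3.Connection, order_id: int, product_id: int) -> bool:
    """Insert one order line; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO OrderItem (Order_ID, Product_ID) VALUES (?, ?)",
                     (order_id, product_id))
        return True
    except sqlite3.IntegrityError:
        return False


def products_in_order(conn: sqlite3.Connection, order_id: int) -> list:
    # The variable-length "array" from the question is just the set of matching rows.
    rows = conn.execute(
        """
        SELECT p.Name
        FROM OrderItem AS oi
        JOIN Product AS p ON p.ID = oi.Product_ID
        WHERE oi.Order_ID = ?
        ORDER BY oi.ID
        """,
        (order_id,),
    )
    return [name for (name,) in rows]
```

A second copy of the same product in one order is rejected by UNIQUE (Order_ID, Product_ID), and deleting the order removes its lines through ON DELETE CASCADE.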
I am working on a data model where I need to store each employee's basic details and their ratings for a set of skills in a MySQL database.
The number of skillsets for each employee is more than 100.
So the information I need to store is as follows:
Employee ID, Name , Department , Contact info, Skillset1,Skillset2,Skillset3, ... , Skillset115
Is creating one table with approximately 120 columns a good approach?
If not, what is the best practice for dealing with this kind of requirement?
No. You should have a separate table with one row per employee and per skill:
create table employeeSkills (
employeeSkillId int auto_increment primary key,
employeeId int not null,
skill varchar(255),
constraint fk_employeeSkills_employeeid foreign key (employeeId) references employees(employeeId)
);
In fact, you should really have two extra tables. The skills themselves should be stored in a separate table and the above should really be:
create table employeeSkills (
employeeSkillId int auto_increment primary key,
employeeId int not null,
skillId int,
constraint fk_employeeSkills_employeeid foreign key (employeeId) references employees(employeeId),
constraint fk_employeeSkills_skillid foreign key (skillId) references skills(skillId)
);
This type of table is called a "junction table", and is common in any properly constructed data model.
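As a sketch of how the junction table replaces the ~115 skill columns, the following ports the two-table design to SQLite. The rating column is a hypothetical addition (the question mentions a rating per skill, which the answer's table does not include), and the skills table's name column and all sample rows are also assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employees (employeeId INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skills (skillId INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE employeeSkills (
    employeeSkillId INTEGER PRIMARY KEY AUTOINCREMENT,
    employeeId INTEGER NOT NULL REFERENCES employees (employeeId),
    skillId INTEGER REFERENCES skills (skillId),
    rating INTEGER  -- hypothetical: the question stores a rating per skill
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO skills VALUES (?, ?)",
                     [(1, "SQL"), (2, "Java"), (3, "Python")])
    # One row per (employee, skill) pair instead of one column per skill.
    conn.executemany(
        "INSERT INTO employeeSkills (employeeId, skillId, rating) VALUES (?, ?, ?)",
        [(1, 1, 5), (1, 3, 2), (2, 1, 3), (2, 2, 4)],
    )
    return conn


def employees_with_skill(conn: sqlite3.Connection, skill: str, min_rating: int) -> list:
    rows = conn.execute(
        """
        SELECT e.name
        FROM employeeSkills AS es
        JOIN employees AS e ON e.employeeId = es.employeeId
        JOIN skills AS s ON s.skillId = es.skillId
        WHERE s.name = ? AND es.rating >= ?
        ORDER BY e.name
        """,
        (skill, min_rating),
    )
    return [name for (name,) in rows]
```

Adding a 116th skill is then an INSERT into skills rather than an ALTER TABLE.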
You need to create two tables: one for the skills and one for the skills assigned to each employee.
This keeps your database properly organized and extends your options in the future: it makes it easier to search, add and assign skills to each employee, and it can be expanded easily, for example by adding skill categories and sub-categories.
The schema for the two tables could look something like this:
CREATE TABLE Skills (
Skill_ID INT NOT NULL AUTO_INCREMENT,
Skill_Description VARCHAR(250),
PRIMARY KEY (`Skill_ID`)
);
CREATE TABLE EmployeeSkills (
ES_ID INT NOT NULL AUTO_INCREMENT,
Skill_ID INT,
Employee_ID INT,
PRIMARY KEY (`ES_ID`),
CONSTRAINT FK_EMPLOYEEID FOREIGN KEY (Employee_ID) REFERENCES Employees(Employee_ID),
CONSTRAINT FK_SKILLID FOREIGN KEY (Skill_ID) REFERENCES Skills(Skill_ID)
);
The Skills table assigns an ID to each skill and keeps the skills in their own table, so you have a unique list of skills without any redundancy. Then you use EmployeeSkills to save the skills assigned to each Employee_ID, which you can later join with other records.
The FOREIGN KEYs on Employee_ID and Skill_ID ensure that every assignment refers to an existing employee and an existing skill.
The ES_ID primary key for EmployeeSkills is an additional advantage that can be helpful in the future. For instance, if you want to know the latest skill that has been assigned, the quickest approach is to take the highest ES_ID, since it is AUTO_INCREMENT. This is just one advantage among many.
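A sketch of the "latest assigned skill" lookup, in SQLite with the Employees table and its foreign key left out for brevity (an assumption of this example). It relies on ES_ID increasing with insertion order, as it does in this single-connection sketch; the skill names are sample data.

```python
import sqlite3

# SQLite port of the Skills / EmployeeSkills pair; Employees is omitted for brevity.
SCHEMA = """
CREATE TABLE Skills (
    Skill_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Skill_Description VARCHAR(250)
);
CREATE TABLE EmployeeSkills (
    ES_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Skill_ID INTEGER REFERENCES Skills (Skill_ID),
    Employee_ID INTEGER
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Skills (Skill_Description) VALUES (?)",
                     [("SQL",), ("Java",), ("Python",)])  # Skill_IDs 1, 2, 3
    # Assignments in chronological order: ES_ID grows with each insert.
    conn.executemany("INSERT INTO EmployeeSkills (Employee_ID, Skill_ID) VALUES (?, ?)",
                     [(1, 1), (2, 2), (1, 3)])
    return conn


def latest_skill(conn: sqlite3.Connection, employee_id: int):
    """Description of the most recently assigned skill, or None if there is none."""
    row = conn.execute(
        """
        SELECT s.Skill_Description
        FROM EmployeeSkills AS es
        JOIN Skills AS s ON s.Skill_ID = es.Skill_ID
        WHERE es.Employee_ID = ?
        ORDER BY es.ES_ID DESC
        LIMIT 1
        """,
        (employee_id,),
    ).fetchone()
    return row[0] if row else None
```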
Let's say I have a table with the following fields:
primaryEmail | secondaryEmail
I know how to create a UNIQUE constraint on one of the fields, and I know how to create a UNIQUE constraint that combines both fields, but is there a way to ensure that no value exists twice, ACROSS both columns? For example, if there's a value of joe#example.com in the primaryEmail field, I don't want it to be able to appear again in either the primaryEmail field OR the secondaryEmail field.
You might consider revising your data model: pull the email address into another table and then relate the new and old tables together. Off the top of my head, something like this should work:
create table master (
id int not null primary key,
name varchar(64)
);
create table email (
id int not null primary key,
address varchar(128) not null unique,
parent_id int not null,
type enum('prim', 'secd'),
foreign key (parent_id) references master(id)
on delete cascade,
unique (parent_id, type)
);
I don't love this design - I'm not a fan of the enum, for example - but it would solve your uniqueness constraint.
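A sketch of the email table in SQLite, where a CHECK constraint stands in for MySQL's ENUM (an assumption of this port). Because every address is a row in a single column, the one UNIQUE on address covers what were previously two columns, and UNIQUE (parent_id, type) allows at most one primary and one secondary address per master row. Names and addresses are sample data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE master (
    id INTEGER PRIMARY KEY,
    name VARCHAR(64)
);
CREATE TABLE email (
    id INTEGER PRIMARY KEY,
    address VARCHAR(128) NOT NULL UNIQUE,  -- one column, so one UNIQUE covers primary and secondary
    parent_id INTEGER NOT NULL REFERENCES master (id) ON DELETE CASCADE,
    type TEXT CHECK (type IN ('prim', 'secd')),  -- stands in for MySQL's ENUM
    UNIQUE (parent_id, type)  -- at most one primary and one secondary per master row
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO master (id, name) VALUES (?, ?)", [(1, "Joe"), (2, "Ann")])
    conn.execute("INSERT INTO email (parent_id, type, address) "
                 "VALUES (1, 'prim', 'joe@example.com')")
    return conn


def add_email(conn: sqlite3.Connection, parent_id: int, kind: str, address: str) -> bool:
    """Return True if the address was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO email (parent_id, type, address) VALUES (?, ?, ?)",
                     (parent_id, kind, address))
        return True
    except sqlite3.IntegrityError:
        return False
```

Reusing joe@example.com as anyone's primary or secondary address is rejected, which is the cross-column uniqueness the question asks for.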
In my opinion, if that is really what you are trying to accomplish, you would want to put two separate constraints on that field. What you are actually trying to do is two different things: make sure the value is unique within the record, and make sure it is also unique across the whole table.
I am writing a data warehouse, using MySQL as the back-end. I need to partition a table based on two integer IDs and a name string. I have read (parts of) the MySQL documentation regarding partitioning, and it seems the most appropriate partitioning scheme in this scenario would be either HASH or KEY partitioning.
I have elected for KEY partitioning because I (chickened out and) don't want to be responsible for providing a 'collision free' hashing algorithm for my fields; instead, I am relying on MySQL's own hashing to generate the keys required for partitioning.
I have included below a snippet of the schema of the table that I would like to partition based on the COMPOSITE of the following fields:
school_id, course_id, ssname (student surname).
BTW, before anyone points out that this is not the best way to store school related information, I'll have to point out that I am only using the case below as an analogy to what I am trying to model.
My current CREATE TABLE statement looks like this:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
FOREIGN KEY (school_id) REFERENCES school(id) ON DELETE RESTRICT ON UPDATE CASCADE,
FOREIGN KEY (course_id) REFERENCES course(id) ON DELETE RESTRICT ON UPDATE CASCADE,
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname(16))
) ENGINE=innodb;
I would like to know how to modify the statement above so that the table is partitioned using the three fields I mentioned at the beginning of this question (namely school_id, course_id and the starting letter of the student's surname).
Another question I would like to ask is this:
What happens in 'edge' situations? For example, if I attempt to insert a record that contains a valid* school_id, course_id or surname for which no underlying partitioned table file exists, will MySQL automatically create the underlying file?
Case in point: I have the following schools: New York Kindergarten, Belfast Elementary; and the following courses: Lie Algebra in Infinitesimal Dimensions, Entangled Entities.
Also assume I have the following students (surnames): Bush, Blair, Hussein
When I add a new school (or course, or student), can I insert them into the foobar table? (Actually, I can't think why not.) The reason I ask is that I foresee adding more schools, courses etc., which means that MySQL will have to create additional tables behind the scenes (as the hash will generate new keys).
I would be grateful if someone with experience in this area could confirm (preferably with links backing their assertion) that my understanding (i.e. no manual administration is required if I add new schools, courses or students to the database) is correct.
I don't know if my second question was well formed (clear) or not. If not, I will be glad to clarify further.
*VALID - by valid, I mean that it is valid in terms of not breaking referential integrity.
I doubt partitioning is as useful as you think. That said, there are a couple of other problems with what you're asking for (note: the entirety of this answer applies to MySQL 5; version 6 might be different):
columns used in KEY partitioning must be a part of the primary key. school_id, course_id and ssname are not part of the primary key.
more generally, every UNIQUE key (including the primary key) must include all columns used in the partitioning expression[1]. This means you can only partition on the intersection of the columns in the UNIQUE keys. In your example, the intersection is empty.
most partitioning schemes (other than KEY) require integer or NULL values. If not NULL, ssname will not be an integer value.
foreign keys and partitioning aren't supported simultaneously[2]. This is a strong argument not to use partitioning.
Fortunately, collision-free hashing is one thing you don't need to worry about, because partitioning is going to result in collisions (otherwise, you'd only have a single row in each partition). If you could ignore the above problems as well as the limitations on functions used in partitioning expressions, you could create a HASH partition with:
CREATE TABLE foobar (
...
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id + ORD(ssname))
PARTITIONS 2
;
What should work is:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY HASH (school_id + course_id)
PARTITIONS 2
;
or:
CREATE TABLE foobar (
id int UNSIGNED NOT NULL AUTO_INCREMENT,
school_id int UNSIGNED NOT NULL,
course_id int UNSIGNED NOT NULL,
ssname varchar(64) NOT NULL,
/* some other fields */
PRIMARY KEY (id, school_id, course_id, ssname),
INDEX idx_fb_si (school_id),
INDEX idx_fb_ci (course_id),
CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
PARTITION BY KEY (school_id, course_id, ssname)
PARTITIONS 2
;
As for the files that store tables, MySQL will create them, though it may do so when you define the table rather than when rows are inserted into it. You don't need to worry about how MySQL manages files. Remember, there is a limited number of partitions, defined when you create the table by the PARTITIONS n clause.
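For plain (non-LINEAR) HASH partitioning, the MySQL manual states that a row goes to partition MOD(expr, n), where n comes from PARTITIONS n. The sketch below applies that rule to the hypothetical HASH (school_id + course_id + ORD(ssname)) expression, using hypothetical IDs for the schools and courses in the question. It shows both collisions (Bush and Blair always share a partition) and that a new school or course maps into one of the existing partitions rather than creating a new one. KEY partitioning uses MySQL's internal hash function, which is not reproduced here.

```python
from collections import Counter
from itertools import product


def hash_partition(expr_value: int, num_partitions: int) -> int:
    """Partition for PARTITION BY HASH(expr) PARTITIONS n, i.e. MOD(expr, n).

    Python's % matches MOD for the non-negative values used here (UNSIGNED ids).
    """
    return expr_value % num_partitions


def foobar_partition(school_id: int, course_id: int, ssname: str, partitions: int = 2) -> int:
    # Mirrors the hypothetical expression school_id + course_id + ORD(ssname).
    # ord(ssname[0]) equals MySQL's ORD() only for single-byte (e.g. ASCII) first characters.
    return hash_partition(school_id + course_id + ord(ssname[0]), partitions)


# Hypothetical ids for the schools and courses named in the question.
SCHOOLS = {1: "New York Kindergarten", 2: "Belfast Elementary"}
COURSES = {1: "Lie Algebra in Infinitesimal Dimensions", 2: "Entangled Entities"}
SURNAMES = ["Bush", "Blair", "Hussein"]


def distribution(partitions: int = 2) -> Counter:
    """Count how many (school, course, student) rows land in each partition."""
    return Counter(
        foobar_partition(school, course, name, partitions)
        for school, course, name in product(SCHOOLS, COURSES, SURNAMES)
    )
```

The twelve sample rows split six and six across the two partitions, and any later school or course ID still yields a partition number below n.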