Best approach to store 100+ columns in one table in MySQL - mysql

I am working on a data model where I need to store each employee's basic details and their ratings for a set of skills in a MySQL database.
Each employee has more than 100 skillsets.
So the information I need to store is as follows:
Employee ID, Name, Department, Contact info, Skillset1, Skillset2, Skillset3, ..., Skillset115
Is creating one table with approximately 120 columns a good approach?
If not, what is the best practice for this kind of requirement?

No. You should have a separate table with one row per employee and per skill:
create table employeeSkills (
employeeSkillId int auto_increment primary key,
employeeId int not null,
skill varchar(255),
constraint fk_employeeSkills_employeeid foreign key (employeeId) references employees(employeeId)
);
In fact, you should really have two extra tables. The skills themselves should be stored in a separate table and the above should really be:
create table employeeSkills (
employeeSkillId int auto_increment primary key,
employeeId int not null,
skillId int,
constraint fk_employeeSkills_employeeid foreign key (employeeId) references employees(employeeId),
constraint fk_employeeSkills_skillid foreign key (skillId) references skills(skillId)
);
This type of table is called a "junction table", and is common in any properly constructed data model.
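Since the question also asks for a rating per skill, the rating can live on the junction table itself. The following is a minimal sketch rather than part of the original answer: the rating column and its assumed scale, the name and skillName columns, and the unique keys are all assumptions made for illustration.

```sql
-- Assumed minimal parent tables; the real ones would carry more columns.
create table employees (
    employeeId int auto_increment primary key,
    name varchar(100) not null          -- hypothetical column
);

create table skills (
    skillId int auto_increment primary key,
    skillName varchar(100) not null,    -- hypothetical column
    constraint unq_skills_skillName unique (skillName)
);

-- Junction table: one row per (employee, skill) pair, carrying the rating.
create table employeeSkills (
    employeeSkillId int auto_increment primary key,
    employeeId int not null,
    skillId int not null,
    rating tinyint not null,            -- assumed scale, e.g. 1 to 5
    constraint unq_employeeSkills unique (employeeId, skillId),
    constraint fk_employeeSkills_employeeid foreign key (employeeId) references employees(employeeId),
    constraint fk_employeeSkills_skillid foreign key (skillId) references skills(skillId)
);

-- All rated skills for one employee (employeeId 1 is just an example value).
select s.skillName, es.rating
from employeeSkills es
join skills s on s.skillId = es.skillId
where es.employeeId = 1
order by s.skillName;
```

With this layout, adding a 116th skill means inserting a row into skills rather than altering a table.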

You need two tables: one that holds the skills and one that records which skills are assigned to each employee.
This keeps the database organized and leaves room to grow. Searching, adding, and assigning skills all become simpler, and the model can be extended later, for example with skill categories and sub-categories.
The schema for the two tables should look something like this:
CREATE TABLE Skills (
Skill_ID INT NOT NULL AUTO_INCREMENT,
Skill_Description VARCHAR(250),
PRIMARY KEY (`Skill_ID`)
);
CREATE TABLE EmployeeSkills (
ES_ID INT NOT NULL AUTO_INCREMENT,
Skill_ID INT,
Employee_ID INT,
PRIMARY KEY (`ES_ID`),
CONSTRAINT FK_EMPLOYEEID FOREIGN KEY (Employee_ID) REFERENCES Employees(Employee_ID),
CONSTRAINT FK_SKILLID FOREIGN KEY (Skill_ID) REFERENCES Skills(Skill_ID)
);
The Skills table assigns an ID to each skill, which gives you a single list of unique skills with no redundancy. EmployeeSkills then records which skills are assigned to each Employee_ID, and you can join it with other tables later on.
The FOREIGN KEY constraints on Employee_ID and Skill_ID ensure that every row refers to an existing employee and an existing skill.
The ES_ID primary key on EmployeeSkills can also be useful. For instance, to find the most recently assigned skill, you can fetch the row with the highest ES_ID, since it is an AUTO_INCREMENT column.
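As a sketch of how these tables might be queried, the statements below list each employee's skills and fetch the most recently assigned one. They assume the junction table is named EmployeeSkills and that Employees has an Employee_Name column; that column does not appear in the answer and is hypothetical.

```sql
-- Skills per employee (Employee_Name is an assumed column).
SELECT e.Employee_ID, e.Employee_Name, s.Skill_Description
FROM Employees e
JOIN EmployeeSkills es ON es.Employee_ID = e.Employee_ID
JOIN Skills s ON s.Skill_ID = es.Skill_ID
ORDER BY e.Employee_ID, s.Skill_Description;

-- Most recently inserted assignment, relying on ES_ID being AUTO_INCREMENT.
SELECT es.Employee_ID, s.Skill_Description
FROM EmployeeSkills es
JOIN Skills s ON s.Skill_ID = es.Skill_ID
ORDER BY es.ES_ID DESC
LIMIT 1;
```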

Related

Create composite primary key with a name so I can reference it

At a workplace they recycle punchcard ids (for some strange reason). So it is common to have past employees clashing with current employees. As a workaround I want to have employee punchcard id, employee name+surname as the unique primary key (fingers crossed, perhaps add date-of-birth and even passport if available). That can be accomplished with
PRIMARY KEY (pid,name,surname).
The complication is that another table now wants to reference an employee by its above primary key.
Alas, said PK has no name! How can I reference it?
I tried these but no joy:
PRIMARY KEY id (pid, name, surname),
INDEX id (pid, name, surname),
PRIMARY KEY id,
INDEX id (pid, name, surname) PRIMARY KEY,
Can you advise on how to achieve this or even how to reference a composite primary key?
Update:
The table to store employees is em.
The table which references an employee is co (a comment made by an employee).
Ideally I would use pid (punchcard id) as the unique id of each employee. But since pids are recycled, this is not unique. And so I resorted to creating a composite key or an index which will be unique and can reference that as a unique employee id. Below are the 2 tables without the composite key. For brevity, I abbreviated table names and omitted surname etc. So the question is, how can I reference an employee whose id is composite from another table co.
CREATE TABLE em (
pid INT NOT NULL,
name VARCHAR(10) NOT NULL
);
CREATE TABLE co (
id INT primary key auto_increment,
em INT,
content VARCHAR(100) NOT NULL,
constraint co2em_em_fk foreign key (em) references em(pid)
);
If another table wants to reference this one by a composite key, you don't need it to have a name - just the list of fields will do. E.g.
CREATE TABLE other_table (
ID INT PRIMARY KEY AUTO_INCREMENT,
pid *definition*,
name *definition*,
surname *definition*,
..., -- other fields, keys etc.
FOREIGN KEY (pid, name, surname) REFERENCES employees(pid, name, surname)
);
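Applied to the em and co tables from the question, this could look like the sketch below. The column names em_pid and em_name are made up for illustration, and the example keeps the question's abbreviation of omitting surname.

```sql
CREATE TABLE em (
    pid INT NOT NULL,
    name VARCHAR(10) NOT NULL,
    PRIMARY KEY (pid, name)
);

CREATE TABLE co (
    id INT PRIMARY KEY AUTO_INCREMENT,
    em_pid INT NOT NULL,           -- hypothetical column name
    em_name VARCHAR(10) NOT NULL,  -- hypothetical column name
    content VARCHAR(100) NOT NULL,
    -- The referencing columns must match the referenced key in order and type.
    CONSTRAINT co2em_em_fk FOREIGN KEY (em_pid, em_name) REFERENCES em (pid, name)
);
```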
UPD: If you expect that the set of the fields inside PK might change and you can't make a simpler PK (auto-increment integer for example) for the original table, then your best bet might be something like this:
CREATE TABLE employee_key (
ID INT PRIMARY KEY AUTO_INCREMENT,
pid *definition*,
name *definition*,
surname *definition*,
FOREIGN KEY (pid, name, surname) REFERENCES employees(pid, name, surname)
);
-- and then reference the employees from other tables by the key from employee_key:
CREATE TABLE other_table(
ID INT AUTO_INCREMENT PRIMARY KEY,
employee_id INT NOT NULL,
... -- other fields, indexes, etc...
FOREIGN KEY (employee_id) REFERENCES employee_key(ID)
);
Then if the primary key of the employees table changes, you only need to update employees itself and employee_key; all other tables stay as they are.
If you CAN, however, change the original employees table, I would recommend something like this:
CREATE TABLE employees(
ID INT PRIMARY KEY AUTO_INCREMENT,
pid *definition*,
name *definition*,
surname *definition*,
... -- other fields, keys, etc.
UNIQUE KEY (pid, name, surname)
);
Then you'll have to maintain the logic of generating new pid's in your code, though, or have them in some side table.
UPD2: Regarding inserts and updates.
As for inserts: you need to insert these explicitly, otherwise how would you expect the relation to be established? If you're using an ORM library to communicate with your database, it might provide methods to specify linked objects without explicitly adding the IDs. Otherwise, to insert a row into employees, employee_key and other_table, you first need to INSERT INTO employees(...);, then perform a separate INSERT into employee_key (using the key fields you've just added to employees), get the auto-generated key from employee_key, and then use that key for inserts into any other tables.
You might simplify all this by writing an AFTER INSERT trigger for the employees table (that would automatically create a row in employee_key) and/or performing your inserts via a stored procedure (that can even return the key of the newly inserted row in employee_key). Either way, this work still needs to be done; MySQL won't do it for you by default.
Updates are a bit easier, since you can specify ON UPDATE CASCADE when adding the foreign key - in that case a change to one of the fields in the employees will automatically trigger the same change in any tables that reference employees by this key.
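The trigger and cascade ideas above can be sketched as follows. The column types, the trigger name employees_after_insert, the extra UNIQUE KEY and the sample values are assumptions made for illustration, not part of the original answer.

```sql
CREATE TABLE employees (
    pid INT NOT NULL,
    name VARCHAR(50) NOT NULL,
    surname VARCHAR(50) NOT NULL,
    PRIMARY KEY (pid, name, surname)
);

CREATE TABLE employee_key (
    ID INT PRIMARY KEY AUTO_INCREMENT,
    pid INT NOT NULL,
    name VARCHAR(50) NOT NULL,
    surname VARCHAR(50) NOT NULL,
    -- Assumed addition: at most one surrogate key per employee.
    UNIQUE KEY (pid, name, surname),
    FOREIGN KEY (pid, name, surname)
        REFERENCES employees (pid, name, surname)
        ON UPDATE CASCADE
);

-- Single-statement trigger body, so no DELIMITER change is needed.
CREATE TRIGGER employees_after_insert
AFTER INSERT ON employees
FOR EACH ROW
    INSERT INTO employee_key (pid, name, surname)
    VALUES (NEW.pid, NEW.name, NEW.surname);

-- Inserting an employee now creates the employee_key row as well.
INSERT INTO employees (pid, name, surname) VALUES (17, 'Ada', 'Lovelace');

-- Look up the generated key by the fields that were just inserted.
SELECT ID FROM employee_key
WHERE pid = 17 AND name = 'Ada' AND surname = 'Lovelace';

-- Changing a key field propagates to employee_key via ON UPDATE CASCADE.
UPDATE employees SET surname = 'King' WHERE pid = 17 AND name = 'Ada';
```

The surrogate ID in employee_key stays the same across the update, so tables that reference employee_key(ID) are unaffected.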
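You would define it as:
CONSTRAINT id
PRIMARY KEY (pid, name, surname)
Note that MySQL accepts the name, but the primary key index itself is always called PRIMARY, and a foreign key references the column list rather than the key name.
You should also read more about how MySQL uses indexes and how to optimize them: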
https://dev.mysql.com/doc/refman/8.0/en/optimization-indexes.html

How to implement a restriction to create AT MOST one null value in a column

I'm trying to implement a sort of tree in the table employee, where each employee except for one has a supervisor, who is another employee in the same table. The one with a NULL value in their supervisor slot would be the "manager" or something. Is there a way to implement it? I know in other database systems this would be trivial with a partial index, but MySQL doesn't support those, so I'm at a loss.
Here's the table declaration I made without this restriction, and for the record, employee is meant to be one of four specializations of the person table, with further specialization into either associates or partners:
create table employee
(
ID numeric(7,0),
supervisor_ID numeric(7,0),
postition enum('associate','partner') NOT NULL,
primary key (ID),
foreign key (ID) references person(ID),
foreign key (supervisor_ID) references employee(ID)
);
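One possible workaround, assuming MySQL 5.7 or later where generated columns are available, is to derive a flag column that is 1 only for the row without a supervisor and put a unique index on it. A unique index permits any number of NULLs, so only the root row is constrained. The names is_root and uq_single_root and the stand-in person table are invented for this sketch.

```sql
-- Minimal stand-in for the person table, which the question does not show.
create table person
(
    ID numeric(7,0),
    primary key (ID)
);

create table employee
(
    ID            numeric(7,0),
    supervisor_ID numeric(7,0),
    postition     enum('associate','partner') NOT NULL,
    -- 1 for the row without a supervisor, NULL for every other row
    is_root       tinyint as (if(supervisor_ID is null, 1, null)) stored,
    primary key (ID),
    -- a unique index allows many NULLs but only a single 1
    unique key uq_single_root (is_root),
    foreign key (ID) references person(ID),
    foreign key (supervisor_ID) references employee(ID)
);

insert into person (ID) values (1), (2), (3);
insert into employee (ID, supervisor_ID, postition) values (1, NULL, 'partner');   -- first root row
insert into employee (ID, supervisor_ID, postition) values (2, 1, 'associate');    -- has a supervisor
-- A second row without a supervisor is expected to fail with a duplicate-key error:
insert into employee (ID, supervisor_ID, postition) values (3, NULL, 'associate');
```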

Which database scheme is better for performance aspect?

----Scheme1----
CREATE TABLE college (
id INT AUTO_INCREMENT,
name VARCHAR(250) NOT NULL,
address VARCHAR(250),
PRIMARY KEY (id)
);
CREATE TABLE student (
college INT NOT NULL,
username VARCHAR(50) NOT NULL,
name VARCHAR(100),
FOREIGN KEY (college) REFERENCES college(id),
CONSTRAINT pk PRIMARY KEY (college,username)
);
CREATE TABLE subject (
college INT NOT NULL,
id INT NOT NULL,
name VARCHAR(100),
FOREIGN KEY (college) REFERENCES college(id),
CONSTRAINT pk PRIMARY KEY (college,id)
);
CREATE TABLE marks (
college INT NOT NULL,
student VARCHAR(50) NOT NULL,
subject INT NOT NULL,
marks INT NOT NULL,
-- forget about standard for this example
FOREIGN KEY (college) REFERENCES college(id),
FOREIGN KEY (student) REFERENCES student(username),
FOREIGN KEY (subject) REFERENCES subject(id),
CONSTRAINT pk PRIMARY KEY (college,subject,student)
);
----Scheme2----
CREATE TABLE college (
id INT AUTO_INCREMENT,
name VARCHAR(250) NOT NULL,
address VARCHAR(250),
PRIMARY KEY (id)
);
CREATE TABLE student (
college INT NOT NULL,
id BIGINT NOT NULL AUTO_INCREMENT,
username VARCHAR(50) NOT NULL,
name VARCHAR(100),
FOREIGN KEY (college) REFERENCES college(id),
PRIMARY KEY (id)
);
CREATE TABLE subject (
college INT NOT NULL,
id BIGINT NOT NULL AUTO_INCREMENT,
name VARCHAR(100),
FOREIGN KEY (college) REFERENCES college(id),
PRIMARY KEY (id)
);
CREATE TABLE marks (
student VARCHAR(50) NOT NULL,
subject INT NOT NULL,
id BIGINT NOT NULL AUTO_INCREMENT,
marks INT NOT NULL,
-- forget about standard for this example
FOREIGN KEY (student) REFERENCES student(id),
FOREIGN KEY (subject) REFERENCES subject(id),
PRIMARY KEY (id)
);
Looking at the above database schemes it looks like Scheme1 will give better performance while searching for the result of a specific student and faster in filtering results but it feels like it is not in all normalized forms. While Scheme2, on the other hand, looks to be fully normal but might require more JOIN operations to fetch certain results or filter the data.
Please tell me if I'm wrong about my Schemes here, also tell me which one is better?
I would go for Schema 2: when it comes to referencing a table, it is easier to use a single column (the auto-incremented primary key in Schema 2) than a combination of columns (the compound primary keys in Schema 1). Also, as commented by O.Jones, Schema 1 assumes that two students in the same college cannot have the same name, which does not seem sensible.
There are other issues with Schema 1, e.g. the foreign key that relates the marks to students is malformed (you would need a compound foreign key that includes the college id instead of just the student name).
With properly defined foreign keys referencing primary keys, performance will not be a problem; joins perform well in this situation.
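For reference, the compound foreign keys that Schema 1's marks table would need might look like the sketch below. It assumes the Scheme1 college, student and subject tables exactly as shown in the question.

```sql
-- Assumes the Scheme1 college, student and subject tables from the question.
CREATE TABLE marks (
    college INT NOT NULL,
    student VARCHAR(50) NOT NULL,
    subject INT NOT NULL,
    marks INT NOT NULL,
    PRIMARY KEY (college, subject, student),
    -- Each foreign key lists the full compound primary key of its parent table.
    FOREIGN KEY (college, student) REFERENCES student (college, username),
    FOREIGN KEY (college, subject) REFERENCES subject (college, id)
);
```

A separate foreign key from marks.college to college(id) is left out here because both parent tables already enforce it.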
But one flaw should be fixed in Schema 2, that is to store a reference to the college in the marks table. You don't need this, since a student belongs to a college (there is a reference to the college in the student table).
Also, I am unsure that a subject should belong to a college: isn't it possible that the same subject would be taught in different colleges?
Finally, I would suggest giving clearer names to the foreign key columns, like student_id instead of student, and college_id instead of college.
It's difficult to assess whether a schema is normalized without first knowing the relationships between entities. Can a student be associated with only one college? Can a student be associated with the same subject more than once, getting different marks?
Declaring foreign keys maintains referential integrity but slows down insertions and updates. You can get the same functionality without declaring the fks, but you may end up with some orphaned records. The fact that a particular index is used for a fk, or not, makes no difference to SELECT query performance.
JOIN operations use indexes. So do fks. So if you have the correct indexes, your JOIN operations will be efficient. But it's impossible to know which indexes are the best without knowing your JOIN queries.
Conventionally, each table's id column comes first. And many designers name each id column after the table in which it appears, for example college.college_id rather than college.id. That makes JOIN queries slightly easier to read.
You should use a surrogate primary key in the student table (student.student_id) rather than using the student's name as part of the primary key. JOINing on id values is faster than JOINing on VARCHAR() values. Also, some students may share names. (In the real world, people's dates of birth accompany their names in tables: it helps tell people apart.)
I think your marks table should contain these columns:
CREATE TABLE marks (
student_id INT NOT NULL,
subject_id INT NOT NULL,
marks INT NOT NULL,
-- foreign keys as needed
PRIMARY KEY (student_id, subject_id)
);
Can a student have multiple marks for the same subject? In that case use a marks_id as the pk instead of (student_id, subject_id).
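If multiple marks per student and subject are allowed, the marks_id variant might look like the following sketch. The parent tables, the index name idx_marks_student_subject and the example student_id value are assumptions added for illustration.

```sql
-- Assumed parent tables using the suggested *_id naming.
CREATE TABLE student (
    student_id INT NOT NULL AUTO_INCREMENT,
    username VARCHAR(50) NOT NULL,
    PRIMARY KEY (student_id)
);

CREATE TABLE subject (
    subject_id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(100),
    PRIMARY KEY (subject_id)
);

CREATE TABLE marks (
    marks_id INT NOT NULL AUTO_INCREMENT,
    student_id INT NOT NULL,
    subject_id INT NOT NULL,
    marks INT NOT NULL,
    PRIMARY KEY (marks_id),
    -- Non-unique index: the same student/subject pair may appear several times.
    KEY idx_marks_student_subject (student_id, subject_id),
    FOREIGN KEY (student_id) REFERENCES student (student_id),
    FOREIGN KEY (subject_id) REFERENCES subject (subject_id)
);

-- All marks of one student (42 is just an example id).
SELECT sub.name, m.marks
FROM marks m
JOIN subject sub ON sub.subject_id = m.subject_id
WHERE m.student_id = 42;
```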

Unique constraint 'owner-owned attribute' through join table

I have the following scenario: A 'phone' child table can serve several parent tables through join tables, as follows:
CREATE TABLE phone (
id BIGINT AUTO_INCREMENT,
number VARCHAR(16) NOT NULL,
type VARCHAR(16) NOT NULL,
PRIMARY KEY(id)
);
CREATE TABLE employee_phone (
id BIGINT AUTO_INCREMENT,
employee BIGINT NOT NULL,
phone BIGINT NOT NULL,
PRIMARY KEY(id),
CONSTRAINT empl_phone_u_phone UNIQUE(phone),
CONSTRAINT empl_phone_fk_employee
FOREIGN KEY(employee)
REFERENCES employee(id) ON DELETE CASCADE,
CONSTRAINT empl_phone_fk_phone
FOREIGN KEY(phone)
REFERENCES phone(id) ON DELETE CASCADE
);
Let Alice and Bob live in the same house and be employees of the same company. HR has two phone numbers registered for Alice whereas they have only one for Bob, the number of the house's landline. Is there a way to enforce at database level that a phone number (number-type) cannot be repeated for the same employee (or supplier, or whatever parent appears later), using this configuration? Or will I have to take care of such restrictions in the application layer? I'd rather not use triggers or table denormalization (as seen in related questions on the site such as this one, which work with IDs, not with other fields), but I'm open to do so if there's no alternative. I'm using MySQL. Thanks for your attention.
If I understand correctly, you just want unique constraints on the junction tables:
alter table employee_phone add constraint unq_employeephone_employee_phone unique (employee, phone);
This will prevent duplicates for a given employee or (with the equivalent constraint) supplier.
If you want all phone numbers to be unique in the phone table, then just put a unique constraint/index on the number column:
alter table phone add constraint unq_phone_number unique (number);
(you might want to include the type as well).
If you try to add a duplicate phone, the code will return an error.

Self-referential relationship table design: one or two tables?

CREATE TABLE Employee
(
id INT,
boss INT REFERENCES Employee(id),
PRIMARY KEY (id)
);
One employee can have many bosses and one boss can have many employees.
Does this table function the same as this two-table design?
CREATE TABLE Employee
(
id INT,
PRIMARY KEY (id)
);
Create table ManagerRelation (
id_from int NOT NULL,
id_to int NOT NULL,
PRIMARY KEY (id_from, id_to),
FOREIGN KEY (id_from) REFERENCES Employee(id),
FOREIGN KEY (id_to) REFERENCES Employee(id)
);
The second table, ManagerRelation, stores the ids of workers who have a boss-employee relationship.
My question is: are these two designs right? If so, are they functionally the same?
The two designs are quite different. The first requires that each employee have (at most) one boss, although each boss could have many employees.
The second allows for employees to have more than one boss.
From your description of the problem, the second form is the more appropriate data model. From my understanding of real-world boss-employee relationships, though, I would have expected the first to be correct.
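To make the difference concrete, here is a small sketch using the two-table design. It assumes that id_from holds the boss and id_to holds the employee, which the question does not actually specify; the ids are made-up sample data.

```sql
CREATE TABLE Employee
(
    id INT,
    PRIMARY KEY (id)
);

CREATE TABLE ManagerRelation (
    id_from INT NOT NULL,  -- assumed: the boss
    id_to INT NOT NULL,    -- assumed: the employee
    PRIMARY KEY (id_from, id_to),
    FOREIGN KEY (id_from) REFERENCES Employee(id),
    FOREIGN KEY (id_to) REFERENCES Employee(id)
);

INSERT INTO Employee (id) VALUES (1), (2), (3);

-- Employee 3 reports to both 1 and 2; a single boss column cannot express this.
INSERT INTO ManagerRelation (id_from, id_to) VALUES (1, 3), (2, 3);

-- All bosses of employee 3.
SELECT id_from AS boss FROM ManagerRelation WHERE id_to = 3;
```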