simple DB design question - mysql

This is probably a very stupid question, but I am just not sure which solution is the most elegant and the best (most performant) way to go in the following scenario.
I have the following tables:
Customer, Company, Meter, Reading
All of the tables above are supposed to be linked to one or more records of a "Comment" table. Which is the best way to model this relationship?
I am seeing two solutions here:
1.) use m:n relationships: CustomerComment, CompanyComment, etc. -> easy to extend later on, but a lot of new tables
2.) use 1:n relationships: Comment table has a field for the PK of each of the tables above (Customer_id, Company_id, ...) -> minimal table approach, but "harder" to extend since I would have to add a new field to the comment table whenever there is a new table that needs to have comments
The target is a modular application, which may or may not have any of those four tables.
Which one is the better one - or are there more?
Thanks!

This is the problem with using integers for primary keys: an integer ID is only unique within its own table, so a comment cannot point at "row 5" without also saying which table it means. You have a few solutions you can use.
One is to give every row in Customer, Company, Meter and Reading a UUID as its true unique ID. The database design may still require an integer primary key, and that is fine. This means you never have to add fields to the COMMENTS table when a new type appears in your system. A comment will always reference the commented row by that row's UUID.
Your tables can look like this:
CUSTOMER
    ID UUID
COMPANY
    ID UUID
METER
    ID UUID
COMMENTS
    ID
    RELATED_TO UUID
    COMMENT TEXT
Now you can attach comments to any table that has a unique ID.
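Here is a minimal sketch of that layout. It assumes Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own. The column name body (instead of COMMENT), the index and the helper functions are made up for illustration, and the UUIDs are simply stored as text. Note that nothing stops a comment from pointing at an ID that does not exist.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with two commentable tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE meter    (id TEXT PRIMARY KEY, serial TEXT);
        CREATE TABLE comments (
            id         INTEGER PRIMARY KEY,
            related_to TEXT NOT NULL,  -- UUID of any commented row, no foreign key
            body       TEXT NOT NULL
        );
        CREATE INDEX comments_related_to_idx ON comments (related_to);
    """)
    return conn


def add_comment(conn, related_to, body):
    """Attach a comment to whatever row owns the given UUID."""
    conn.execute(
        "INSERT INTO comments (related_to, body) VALUES (?, ?)",
        (related_to, body),
    )


def comments_for(conn, related_to):
    """Return the comment texts for one UUID, oldest first."""
    rows = conn.execute(
        "SELECT body FROM comments WHERE related_to = ? ORDER BY id",
        (related_to,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    customer_id = str(uuid.uuid4())
    meter_id = str(uuid.uuid4())
    conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, "Alice"))
    conn.execute("INSERT INTO meter VALUES (?, ?)", (meter_id, "M-001"))
    add_comment(conn, customer_id, "Prefers email")
    add_comment(conn, meter_id, "Replaced last year")
    print(comments_for(conn, customer_id))
```

Adding a new commentable table later needs no change to comments at all, which is the point of this variant.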
If you want to support referential constraints, add an OBJECT table that holds the IDs of all the pieces of data you have in your system. At that point you are really building a system in which you can associate any comment with anything you want. This may not be suitable for your design.
OBJECT
    ID UUID
CUSTOMER
    ID UUID
    FOREIGN KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMPANY
    ID UUID
    FOREIGN KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
METER
    ID UUID
    FOREIGN KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMMENTS
    ID
    RELATED_TO UUID
    COMMENT TEXT
    FOREIGN KEY (RELATED_TO) REFERENCES OBJECT(ID) ON DELETE CASCADE
This complicates the design in order to ensure you don't need to add 2 tables for each new type in the system. Each design has sacrifices. In this one, the cost is that every entry, whether it is a Company, Customer or Meter, needs an associated ID in the OBJECT table so that a foreign key can point at it.
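A rough sketch of the OBJECT variant, again assuming Python's sqlite3 in place of MySQL. The helper names and the body column are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which is why that line is there; in MySQL the InnoDB engine enforces them by default.

```python
import sqlite3
import uuid


def make_db():
    """In-memory database where every commentable row is registered in object."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE object (id TEXT PRIMARY KEY);
        CREATE TABLE customer (
            id   TEXT PRIMARY KEY REFERENCES object(id) ON DELETE CASCADE,
            name TEXT
        );
        CREATE TABLE comments (
            id         INTEGER PRIMARY KEY,
            related_to TEXT NOT NULL REFERENCES object(id) ON DELETE CASCADE,
            body       TEXT NOT NULL
        );
    """)
    return conn


def new_customer(conn, name):
    """Register a new ID in object first, then create the customer row."""
    object_id = str(uuid.uuid4())
    conn.execute("INSERT INTO object (id) VALUES (?)", (object_id,))
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (object_id, name))
    return object_id


def add_comment(conn, related_to, body):
    """Fails with sqlite3.IntegrityError if related_to is not a known object."""
    conn.execute(
        "INSERT INTO comments (related_to, body) VALUES (?, ?)",
        (related_to, body),
    )


def count_rows(conn, table):
    """Row count for one of the fixed table names used in this sketch."""
    if table not in ("object", "customer", "comments"):
        raise ValueError(table)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Deleting the row in object removes both the customer and its comments through the cascades, and a comment for an unknown ID is rejected.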

I prefer one for each pair - CustomerComment, CompanyComment, etc. It eventually will speed up your queries, and while it isn't as 'extensible' as a single CommentLink table, you'll need to make schema changes when you add something else that needs comments anyway.

I would use separate tables; that way you can keep the referential constraints simple.
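For comparison, the separate-table approach could look roughly like this for one pair (Customer and Comment). This is only a sketch using Python's sqlite3 rather than MySQL, and the table, column and function names are assumptions. CompanyComment, MeterComment and so on would follow the same pattern.

```python
import sqlite3


def make_db():
    """One link table per commentable entity; only the customer pair is shown."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comment  (comment_id  INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE customer_comment (
            customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id) ON DELETE CASCADE,
            comment_id  INTEGER NOT NULL
                REFERENCES comment(comment_id) ON DELETE CASCADE,
            PRIMARY KEY (customer_id, comment_id)
        );
    """)
    return conn


def comment_on_customer(conn, customer_id, body):
    """Store the comment text, then link it to the customer."""
    cur = conn.execute("INSERT INTO comment (body) VALUES (?)", (body,))
    conn.execute(
        "INSERT INTO customer_comment (customer_id, comment_id) VALUES (?, ?)",
        (customer_id, cur.lastrowid),
    )
    return cur.lastrowid


def customer_comments(conn, customer_id):
    """Comment texts for one customer, oldest first."""
    rows = conn.execute(
        """
        SELECT c.body
        FROM comment AS c
        JOIN customer_comment AS cc ON cc.comment_id = c.comment_id
        WHERE cc.customer_id = ?
        ORDER BY c.comment_id
        """,
        (customer_id,),
    )
    return [row[0] for row in rows]
```

Every foreign key here points at exactly one table, so the constraints stay ordinary.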

Should I be using onDelete=cascade with my foreign keys?

Related question: Foreign key constraints: When to use ON UPDATE and ON DELETE.
We'll take an example: a company table and a user table containing people from these companies.
CREATE TABLE COMPANY (
    company_id INT NOT NULL,
    company_name VARCHAR(50),
    PRIMARY KEY (company_id)
) ENGINE=INNODB;
CREATE TABLE USER (
    user_id INT,
    user_name VARCHAR(50),
    company_id INT,
    INDEX company_id_idx (company_id),
    FOREIGN KEY (company_id) REFERENCES COMPANY (company_id) ON...
) ENGINE=INNODB;
ON DELETE CASCADE: dangerous. If you delete a company row in table COMPANY, the engine will also delete the related USERs. This is dangerous but can be used to make automatic cleanups on secondary tables (so it can be something you want, but quite certainly not for a COMPANY<->USER example).
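To see the behaviour described above, here is a small sketch that builds the same two tables with either ON DELETE CASCADE or ON DELETE RESTRICT. It assumes Python's sqlite3 instead of MySQL/InnoDB, names the child table users, and the sample rows and helper names are made up.

```python
import sqlite3


def make_db(on_delete):
    """Build company/users with the given ON DELETE action and two sample users."""
    if on_delete not in ("CASCADE", "RESTRICT"):
        raise ValueError(on_delete)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript(f"""
        CREATE TABLE company (
            company_id   INTEGER PRIMARY KEY,
            company_name TEXT
        );
        CREATE TABLE users (
            user_id    INTEGER PRIMARY KEY,
            user_name  TEXT,
            company_id INTEGER
                REFERENCES company (company_id) ON DELETE {on_delete}
        );
        INSERT INTO company VALUES (1, 'Acme');
        INSERT INTO users VALUES (10, 'ann', 1), (11, 'bob', 1);
    """)
    return conn


def user_count(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def delete_company(conn, company_id):
    """Return True if the delete went through, False if the FK blocked it."""
    try:
        conn.execute("DELETE FROM company WHERE company_id = ?", (company_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With CASCADE the delete succeeds and the users disappear with the company; with RESTRICT the delete is refused while users still reference the company.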
Now, let's suppose that I have multiple companies, each with multiple customers. I make a habit of having a primary auto index key on each table, and using that as a foreign key on child tables.
So, since my company_id is auto generated and guaranteed to be unique, is there any danger in me setting the foreign key company_id in the users table to onDelete=cascade?
Obviously, my GUI has lots of "are you sure that you are certain that you really want to delete this? Action cannot be undone!"
But, if I don't onDelete=cascade, then before I can DELETE FROM companies WHERE company_id=X, I first have to DELETE FROM users WHERE company_id=X, which is what I have been doing until now.
I am considering onDelete=cascade for the first time & just want to be sure that I have grokked it. Deleting dependent rows can get tedious when the dependency tree is multiple levels deep.
Also, since the keys are auto index, they won't change, so I can't see that I would need onUpdate.
[Update] One answer was concerned about deleting business data. That's just an example from a related question.
Imagine architecture: a single user can have plans of multiple sites, each with multiple buildings, each with multiple floors, each with multiple rooms.
It is a cascading, tree-like hierarchy. Does it make sense to have onDelete=Cascade there? I think so, but want to hear from those more knowledgeable.
So much of it will depend on your specific use case. Since you are trying to delete the users anyway, and you want it to happen automatically as part of the cleanup it seems like a good candidate for using ON DELETE to me.
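I probably wouldn't be deleting these records though. I would be deactivating them, setting the company to inactive and treating all users of that company as inactive too. Note that ON UPDATE CASCADE only propagates changes to the referenced key columns, so it will not push an inactive flag down by itself; that takes an explicit UPDATE, a trigger, or deriving the user's state by joining to the company.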
I would hesitate to do the delete for two reasons:
First, if the company returns, this allows you to restore the pieces you want for faster setup. It is also less likely to trigger a restore-from-backup if a company is incorrectly deleted.
Second, I assume that these entities propagate out into other tables too. I wouldn't want to delete a client's or supplier's order history just because we no longer have an active relationship. Even if you don't delete records from those other tables, you'll likely wind up with orphaned companyId/userId values in those records.
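A sketch of the deactivate-instead-of-delete idea, assuming Python's sqlite3 and a hypothetical is_active column on the company table. Here the users' state is derived by joining to the company rather than copied onto each user row, which is one possible choice, not the only one.

```python
import sqlite3


def make_db():
    """Companies carry an is_active flag; users are never deleted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE company (
            company_id   INTEGER PRIMARY KEY,
            company_name TEXT,
            is_active    INTEGER NOT NULL DEFAULT 1
        );
        CREATE TABLE users (
            user_id    INTEGER PRIMARY KEY,
            user_name  TEXT,
            company_id INTEGER NOT NULL REFERENCES company (company_id)
        );
        INSERT INTO company (company_id, company_name)
            VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO users VALUES (10, 'ann', 1), (11, 'bob', 1), (20, 'cy', 2);
    """)
    return conn


def set_company_active(conn, company_id, active):
    """Flip the flag instead of deleting the company and its users."""
    conn.execute(
        "UPDATE company SET is_active = ? WHERE company_id = ?",
        (1 if active else 0, company_id),
    )


def active_users(conn):
    """Users whose company is currently active."""
    rows = conn.execute(
        """
        SELECT u.user_name
        FROM users AS u
        JOIN company AS c ON c.company_id = u.company_id
        WHERE c.is_active = 1
        ORDER BY u.user_id
        """
    )
    return [row[0] for row in rows]
```

Reactivating a returning company is then a single UPDATE, and order history in other tables keeps valid references the whole time.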

Modify database table best practices

As my title states, I'm curious about the best practices for modifying an existing table in a (MySQL) database. In my scenario, I have a table that is already full of data and has a column named product_id that is currently the primary key for the table. I'm working on a feature where I'm finding that product_id doesn't necessarily need to be unique or the primary key, since I want to allow multiple records for the same product. Database design isn't a strength of mine yet, but in my head I feel like what I would want to do is run the command DROP PRIMARY KEY for the product_id column, then add a column called id and make it the new primary key. Then I would need to update the id column for each record with a unique id for it to be a valid primary key. As far as database design is concerned, is this the best practice, or is it better to create a new table with the updated structure and copy the current records into the new table?
EDIT:
More about the feature I'm working on. The products are books and I'm trying to allow multiple sections of these books to be previewed. In order to do this, I'm storing page ranges that can be previewed. Right now, only one page range is allowed per product; once several are allowed, the product id no longer needs to be unique.
A primary key is ALWAYS unique.
Why don't you want it to be unique? It sounds like you are exposing the key outside the database, that the PK is visible somehow and some user(s) think it should behave differently. If this is the case, then this is a really bad practice.
This is the typical case of the notorious "natural keys". They are a disaster waiting to happen; I don't like big time bombs. I've been strongly opposed to them for some time now. It's good they teach them in schools so you know what not to use in the real world.
Now for the solution. If product_id is exposed, then it shouldn't be the PK at all. Solution?
Create a new column (id maybe?) that is internal, that is unique, and not exposed to the user, while keeping product_id. This new column could have the exact same value as product_id at first.
Change all FK references from other tables to the new id column.
Then, remove the PK constraint from product_id and do whatever you want to do with it.
Add the PK constraint to the new id column.
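As a rough illustration of those steps, here is a sketch that uses Python's sqlite3. SQLite cannot change a primary key in place, so the sketch takes the copy-into-a-new-table route mentioned in the question; in MySQL the same change can usually be made in place with ALTER TABLE. The table name preview and the columns start_page and end_page are hypothetical, and no other tables reference this one, so the step of repointing foreign keys is not shown.

```python
import sqlite3


def make_old_table(conn):
    """Original layout: product_id is the primary key, one page range per product."""
    conn.executescript("""
        CREATE TABLE preview (
            product_id INTEGER PRIMARY KEY,
            start_page INTEGER NOT NULL,
            end_page   INTEGER NOT NULL
        );
        INSERT INTO preview VALUES (100, 1, 10), (200, 5, 20);
    """)


def migrate(conn):
    """Rebuild preview with a surrogate id; product_id becomes a plain indexed column."""
    conn.executescript("""
        BEGIN;
        CREATE TABLE preview_new (
            id         INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL,
            start_page INTEGER NOT NULL,
            end_page   INTEGER NOT NULL
        );
        -- id starts out equal to product_id, as the answer suggests
        INSERT INTO preview_new (id, product_id, start_page, end_page)
            SELECT product_id, product_id, start_page, end_page FROM preview;
        DROP TABLE preview;
        ALTER TABLE preview_new RENAME TO preview;
        CREATE INDEX preview_product_id_idx ON preview (product_id);
        COMMIT;
    """)


def add_range(conn, product_id, start_page, end_page):
    """After the migration a product may have any number of page ranges."""
    cur = conn.execute(
        "INSERT INTO preview (product_id, start_page, end_page) VALUES (?, ?, ?)",
        (product_id, start_page, end_page),
    )
    return cur.lastrowid
```

Wrapping the rebuild in one transaction means readers never see a half-migrated table.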

How can I join two tables with two different primary keys into another table?

I have two tables: students and courses, assuming that each student can be in more than one course and that each course can have more than one student.
[Table Students]    [Table Courses]
id (PK)             id (PK)
name                name
age                 duration
etc...              etc...
What I want to do is relate both tables through another table, for example studying, in which I will store the course or courses that each student is taking. Like this:
[Table studying]
idStudent
idCourse
What I have deduced
I think that idStudent and idCourse should be foreign keys, because the information is stored in students and courses respectively, each with a unique primary key, and because this respects the consistency of the database. A relation cannot exist without the information of both the student and the course.
I also know that some tables have a primary key made of two columns, so that a value may repeat in either column but the same pair of values may not appear twice.
My questions
Do these ids (idStudent, idCourse) have to be primary keys or foreign keys?
Should the table studying have another column with an ID?
Is my deduction on the right track?
P.S: I do not need sql statements, I just need help to clarify my confusion.
Thanks in advance!
Do these ids (idStudent, idCourse) have to be primary keys or foreign keys?
You want them to be foreign keys, because the existence of each record in your third table depends on records in the first two; that is, there cannot be a "Student Course" or a "Course with Students" without both the course and the student. The table would still work if you didn't declare those columns as foreign keys, but you would break referential integrity.
On the other hand, having FKs is usually a good thing because it makes sure that you don't remove records that others depend on by mistake (which is what the constraint is for in the first place), unless you did something like cascade deleting.
Should the table studying have another column with an ID?
No, it does not have to, but sometimes it is good practice because some software, like Object Relational Mappers, diagram software, etc., may rely on there always being a by-convention primary key. Some don't even support composite keys, so while it is not mandatory, it can help in the future and it does not hurt. Of course this all depends on what you are using the database for and how (pure SQL, which engine you use, whether you use it with a framework, etc.).
Is my deduction on the right track?
All is relative. But I think your logic is good. My advice is to always design your data schemas to be as flexible as you can, because if a project grows it's harder (and more costly) to make those changes down the road. Invest time in thinking about how you may expand your application's functionality and whether the schema will adapt to it.
Your deduction is correct.
In fact, you should have a composite primary key consisting of both columns (idStudent, idCourse), because this tuple is the identifier of a row in the table. You do not need an additional ID column (of course, you can also add an ID column and make that the primary key, but you do not need it if a student can be assigned to a given course only once).
To respect the integrity, both columns (separately) should be foreign keys - idStudent should be referencing id column of Students table and idCourse should reference id column of Courses table.
If you like, you can make them the primary key of the studying table, but this is unnecessary: the table's role is to model a many-to-many relationship, and this kind of table does not strictly need a primary key. Bear in mind that making the pair (student id, course id) the primary key means there can be only one row for each pair. That is equivalent to a unique constraint: a student can take a given course only once. If you later want to add something like start_date to this table, that primary key could become a problem and you would need to modify it.
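The asker did not want SQL, but for readers who do, the composite-key variant discussed in these answers might be sketched like this, using Python's sqlite3 as an assumed stand-in for whatever engine is in use. The sample rows and helper names are made up.

```python
import sqlite3


def make_db():
    """students and courses joined through studying, keyed on the pair of ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        CREATE TABLE courses  (id INTEGER PRIMARY KEY, name TEXT, duration INTEGER);
        CREATE TABLE studying (
            idStudent INTEGER NOT NULL REFERENCES students(id),
            idCourse  INTEGER NOT NULL REFERENCES courses(id),
            PRIMARY KEY (idStudent, idCourse)  -- each pair at most once
        );
        INSERT INTO students VALUES (1, 'Ann', 20), (2, 'Bob', 22);
        INSERT INTO courses  VALUES (1, 'Algebra', 30), (2, 'Biology', 45);
    """)
    return conn


def enroll(conn, student_id, course_id):
    """Return True on success, False if a key constraint rejects the row."""
    try:
        conn.execute(
            "INSERT INTO studying (idStudent, idCourse) VALUES (?, ?)",
            (student_id, course_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def courses_of(conn, student_id):
    """Course names for one student, ordered by course id."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM studying AS s
        JOIN courses AS c ON c.id = s.idCourse
        WHERE s.idStudent = ?
        ORDER BY c.id
        """,
        (student_id,),
    )
    return [row[0] for row in rows]
```

Each id column is a foreign key on its own, while the pair together forms the primary key, so a repeated enrollment or an unknown student is rejected.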

A many-to-many primary key for "appointments"

I know this question has been asked a lot but my example seems different.
I have two entities: Doctor and Client, and a many-to-many relationship between them to create the entity Appointment, which has, say "appointment_date_time" for an attribute.
I'm using the foreign keys from Doctor and Client to create a composite primary key in Appointment, but since there can be many appointments between the same doctor and person, should the "date_time" also be included as part of the primary key so there's no duplicates? Or would the two foreign keys be enough to query off of?
Thanks!
Your PRIMARY KEY always needs to be unique, so including the datetime would make a usable composite PRIMARY KEY that would probably be unique (unless you could have multiple appointments at the same time for two different purposes, which might happen).
However, this is unlikely to be the best approach in practice: if the appointment is moved, the time will change (and the doctor might change too). Then you have no stable way to reference the appointment, for example if you associated some extra data with it during creation or had to reference it in an audit entry. It also means any references to it that you do create would need to store all 3 columns.
As such I would simply look to create an auto incrementing primary key in this case, and simply index on both doctor and client for fast searches.
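A minimal sketch of that suggestion, assuming Python's sqlite3 in place of the asker's database. The column and index names are guesses based on the question, and date-times are stored as ISO text for simplicity.

```python
import sqlite3


def make_db():
    """Appointments get their own auto-incrementing key plus lookup indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE doctor (doctor_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE appointment (
            appointment_id        INTEGER PRIMARY KEY,
            doctor_id             INTEGER NOT NULL REFERENCES doctor(doctor_id),
            client_id             INTEGER NOT NULL REFERENCES client(client_id),
            appointment_date_time TEXT NOT NULL
        );
        CREATE INDEX appointment_doctor_idx ON appointment (doctor_id);
        CREATE INDEX appointment_client_idx ON appointment (client_id);
        INSERT INTO doctor VALUES (1, 'Dr. Grey');
        INSERT INTO client VALUES (1, 'Pat');
    """)
    return conn


def book(conn, doctor_id, client_id, when):
    """Create an appointment and return its stable id."""
    cur = conn.execute(
        "INSERT INTO appointment (doctor_id, client_id, appointment_date_time) "
        "VALUES (?, ?, ?)",
        (doctor_id, client_id, when),
    )
    return cur.lastrowid


def reschedule(conn, appointment_id, when):
    """Move an appointment; its id, and anything referencing it, stays the same."""
    conn.execute(
        "UPDATE appointment SET appointment_date_time = ? WHERE appointment_id = ?",
        (when, appointment_id),
    )


def appointments_between(conn, doctor_id, client_id):
    """(id, date-time) pairs for one doctor/client pair, ordered by id."""
    rows = conn.execute(
        "SELECT appointment_id, appointment_date_time FROM appointment "
        "WHERE doctor_id = ? AND client_id = ? ORDER BY appointment_id",
        (doctor_id, client_id),
    )
    return rows.fetchall()
```

The same doctor and client can have any number of appointments, and rescheduling one only touches a non-key column.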

Renaming foreign keys to fit the context of a table

When using a foreign key in a table, is it good form to change the name of the key for that table to make it clear what function the key performs in the table, or is it good form to retain the original name, to make it clear that it is a foreign key?
Example:
a table keeps track of users, the primary key is user_id
a second table stores articles on the website and keeps track of the author with the foreign key user_id.
In the context of the second table it would make more sense to call the foreign key author. In the context of the whole database it would make more sense to call the foreign key user_id
Is there a general convention that deals with this situation, or is that what comments are for?
Well, if you have a movie table you wouldn't want columns called person_id and person_id, but rather producer and director, or perhaps producer_id and director_id, or maybe producer_person_id and director_person_id.
I know movies can have multiple directors and multiple producers; this was just an example. Any case in which a table has two foreign keys to the same table will show you that you cannot in principle stick completely to a convention of using only the table name in the column name. You can use both (as in the producer_person_id example) but that leads to long column names.
Don't use comments. No one reads them. Okay that was just snark, perhaps, but in general favor descriptive names to comments!
Aside from the two-foreign-key issue, I'm not really aware of any universally accepted convention.
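The movie example could be sketched as follows, assuming Python's sqlite3 and the longer producer_person_id / director_person_id naming. The sample data and the credits helper are invented for illustration.

```python
import sqlite3


def make_db():
    """movie has two foreign keys into person, named after their roles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE movie (
            movie_id           INTEGER PRIMARY KEY,
            title              TEXT NOT NULL,
            producer_person_id INTEGER NOT NULL REFERENCES person(person_id),
            director_person_id INTEGER NOT NULL REFERENCES person(person_id)
        );
        INSERT INTO person VALUES (1, 'Pat Producer'), (2, 'Dana Director');
        INSERT INTO movie  VALUES (1, 'Example Film', 1, 2);
    """)
    return conn


def credits(conn, movie_id):
    """Return (producer name, director name) by joining person twice."""
    return conn.execute(
        """
        SELECT producer.name, director.name
        FROM movie AS m
        JOIN person AS producer ON producer.person_id = m.producer_person_id
        JOIN person AS director ON director.person_id = m.director_person_id
        WHERE m.movie_id = ?
        """,
        (movie_id,),
    ).fetchone()
```

The role-based column names are what make the two joins readable; two columns both called person_id would not even be legal in one table.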
There is no fixed convention here beyond knowing your own schema: use whatever makes sense to the database administrator. Business logic is not concerned with how the database columns are named, only with the results. If it makes more sense to the database administrator to name the foreign key author_id while it refers to user_id of another table, then do so, and note in the documentation that T2.author_id must exist in T1.user_id. When transitioning from modelling to designing the database (which is where you are now), it makes sense to keep things simple, but you can change the foreign key names so long as you can remember them (and document them as well).