Modify database table best practices - mysql

As my title states, I'm curious about the best practices for modifying an existing table in a (mysql) database. In my scenario, I have a table that is already full of data and has a column named product_id that is currently the primary key for the table. I'm working on a feature where I'm finding product_id doesn't necessarily need to be unique or the primary key, since I want to allow multiple records for the same product. Database design isn't a strength of mine yet, but in my head I feel like what I would want to do is run the command DROP PRIMARY KEY for the product_id column, then add a column called id and making this the new primary key. Then I would need to update the id column for each record with a unique id for it to be a valid primary key. As far as database design is concerned, is this the best practice for doing this or is it better to create a new table with the updated structure and copying the current records into the new table?
EDIT:
More about the feature I'm working on. The products are books and I'm trying to allow multiple sections of these books to be previewed. In order to do this, I'm storing page ranges that can be previewed. Right now, only one page range is allowed, which is why the product id doesn't need to be unique anymore.

A primary key is ALWAYS unique.
Why do you don't want it to be unique? It sounds like you are exposing the key outside the database, that the PK is visible somehow and some user(s) think it should behave differently. If this is the case then this is a really bad practice.
This is the typical case of the notorious "natural keys". They are a disaster waiting to happen; I don't like big time bombs. I've been strongly opposed to them for some time now. It's good they teach them in schools so you know what not to use in the real world.
Now for the solution. If product_id is exposed, then it shouldn't be the PK at all. Solution?
Create a new column (id maybe?) that is internal, that is unique, and not exposed to the user, while keeping product_id. This new column could have the exact same value as product_id at first.
Change all FK references from other tables to the new id column.
Then, remove the PK constraint from product_id and do whatever you want to do with it.
Add the PK contraint to the new id column.

Related

Should I use my two columns that uniquely identify a record as primary key?

I started to design a database that tracks system events by following some online tutorials, and some easy examples start by assigning auto-incrementing IDs as primary keys. I looked at my database, I don't really need IDs. Out of all my columns, the timestamp and device ID are the two columns that together identifies an unique event.
What my program does right now is to pull some events from system log in the past x minutes and insert these events to the database. However, I could be going too much into the past that the events overlap with what's already in the database. As I mentioned before, timestamp and device ID are the two fields that uniquely identify an event. My question is, should I use these two fields as my primary key and use "Insert ignore" from now on so I can avoid having duplicate records?
It is a good practise to never have your business values as table's primary key and always to use synthetic, e.g. autoincrement, values for this. You will make your life easier in the future when business requirements change :)
We are currently struggling with exactly this situation. Have a column with business values as a primary key for 2 years and now painfully introducing an autoincrement one.
You may need to use foreign key from other table to this in the future to link some rows between two tables. It is easier with one-column primary key.
But if you don't need it now - no need to create column special for index. Table can be altered in future to add such column with autoincrement and move primary key to it.

How can I join two tables with two different primary keys into another table?

I have two tables: students and courses, assuming that each student can be in more than one course and that each course can have more than one student.
[Table Students] [Table Courses]
id(PK) id(PK)
name name
age duration
etc... etc...
and what I want to do it is to relate both tables into another table, for example, studying, in which I will store the course or courses that is doing each student. Like this:
[Table studying]
idStudent
idCourse
What I have deduced
I think that idStudent and idCourse should be foreign keys because the information it is stored in students and courses respectively with an unique primary key and to respect the consistency of the database. It cannot exist a relation without information neither of the student nor the course or just without the information of one of them.
I also know that some tables has two primary keys to allow that in the table could exist more than one repeated value of a primary key, but not of both primary keys at the same time.
My questions
These ids (idStudent, idCourse). Have to be primary keys or foreign keys?
Should the table studying has another column with an ID?
Is my deduction in the good way?
P.S: I do not need sql statements, I just need help to clarify my confusion.
Thanks in advance!
These ids (idStudent, idCourse). Have to be primary keys or foreign keys?
You want them to be foreign keys, because the existence of each record on your third table depends on the availability of the first, that is, there cannot be a "Student Course" or a "Course with Students" without either the course or the student. It could (if you don't make those keys) but you would break referential integrity
On the other hand, having FK's is usually a good thing because you make sure that you don't remove dependable records by mistake (which is what the constraint is for on the first place) unless you did something like cascade deleting
Should the table studying has another column with an ID?
No, it does not have to but again, sometimes it is a good practice because some software like Object Relational Mappers, Diagram Software, etc. may rely on the fact that they always needs a by-convention primary key. Some others don't even support composite keys so while it is not mandatory it can help in the future and it does not hurt. Of course this all depends on what you are using the database for and how (pure SQL, which engine you use, if you use it with a framework etc.)
Is my deduction in the good way?
All is relative. But I think your logic is good. My advice is that you always design your data schemas as flexible as you can because if a project grows its harder (and more costly) to do those changes down the road. Invest time on thinking how you may expand your application functionality and think if the schema will adapt to it.
Your deduction is correct.
In fact, you should have a composite primary key consisting of both (idStudent, idCourse) columns, because this tuple is the identifier of row in the table, you do not need additional ID column (of course, you can also take that approach to add additional ID column that would be your primary key, but you do not need it if one student can have one course assigned only once)
To respect the integrity, both columns (separately) should be foreign keys - idStudent should be referencing id column of Students table and idCourse should reference id column of Courses table.
If you like you can make them primary keys on studying table. But this is unnecesary, because relation (role of studying table) is many to many and this kind of table dont need primary keys. You need to know that also when you make them pk (pair of student id and course id) , thats mean that theee could be only one pair of each, thats equivalent to constrain unique - student can take a course only ones. In the future you maybe would like to add to this table start_date and this kind of pk could be a problem, you will need to modify them.

Renaming foreign keys to fit the context of a table

When using a foreign key in a table, is it good form to change the name of the key for that table to make it clear what function the key performs in the table, or is it good form to retain the original name, to make it clear that it is a foreign key?
Example:
a table keeps track of users, the primary key is user_id
a second table stores articles on the website and keeps track of the author with the foreign key user_id.
In the context of the second table it would make more sense to call the foreign key author. In the context of the whole database it would make more sense to call the foreign key user_id
Is there a general convention that deals with this situation, or is that what comments are for?
Well, if you have a movie table you wouldn't want columns called person_id and person_id, but rather producer and director, or perhaps producer_id and director_id, or maybe producer_person_id and director_person_id.
I know movies can have multiple directors and multiple producers; this was just an example. Any case in which a table has two foreign keys to the same table will show you that you cannot in principle stick completely to a convention of using only the table name in the column name. You can use both (as in the producer_person_id example) but that leads to long column names.
Don't use comments. No one reads them. Okay that was just snark, perhaps, but in general favor descriptive names to comments!
Aside from the two-foreign-key issue, I'm not really aware of any univerally accepted convention.
It is conventional to know the database schema's modelling and designing. Whatever makes sense to the database administrator. Business logic is not concerned with how the database is named, only the results. For the database administrator if it make more sense to rename the foreign key author_id to refer to user_id of another table then do so and notate it in some documents that T2.author_id must exist in T1.user_id. When transitioning from modelling to designing the database (which is where you are now) it would make sense to just keep it simple, but you can change the foreign key names so long as you can remember them (and document them as well).

simple DB design question

This is probably a very stupid question, but I am just not sure which solution is the most elegant and the best(most performant) way to go in the following scenario.
I have the following tables:
Customer, Company, Meter, Reading
all of the tables above the line are supposed to be linked to one or more records of a "Comment" table. Which is the best way to model this relationship?
I am seeing two solutions here:
1.) use m:n relationships: CustomerComment, CompanyComment, etc. -> easy to extend later on, but a lot of new tables
2.) use 1:n relationships: Comment table has a field for the PK of the tables above (Customer_id, Company_id, ...) -> minimal table approach, but "harder" to extend since I would have to add a new field to the comment table whenever there is a new table that needs to be have comments
The target is a modular application, which may or may not have any of those four tables.
Which one is the better one - or are there more?
Thanks!
This is the problem with using integers for primary keys. You have a few solutions you can use.
The true unique ID for any given row for Customer, Company, Meter, Reading is a UUID. Maybe because of the database design the primary key has to be an integer but that is ok. This means you never have to add fields to the COMMENTS table if you have a new type in your system. It will always reference by the types ID.
Your tables can look like this:
CUSTOMER
ID UUID
COMPANY
ID UUID
METER
ID UUID
COMMENTS
ID
RELATED_TO UUID
COMMENT TEXT
Now you can attach comments to any table that has a unique ID.
If you want to support referential constraints
OBJECT is a table that holds all of the ID's of all the pieces of data you have in your system. We really start building a system in which you can associate any comment with anything you want. This may not be suitable in your design.
OBJECT
ID UUID
CUSTOMER
ID UUID
FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMPANY
ID UUID
FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
METER
ID UUID
FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMMENTS
ID
RELATED_TO UUID
COMMENT TEXT
FOREIGN_KEY (RELATED_TO) REFERENCES OBJECT(ID) ON DELETE CASCADE
This complicates the design in order to assure you don't need to add 2 tables for each new type in the system. Each design has sacrifices. In this one you've complicated things by saying for every each entry whether it be Company, Customer, Meter I need an associated ID int he Object table so I can put a foreign key on it.
I prefer one for each pair - CustomerComment, CompanyComment, etc. It eventually will speed up your queries, and while it isn't as 'extensible' as a single CommentLink table, you'll need to make schema changes when you add something else that needs comments anyway.
I would use seperate tables, that way you can keep the referential constraints simple.

Different database tables joining on single table

So imagine you have multiple tables in your database each with it's own structure and each with a PRIMARY KEY of it's own.
Now you want to have a Favorites table so that users can add items as favorites. Since there are multiple tables the first thing that comes in mind is to create one Favorites table per table:
Say you have a table called Posts with PRIMARY KEY (post_id) and you create a Post_Favorites with PRIMARY KEY (user_id, post_id)
This would probably be the simplest solution, but could it be possible to have one Favorites table joining across multiple tables?
I've though of the following as a possible solution:
Create a new table called Master with primary key (master_id). Add triggers on all tables in your database on insert, to generate a new master_id and write it along the row in your table. Also let's consider that we also write in the Master table, where the master_id has been used (on which table)
Now you can have one Favorites table with PRIMARY KEY (user_id, master_id)
You can select the Favorites table and join with each individual table on the master_id and get the the favorites per table. But would it be possible to get all the favorites with one query (maybe not a query, but a stored procedure?)
Do you think that this is a stupid approach? Since you will perform one query per table what are you gaining by having a single table?
What are your thoughts on the matter?
One way wold be to sub-type all possible tables to a generic super-type (Entity) and than link user preferences to that super-type. For example:
I think you're on the right track, but a table-based inheritance approach would be great here:
Create a table master_ids, with just one column: an int-identity primary key field called master_id.
On your other tables, (users as an example), change the user_id column from being an int-identity primary key to being just an int primary key. Next, make user_id a foreign key to master_ids.master_id.
This largely preserves data integrity. The only place you can trip up is if you have a master_id = 1, and with a user_id = 1 and a post_id = 1. For a given master_id, you should have only one entry across all tables. In this scenario you have no way of knowing whether master_id 1 refers to the user or to the post. A way to make sure this doesn't happen is to add a second column to the master_ids table, a type_id column. Type_id 1 can refer to users, type_id 2 can refer to posts, etc.. Then you are pretty much good.
Code "gymnastics" may be a bit necessary for inserts. If you're using a good ORM, it shouldn't be a problem. If not, stored procs for inserts are the way to go. But you're having your cake and eating it too.
I'm not sure I really understand the alternative you propose.
But in general, when given the choice of 1) "more tables" or 2) "a mega-table supported by a bunch of fancy code work" ..your interests are best served by more tables without the code gymnastics.
A Red Flag was "Add triggers on all tables in your database" each trigger fire is a performance hit of it's own.
The database designers have built in all kinds of technology to optimize tables/indexes, much of it behind the scenes without you knowing it. Just sit back and enjoy the ride.
Try these for inspiration Database Answers ..no affiliation to me.
An alternative to your approach might be to have the favorites table as user_id, object_id, object_type. When inserting in the favorites table just insert the type of the favorite. However i dont see a simple query being able to work with your approach or mine. One way to go about it might be to use UNION and get one combined resultset and then identify what type of record it is based on the type. Another thing you can do is, turn the UNION query into a MySQL VIEW and simply query that VIEW.
The benefit of using a single table for favorites is a simplicity, which some might consider as against the database normalization rules. But on the upside, you dont have to create so many favorites table and you can add anything to favorites easily by just coming up with a new object_type identifier.
It sounds like you have an is-a type relationship that needs to be modeled. All of the items that can be favourited are a type of "item". It sounds like you are on the right track, but I wouldn't use triggers. What could be the right answer if I have understood correctly, is to pull all the common fields into a single table called items (master is a poor name, master of what?), this should include all the common data that would be needed when you need a users favourite items, I'd expect this to include fields like item_id (primary key), item_type and human_readable_name and maybe some metadata about when the item was created, modified etc. Each of your specific item types would have its own table containing data specific to that item type with an item_id field that has a foreign key relationship to the item table. Then you'd wrap each item type in its own insertion, update and selection SPs (i.e. InsertItemCheese, UpdateItemMonkey, SelectItemCarKeys). The favourites table would then work as you describe, but you only need to select from the item table. If your app needs the specific data for each item type, it would have to be queried for each item (caching is your friend here).
If MySQL supports SPs with multiple result sets you could write one that outputs all the items as a result set, then a result set for each item type if you need all the specific item data in one go. For most cases I would not expect you to need all the data all the time.
Keep in mind that not EVERY use of a PK column needs a constraint. For example a logging table. Even though a logging table has a copy of the PK column from the table being logged, you can't build a constraint.
What would be the worst possible case. You insert a record for Oprah's TV show into the favorites table and then next year you delete the Oprah Show from the list of TV shows but don't delete that ID from the Favorites table? Will that break anything? Probably not. When you join favorites to TV shows that record will fall out of the result set.
There are a couple of ways to share values for PK's. Oracle has the advantage of sequences. If you don't have those you can add a "Step" to your Autonumber fields. There's always a risk though.
Say you think you'll never have more than 10 tables of "things which could be favored" Then start your PK's at 0 for the first table increment by 10, 1 for the second table increment by 10, 2 for the third... and so on. That will guarantee that all the values will be unique across those 10 tables. The risk is that a future requirement will add table 11. You can always 'pad' your guestimate