MySQL Database Normalization .. one table to connect multiple others? - mysql

Let's assume I have a very large database with tons of tables in it.
Certain of these tables contain datasets to be connected to each other like
table: album
table: artist
--> connected by table: album_artist
table: company
table: product
--> connected by table: company_product
The tables album_artist and company_product contain 3 columns representing primary key, albumID/artistID meanwhile companyID/productID...
Is it a good practice to do something like an "assoc" table which is made up like
---------------------------------------------------------
| id int(11) primary | leftID | assocType | rightID |
|---------------------------------------------------------|
| 1 | 10 | company:product | 4 |
| 2 | 6 | company:product | 5 |
| 3 | 4 | album:artist | 10 |
---------------------------------------------------------
I'm not sure if this is the way to go or if there's anything else than creating multiple connection tables?!

No, it is not a good practice. It is a terrible practice, because referential integrity goes out the window. Referential integrity is the guarantee provided by the RDBMS that a foreign key in one row refers to a valid row in another table. In order for the database to be able to enforce referential integrity, each referring column must refer to one and only one referred column of one and only one referred table.

No, no, a thousand times no. Don't overthink your many-to-many relationships. Just keep them simple. There's nothing to gain and a lot to lose by trying to consolidate all your relationships in a single table.
If you have a many to many relationship between, say guiarist and drummer, then you need a guitarist_drummer table with two columns in it: guitarist_id and drummer_id. That table's primary key should be comprised of both columns. And you should have another index that's made of the two columns in the opposite order. Don't add a third column with an autoincrmenting id to those join tables. That's a waste, and it allows duplicated pairs in those tables, which is generally confusing.
People who took the RDBMS class in school will immediately recognize how these tables work. That's good, because it means you don't have to be the only programmer on this project for the rest of your life.
Pro tip: Use the same column name everywhere. Make your guitarist table contain a primary key called guitarist_id rather than id. It makes your relationship tables easier to understand. And, if you use a reverse engineering tool like Sql Developer that tool will have an easier time with your schema.

The answer is that it "depends" on the situation. In your case and most others, no, it does not make sense. It does make sense if you are doing a many <-> many relationship, the constraints can be enforced by the link table with foreign keys and a unique constraint. Probably the best use case would be if you had numerous tables pointing to a single table. Each table could have a link table with indexes on it. This would be beneficial if one of the tables is a large table, and you need to fetch the linked records separately.

Related

How do I make a combination of values unique in a mySQL table no mater what order they are in?

I am creating a application that involves a friend system such as the one in facebook. The way I structured this in my SQL database is by having a friend table which has the columns ID, accountID1, accountID2 so that the each of the two accounts involved in the friendship is noted. The problem is that a friendship can be noted in two different ways for example:
ID | accountID1 | accountID2
1 | 1 | 2
2 | 2 | 1
If I make the combination unique it does not protect against this from occurring. How can I create a constraint in MySQL to prevent a friendship to be present in two different ways to ensure data integrity? or is there a different way of storing this information to prevent such problems in the first place?
The final solution I used is to first of all get rid of the ID for the friends table and make a composite primary key out of the two account ID's PrimaryKey(accountID0, accountID1). This ensures that the combination of them are unique. Then I created a "before Insert trigger" to switch the values so that the smaller accountID is always in accountID0. This method worked perfectly and made no problems so far.

Several many-to-many relationships to one table

My database has several categories to which I want to attach user-authored text "notes". For instance, an entry in a high level table named jobs may have several notes written by the user about it, but so might a lower level entry in sub_projects. Since these notes would all be of the same format, I'm wondering if I could simplify things by having only one notes table rather than a series of tables like job_notes or project_notes, and then use multiple many-to-many relationships to link it to several other tables at once.
If this isn't a deeply flawed idea from the get go (let me know if it is!), I'm wondering what the best way to do this might be. As I see it, I could do it in two ways:
Have a many-to-many junction table for each larger category, like job_notes_mapping and project_notes_mapping, and manage the MtM relationships individually
Have a single junction table linked to either an enum or separate table for table_type, which specifies what table the MtM relationship is mapping to:
+-------------+-------------+---------------+
| note_id | table_id | table_type_id |
+-------------+-------------+---------------+
| 1 | 1 | jobs |
| 2 | 2 | jobs |
| 3 | 1 | project |
| 4 | 2 | subproject |
| ........... | ........... | ........ |
+-------------+-------------+---------------+
Forgive me if any of these are completely horrible ideas, but I thought it might be an interesting question at least conceptually.
The ideal way, IMO, would be to have a supertype of jobs, projects and subprojects - let's call it activities - on which you could define any common fact types.
For example (I'm assuming jobs, projects and subprojects form a containment hierarchy):
activities (activity PK, activity_name, begin_date, ...)
jobs (job_activity PK/FK, ...)
projects (project_activity PK/FK, job_activity FK, ...)
subprojects (subproject_activity PK/FK, project_activity FK, ...)
Unfortunately, most database schemas define unique auto-incrementing identifiers PER TABLE which makes it very difficult to implement supertyping after data has been loaded. PostgreSQL allows sequences to be reused, which is great, some other DBMSs (like MySQL) don't make it easy at all.
My second choice would be your option 1, since it allows foreign key constraints to be defined. I don't like option 2 at all.
Unfortunately, we have ended up going with the ugliest answer to this, which is to have a notes table for every different type of entry - job_notes, project_notes, and subproject_notes. Our reasons for this were as follows:
A single junction table with a column containing the "type" of junction has poor performance since none of the foreign keys are "real" and must be manually searched. This is compounded by the fact that the Notes field contains a lot of text per entry.
A junction table per entry adds an additional table over simply having separate notes tables for every table type, and while it seems slightly prettier, it does not create substantial performance gains.
I'm not satisfied with this answer, because it seems so wasteful to effectively be duplicating the same Notes table for every job/project/subproject table that is being described. However, we haven't been able to come up with an answer that would hold up performance wise in the long term. I'll leave this open in case anyone has better recommendations for how to do this!

How can I best structure a link/junction table with a lot of potential columns or repeated rows?

I'm making a site that will be a subscription based service that will provide users several courses based on whatever they signed up for. A single user can register in multiple courses.
Currently the db structure is as follows:
User
------
user_id | pwd | start | end
Courses
-------
course_id | description
User_course_subscription
------------------------
user_id | course_id | start | end
course_chapters
---------------
course_id | title | description | chapter_id | url |
The concern is that with the user_course_subscription table I cannot (at least at the moment I don't know how) I can have one user with multiple course subscriptions (unless I enter the same user_id in multiple times with a different course_id each time). Alternatively I would add many columns in the format calculus_1 chem_1 etc., but that would give me a ton of columns as the list of courses grow.
I was wondering if having the user_id put in multiple times is the most optimal way to do this? Or is there another way to structure the table (or maybe I'd have to restructure all the tables)?
Your database schema looks fine. Don't worry, you're on the right track. As for the User_course_subscription table, both user_id and course_id form the primary key together. This is called a joint primary key and is basically fine.
Values are still unique because no user subscribes to the same course twice. Your business logic code should ensure this anyway. For the database part: You might want to look up in your database system's manual how to set joint primary keys up properly when creating the table (syntax might differ).
If you don't like this idea, you can also create a pseudo primary key, that is having:
user_course_subscription
------------------------
user_course_subscription_id | user_id | course_id | start | end
...where user_course_subscription_id is just an auto-incremented integer. This way, you can use user_course_subscription_id to identify records. This might make things easier in some places of your code, because you don't always have to use two values.
As for heaving calculus_1, chem_1 etc. - don't do this. You might want to read up on database normalization, as mike pointed out. Especially 1NF through 3NF are very common in database design.
The only reason not to follow normal forms is performance, and then again, in most cases optimization is premature. If you're concerned, stress-test the prototype of your appliation under realistic (expected) conditions and measure response times to get some hard evidence.
I don't know what's the meaning of the start and end columns in the user table. But you seem to have no redundancy.
You should check out the boyce-codd normal form wikipedia article. There is a useful example.

Database Design: need unique rows + relationships

Say I have the following table:
TABLE: product
============================================================
| product_id | name | invoice_price | msrp |
------------------------------------------------------------
| 1 | Widget 1 | 10.00 | 15.00 |
------------------------------------------------------------
| 2 | Widget 2 | 8.00 | 12.00 |
------------------------------------------------------------
In this model, product_id is the PK and is referenced by a number of other tables.
I have a requirement that each row be unique. In the example about, a row is defined to be the name, invoice_price, and msrp columns. (Different tables may have varying definitions of which columns define a "row".)
QUESTIONS:
In the example above, should I make name, invoice_price, and msrp a composite key to guarantee uniqueness of each row?
If the answer to #1 is "yes", this would mean that the current PK, product_id, would not be defined as a key; rather, it would be just an auto-incrementing column. Would that be enough for other tables to use to create relationships to specific rows in the product table?
Note that in some cases, the table may have 10 or more columns that need to be unique. That'll be a lot of columns defining a composite key! Is that a bad thing?
I'm trying to decide if I should try to enforce such uniqueness in the database tier or the application tier. I feel I should do this in the database level, but I am concerned that there may be unintended side effects of using a non-key as a FK or having so many columns define a composite key.
When you have a lot of columns that you need to create a unique key across, create your own "key" using the data from the columns as the source. This would mean creating the key in the application layer, but the database would "enforce" the uniqueness. A simple method would be to use the md5 hash of all the sets of data for the record as your unique key. Then you just have a single piece of data you need to use in relations.
md5 is not guaranteed to be unique, but it may be good enough for your needs.
First off, your intuition to do it in the DB layer is correct if you can do it easily. This means even if your application logic changes, your DB constraints are still valid, lowering the chance of bugs.
But, are you sure you want uniqueness on that? I could easily see the same widget having different prices, say for sale items or what not.
I would recommend against enforcing uniqueness unless there's a real reason to.
You might have something like this (obvoiusly, don't use * in production code)
# get the lowest price for an item that's currently active
select *
from product p
where p.name = "widget 1" # a non-primary index on product.name would be advised
and p.active
order-by sale_price ascending
limit 1
You can define composite primary keys and also unique indexes. As long as your requirement is met, defining composite unique keys is not a bad design. Clearly, the more columns you add, the slower the process of updating the keys and searching the keys, but if the business requirement needs this, I don't think it is a negative as they have very optimized routines to do these.

Conditional Foreign Key to multiple tables

I have a table which contains two type of data, either for Company or Employee.
Identifying that data by either 'C' or 'E' & a column storing primary key of it.
So how can I give foreign key depending on data contained & maintain referential integrity dynamically.
id | referenceid | documenttype
-------------------------------
1 | 12 | E
2 | 7 | C
Now row with id 1 should reference Employee table with pk 12 & row with id 2 should reference Company table with pk 7.
Otherwise I have to make two different tables for both.
Is there any other way to accomplish it.
If you really want to do this, you can have two nullable columns one for CompanyId and one for EmployeeId that act as foreign keys.
But I would rather you to try and review the database schema design.
It would be better to normalize the table - Creating separate tables for Company and Employee. You would also get better performance after normalization. Sincec the Company and Employee are separate entities, its better not to overlap them.
Personally, i would go with the two different table option.
Employee / Company seem to be distinct enough for me not to want to store their data together.
That will make the foreign key references also straight forward.
However, if you do want to still store it in one table, one way of maintaining the referential integrity would be through a trigger.
Have an Insert / Update trigger that checks the appropriate value in Company Master / Employee master depending on the value of column containing 'C' / 'E'
Personally, i would prefer avoiding such logic as triggers are notoriously hard to debug.