Database design with one giant polymorphic table


I have a MySQL database with many different types that need a lot of the same things. For example, we have tables for reservations, customers, and accounts, and each of these needs EAV-type properties and a set of permission configurations.
My question is: should I make the EAV and permission implementations polymorphic? That is, each reservation, customer, and account gets an entity_id and entity_type_id that can be inserted into the | entity_id | entity_type_id | attribute_id | value | table. There would then be an entities table with | id | entity_type_id | that would need a new row whenever a new reservation, customer, or account was created.
Or should I have reservation_eav, customer_eav, accounts_eav tables? We will always know the entity type that we are looking for, so there's no need to return multiple types of entities in a single query. We will, however, need to grab multiple entities of the same type in several cases.
My reason for multiple tables is strictly performance. There's going to be a ton of reservations, not nearly as many accounts, and a number of customers somewhere in between. Part of me thinks this huge set of reservations will slow down lookups of account and customer attribute values. However, I don't know if the performance advantages will be significant with proper indexing, and I do feel like the single, polymorphic table would make the schema simpler.
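To make the single-table option concrete, here is a rough sketch in Python using the built-in sqlite3 module as a stand-in for MySQL (so the syntax differs slightly from InnoDB DDL). It assumes the tables from the question plus a hypothetical entity_types lookup table. The key assumption is that the EAV table's primary key leads with entity_type_id: with that index, fetching values for a few accounts reads only the account range of the index, which should keep those lookups largely unaffected by the number of reservation rows. A composite foreign key also stops a row from claiming the wrong entity type.

```python
import sqlite3

# SQLite stands in for MySQL here; table names follow the question, while the
# composite keys are an assumption about how the tables would be indexed.
SCHEMA = """
CREATE TABLE entity_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE entities (
    id             INTEGER PRIMARY KEY,
    entity_type_id INTEGER NOT NULL REFERENCES entity_types(id),
    UNIQUE (id, entity_type_id)  -- lets the EAV table reference (id, type) as a pair
);
CREATE TABLE entity_attribute_values (
    entity_id      INTEGER NOT NULL,
    entity_type_id INTEGER NOT NULL,
    attribute_id   INTEGER NOT NULL,
    value          TEXT,
    -- Leading with entity_type_id keeps the rows of each type together
    -- in the index, so an account lookup does not scan reservation rows.
    PRIMARY KEY (entity_type_id, entity_id, attribute_id),
    -- The type stored here must match the type recorded in entities.
    FOREIGN KEY (entity_id, entity_type_id) REFERENCES entities (id, entity_type_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO entity_types (name) VALUES (?)",
                 [("reservation",), ("customer",), ("account",)])


def create_entity(type_name):
    """Insert the entities row needed for every new reservation, customer or account."""
    (type_id,) = conn.execute(
        "SELECT id FROM entity_types WHERE name = ?", (type_name,)).fetchone()
    cur = conn.execute("INSERT INTO entities (entity_type_id) VALUES (?)", (type_id,))
    return cur.lastrowid, type_id


def set_value(entity_id, type_id, attribute_id, value):
    conn.execute("INSERT INTO entity_attribute_values VALUES (?, ?, ?, ?)",
                 (entity_id, type_id, attribute_id, value))


def values_for(type_id, entity_ids):
    """All attribute values for several entities of one known type."""
    marks = ",".join("?" * len(entity_ids))
    return conn.execute(
        "SELECT entity_id, attribute_id, value FROM entity_attribute_values "
        f"WHERE entity_type_id = ? AND entity_id IN ({marks}) "
        "ORDER BY entity_id, attribute_id",
        (type_id, *entity_ids)).fetchall()
```

Because the question says the entity type is always known, every query can filter on entity_type_id first, which is exactly the prefix this index is built around.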

Related

Database design: same table - mixed data VS several tables - same schema

I would like to store information about people (who have a person_id) that is quite similar to each other, such as:
profession
nationality
tags
etc. (a limited set of characteristics that is not expected to grow)
Since one person can have more than one tag (or profession, for example), it makes sense to normalize the database. All of this information requires a simple table design: primary key (id) + varchar.
I am wondering what makes more sense:
Store mixed information in one table = one schema
Store information in distinct tables, but tables have the same schema
Edit: This information and the people are connected in a third table: primary key | person_id | property_id
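For comparison, the "one table, one schema" option could look like this minimal sketch (Python with sqlite3 standing in for MySQL; the persons table and the kind column are hypothetical names). The "distinct tables" option would instead drop kind and give professions, nationalities, and tags each their own table and junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One table for every characteristic; the hypothetical kind column
-- tells professions, nationalities and tags apart.
CREATE TABLE properties (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL CHECK (kind IN ('profession', 'nationality', 'tag')),
    value TEXT NOT NULL,
    UNIQUE (kind, value)
);
-- The third table from the question: primary key | person_id | property_id.
CREATE TABLE person_properties (
    id          INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL REFERENCES persons(person_id),
    property_id INTEGER NOT NULL REFERENCES properties(id),
    UNIQUE (person_id, property_id)
);
""")


def add_property(person_id, kind, value):
    row = conn.execute("SELECT id FROM properties WHERE kind = ? AND value = ?",
                       (kind, value)).fetchone()
    if row is None:  # first time this value is used: create it
        prop_id = conn.execute("INSERT INTO properties (kind, value) VALUES (?, ?)",
                               (kind, value)).lastrowid
    else:
        prop_id = row[0]
    # INSERT OR IGNORE (roughly MySQL's INSERT IGNORE) skips the row
    # if the person already has this property.
    conn.execute("INSERT OR IGNORE INTO person_properties (person_id, property_id) "
                 "VALUES (?, ?)", (person_id, prop_id))


def values_of_kind(person_id, kind):
    rows = conn.execute(
        """SELECT p.value FROM person_properties AS pp
           JOIN properties AS p ON p.id = pp.property_id
           WHERE pp.person_id = ? AND p.kind = ?
           ORDER BY p.value""", (person_id, kind)).fetchall()
    return [value for (value,) in rows]
```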

1] If your database is OLTP (online transaction processing), store the information in distinct tables that share the same schema. You can later use joins to retrieve the data.
2] If your database is for a data mart, data warehouse, or data mining, where performance is not an issue but MIS reporting carries more weight, keep the mixed information in one table.

Several many-to-many relationships to one table

My database has several categories to which I want to attach user-authored text "notes". For instance, an entry in a high level table named jobs may have several notes written by the user about it, but so might a lower level entry in sub_projects. Since these notes would all be of the same format, I'm wondering if I could simplify things by having only one notes table rather than a series of tables like job_notes or project_notes, and then use multiple many-to-many relationships to link it to several other tables at once.
If this isn't a deeply flawed idea from the get go (let me know if it is!), I'm wondering what the best way to do this might be. As I see it, I could do it in two ways:
Have a many-to-many junction table for each larger category, like job_notes_mapping and project_notes_mapping, and manage the MtM relationships individually
Have a single junction table linked to either an enum or separate table for table_type, which specifies what table the MtM relationship is mapping to:
+-------------+-------------+---------------+
| note_id     | table_id    | table_type_id |
+-------------+-------------+---------------+
| 1           | 1           | jobs          |
| 2           | 2           | jobs          |
| 3           | 1           | project       |
| 4           | 2           | subproject    |
| ........... | ........... | ............. |
+-------------+-------------+---------------+
Forgive me if any of these are completely horrible ideas, but I thought it might be an interesting question at least conceptually.
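A quick way to see the practical difference between the two options: in option 1 every junction column is a real foreign key, while in option 2 table_id cannot reference anything, because its target table depends on the type column. This sketch (Python's sqlite3 in place of MySQL; the columns of jobs, projects, and notes are made up for illustration, and table_type holds the text values from the example table) tries to attach a note to a job that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE jobs     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE notes    (id INTEGER PRIMARY KEY, body TEXT NOT NULL);

-- Option 1: one junction table per category, each with real foreign keys.
CREATE TABLE job_notes_mapping (
    note_id INTEGER NOT NULL REFERENCES notes(id),
    job_id  INTEGER NOT NULL REFERENCES jobs(id),
    PRIMARY KEY (note_id, job_id)
);
CREATE TABLE project_notes_mapping (
    note_id    INTEGER NOT NULL REFERENCES notes(id),
    project_id INTEGER NOT NULL REFERENCES projects(id),
    PRIMARY KEY (note_id, project_id)
);

-- Option 2: one junction table; table_id has no foreign key because
-- the table it points at depends on table_type.
CREATE TABLE notes_mapping (
    note_id    INTEGER NOT NULL REFERENCES notes(id),
    table_id   INTEGER NOT NULL,
    table_type TEXT NOT NULL CHECK (table_type IN ('jobs', 'project', 'subproject')),
    PRIMARY KEY (note_id, table_id, table_type)
);
INSERT INTO jobs (id, name) VALUES (1, 'Build shed');
INSERT INTO notes (id, body) VALUES (1, 'Order lumber first');
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Job 99 does not exist: the option 1 insert `try_insert("INSERT INTO job_notes_mapping VALUES (?, ?)", (1, 99))` is rejected, while the equivalent option 2 row `(1, 99, 'jobs')` is silently accepted.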

The ideal way, IMO, would be to have a supertype of jobs, projects and subprojects - let's call it activities - on which you could define any common fact types.
For example (I'm assuming jobs, projects and subprojects form a containment hierarchy):
activities (activity PK, activity_name, begin_date, ...)
jobs (job_activity PK/FK, ...)
projects (project_activity PK/FK, job_activity FK, ...)
subprojects (subproject_activity PK/FK, project_activity FK, ...)
Unfortunately, most database schemas define unique auto-incrementing identifiers PER TABLE, which makes it very difficult to implement supertyping after data has been loaded. PostgreSQL allows one sequence to be shared by several tables, which is great; some other DBMSs (like MySQL) don't make it easy at all.
My second choice would be your option 1, since it allows foreign key constraints to be defined. I don't like option 2 at all.
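Here is a minimal sketch of the supertype layout from this answer, again with Python's sqlite3 standing in for MySQL. The notes table and its columns are hypothetical. Because every id is generated once in activities and then reused as the subtype's primary key, no shared sequence is needed, and one real foreign key from notes covers jobs, projects, and subprojects alike.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: one id space shared by jobs, projects and subprojects.
CREATE TABLE activities (
    activity      INTEGER PRIMARY KEY,
    activity_name TEXT NOT NULL,
    begin_date    TEXT
);
CREATE TABLE jobs (
    job_activity INTEGER PRIMARY KEY REFERENCES activities(activity)
);
CREATE TABLE projects (
    project_activity INTEGER PRIMARY KEY REFERENCES activities(activity),
    job_activity     INTEGER NOT NULL REFERENCES jobs(job_activity)
);
CREATE TABLE subprojects (
    subproject_activity INTEGER PRIMARY KEY REFERENCES activities(activity),
    project_activity    INTEGER NOT NULL REFERENCES projects(project_activity)
);
-- Hypothetical notes table: a single real foreign key covers every subtype.
CREATE TABLE notes (
    id       INTEGER PRIMARY KEY,
    activity INTEGER NOT NULL REFERENCES activities(activity),
    body     TEXT NOT NULL
);
""")


def new_activity(name):
    """Generate the shared id in the supertype table."""
    return conn.execute(
        "INSERT INTO activities (activity_name) VALUES (?)", (name,)).lastrowid


job = new_activity("Build shed")
conn.execute("INSERT INTO jobs VALUES (?)", (job,))
project = new_activity("Foundation")
conn.execute("INSERT INTO projects VALUES (?, ?)", (project, job))
conn.executemany("INSERT INTO notes (activity, body) VALUES (?, ?)",
                 [(job, "Get permit"), (project, "Pour concrete in May")])


def notes_for(activity):
    return [body for (body,) in conn.execute(
        "SELECT body FROM notes WHERE activity = ? ORDER BY id", (activity,))]
```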

Unfortunately, we have ended up going with the ugliest answer to this, which is to have a notes table for every different type of entry - job_notes, project_notes, and subproject_notes. Our reasons for this were as follows:
A single junction table with a column containing the "type" of junction has poor performance since none of the foreign keys are "real" and must be manually searched. This is compounded by the fact that the Notes field contains a lot of text per entry.
A junction table per entry type adds an additional table compared with simply having a separate notes table for every table type, and while it seems slightly prettier, it does not create substantial performance gains.
I'm not satisfied with this answer, because it seems so wasteful to effectively be duplicating the same Notes table for every job/project/subproject table that is being described. However, we haven't been able to come up with an answer that would hold up performance wise in the long term. I'll leave this open in case anyone has better recommendations for how to do this!

Combined or Separate Tables

I have a current database structure that seems to split up some data for indexing purposes. The main tickets table has more "lite" fields such as foreign keys, integers, and dates, and the tickets_info table has the potentially larger data such as text fields and blobs. Is this a good idea to continue with, or should I combine the tables?
For example, the current structure looks something like this (assuming a one-to-one relationship with a foreign key on the indexes):
`tickets`
--------------------------------------------
id | customer | vendor | opened
--------------------------------------------
1 | 29 | 0 | 2013-10-09 12:49:04
`tickets_info`
--------------------------------------------
id | description | more_notes
--------------------------------------------
1 | This thing is broken! | Even longer...
My application does way more SELECTs than INSERTs/UPDATEs, so I can see the theoretical benefit of the splitting when large lists of tickets are queried at once for an overview page (250+ result listings per page). The larger details would then be used on the page that shows just the one ticket and its details with the simple JOIN (amongst the several other JOINS on the foreign keys).
The tables are currently MyISAM, but I am going to convert them to InnoDB once I restructure them if that makes any difference. There are currently about 33 columns in the tickets table and 4 columns in the tickets_info table; the tickets_info table can potentially have more columns depending on the installation ("custom fields" that I have implemented akin to PHPBBv3).

I think this design is fine. The tickets table is used not only to show single-ticket information, but also for calculations (e.g. the total number of tickets sold on a specific day) and other analysis (how many tickets did that vendor sell?).
Merging tickets_info into tickets would increase the size of your tickets table without any benefit, but with the risk of increasing access time to it. I assume good indexing on the tickets table should keep you safe, but MySQL is not a columnar database, so I expect that rows with big VARCHAR or BLOB fields require more resources.
Besides that, if you use tickets_info only for single-ticket queries, I think you already get good performance when you query that table.
So my suggestion is leave it like it is :)
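The access pattern described in the question can be sketched like this (Python with sqlite3 instead of MySQL; the function names and the 250-row default are illustrative). The overview and aggregate queries read only the narrow tickets table, and only the single-ticket page pays for the JOIN to tickets_info.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tickets (
    id       INTEGER PRIMARY KEY,
    customer INTEGER NOT NULL,
    vendor   INTEGER NOT NULL,
    opened   TEXT NOT NULL
);
-- One-to-one: tickets_info.id is both its primary key and a foreign key.
CREATE TABLE tickets_info (
    id          INTEGER PRIMARY KEY REFERENCES tickets(id),
    description TEXT,
    more_notes  TEXT
);
INSERT INTO tickets VALUES (1, 29, 0, '2013-10-09 12:49:04');
INSERT INTO tickets_info VALUES (1, 'This thing is broken!', 'Even longer...');
""")


def overview(limit=250):
    # List page: touches only the narrow tickets table.
    return conn.execute(
        "SELECT id, customer, vendor, opened FROM tickets ORDER BY opened DESC LIMIT ?",
        (limit,)).fetchall()


def tickets_per_vendor():
    # Aggregates also stay on the narrow table.
    return conn.execute(
        "SELECT vendor, COUNT(*) FROM tickets GROUP BY vendor ORDER BY vendor").fetchall()


def detail(ticket_id):
    # Single-ticket page: one JOIN brings in the wide text columns.
    return conn.execute(
        """SELECT t.id, t.customer, t.vendor, t.opened, i.description, i.more_notes
           FROM tickets AS t LEFT JOIN tickets_info AS i ON i.id = t.id
           WHERE t.id = ?""", (ticket_id,)).fetchone()
```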

How can I best structure a link/junction table with a lot of potential columns or repeated rows?

I'm making a site that will be a subscription based service that will provide users several courses based on whatever they signed up for. A single user can register in multiple courses.
Currently the db structure is as follows:
User
------
user_id | pwd | start | end
Courses
-------
course_id | description
User_course_subscription
------------------------
user_id | course_id | start | end
course_chapters
---------------
course_id | title | description | chapter_id | url |
The concern is that with the user_course_subscription table I don't see how (at least at the moment) one user can have multiple course subscriptions, unless I enter the same user_id multiple times with a different course_id each time. Alternatively, I could add many columns in the format calculus_1, chem_1, etc., but that would give me a ton of columns as the list of courses grows.
I was wondering if putting the user_id in multiple times is the best way to do this? Or is there another way to structure the table (or maybe I'd have to restructure all the tables)?

Your database schema looks fine. Don't worry, you're on the right track. As for the User_course_subscription table, user_id and course_id together form the primary key. This is called a composite (or joint) primary key and is perfectly fine.
Values are still unique because no user subscribes to the same course twice. Your business logic code should ensure this anyway. For the database part: you might want to look up in your database system's manual how to set up composite primary keys properly when creating the table (the syntax might differ).
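A small sketch of the composite primary key (Python's sqlite3 in place of MySQL, where the table definition would look almost the same). The subscribe() and courses_of() helpers are hypothetical; the point is that the same user_id can appear on many rows, while a second row for the same user and course is rejected by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, pwd TEXT, "start" TEXT, "end" TEXT);
CREATE TABLE courses (course_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE user_course_subscription (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    course_id INTEGER NOT NULL REFERENCES courses(course_id),
    "start"   TEXT,
    "end"     TEXT,
    PRIMARY KEY (user_id, course_id)  -- composite key: at most one row per user per course
);
INSERT INTO users (user_id) VALUES (1);
INSERT INTO courses (course_id, description) VALUES (10, 'Calculus'), (20, 'Chemistry');
""")


def subscribe(user_id, course_id):
    """Return True if stored, False if a constraint (duplicate or unknown id) rejected it."""
    try:
        conn.execute("INSERT INTO user_course_subscription (user_id, course_id) VALUES (?, ?)",
                     (user_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False


def courses_of(user_id):
    rows = conn.execute(
        """SELECT c.description FROM user_course_subscription AS s
           JOIN courses AS c ON c.course_id = s.course_id
           WHERE s.user_id = ? ORDER BY c.course_id""", (user_id,)).fetchall()
    return [d for (d,) in rows]
```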
If you don't like this idea, you can also create a surrogate (pseudo) primary key instead:
user_course_subscription
------------------------
user_course_subscription_id | user_id | course_id | start | end
...where user_course_subscription_id is just an auto-incremented integer. This way, you can use user_course_subscription_id to identify records. This might make things easier in some places of your code, because you don't always have to use two values.
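One caveat with the surrogate key, which this sketch assumes you want handled in the schema: the auto-incremented id alone no longer stops duplicate (user_id, course_id) pairs, so a UNIQUE constraint on those two columns is commonly added alongside it. The sketch uses Python's sqlite3 (AUTOINCREMENT here corresponds to MySQL's AUTO_INCREMENT) and compares the table with and without that constraint; the helper names are hypothetical.

```python
import sqlite3


def make_table(conn, with_unique):
    unique = ", UNIQUE (user_id, course_id)" if with_unique else ""
    conn.execute(f"""
        CREATE TABLE user_course_subscription (
            user_course_subscription_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id   INTEGER NOT NULL,
            course_id INTEGER NOT NULL,
            "start"   TEXT,
            "end"     TEXT{unique}
        )""")


def count_after_double_subscribe(with_unique):
    """Subscribe user 1 to course 10 twice and report how many rows exist."""
    conn = sqlite3.connect(":memory:")
    make_table(conn, with_unique)
    for _ in range(2):
        try:
            conn.execute(
                "INSERT INTO user_course_subscription (user_id, course_id) VALUES (1, 10)")
        except sqlite3.IntegrityError:
            pass  # the UNIQUE constraint rejected the duplicate
    (n,) = conn.execute("SELECT COUNT(*) FROM user_course_subscription").fetchone()
    return n
```

Without the constraint both inserts succeed and the table holds two identical subscriptions; with it, the second insert is rejected.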
As for having calculus_1, chem_1, etc.: don't do this. You might want to read up on database normalization, as mike pointed out. In particular, 1NF through 3NF are very common in database design.
The only reason not to follow the normal forms is performance, and even then, in most cases such optimization is premature. If you're concerned, stress-test a prototype of your application under realistic (expected) conditions and measure response times to get some hard evidence.

I don't know what the start and end columns in the user table mean, but you seem to have no redundancy.
You should check out the Wikipedia article on Boyce-Codd normal form. It has a useful example.

Design of MySQL DB to avoid having a table with mutually exclusive fields

I'm creating a new DB and I have this problem: I have two types of users that can place orders: registered users (that is, they have a login) and guest users (that is, no login). The data for registered users and guest users differ, which is why I'm thinking of using two different tables, but the orders (which share the same workflow) are all the same, so I'm thinking about using only one table for them.
I've read here and here (though I don't fully understand that example) that I can enforce a MySQL rule to have mutually exclusive columns in a table (in my case they'd be "idGuest" and "idUser"), but I don't like that approach.
Is there a better way to do it?
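For reference, the mutually exclusive columns approach the question mentions usually comes down to a CHECK constraint like the one in this sketch (Python's sqlite3 standing in for MySQL; the columns of guests and users are hypothetical). Note that MySQL only enforces CHECK constraints from version 8.0.16 on; older versions parse and ignore them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE guests (idGuest INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE users  (idUser  INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    idGuest INTEGER REFERENCES guests(idGuest),
    idUser  INTEGER REFERENCES users(idUser),
    -- Exactly one of the two columns must be set.
    CHECK ((idGuest IS NULL) <> (idUser IS NULL))
);
INSERT INTO guests VALUES (1, 'guest@example.com');
INSERT INTO users  VALUES (1, 'alice');
""")


def place_order(id_guest=None, id_user=None):
    """Return True if the order was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO orders (idGuest, idUser) VALUES (?, ?)",
                     (id_guest, id_user))
        return True
    except sqlite3.IntegrityError:
        return False
```

An order with only idGuest or only idUser is accepted; one with neither or both is rejected.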

There are several approaches, and the right one depends on the number of records and the number of fields unique to each type. For example, if you had said they differ in only two fields, I would have suggested that you just put everything in the same table.
My approach, assuming they differ a lot, would be to think "objects":
You have a main user table, and for each user type you have another table that "elaborates" that user info.
Users
-----
id, email, phone, user_type (guest or registered)
reg_users
---------
user_id, username, password, etc.
unreg_users
-----------
user_id, last_known_address, favorite_color, etc.
where user_id is a foreign key to the users table.
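A minimal sketch of this supertype/subtype layout, with Python's sqlite3 standing in for MySQL. It uses user_id as the foreign key column in both subtype tables and adds a CHECK on user_type; both are assumptions on top of the answer, as are the helper names. A single query with two LEFT JOINs returns any user together with whichever subtype row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    email     TEXT,
    phone     TEXT,
    user_type TEXT NOT NULL CHECK (user_type IN ('guest', 'registered'))
);
CREATE TABLE reg_users (
    user_id  INTEGER PRIMARY KEY REFERENCES users(id),
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL  -- store a password hash in practice, never plain text
);
CREATE TABLE unreg_users (
    user_id            INTEGER PRIMARY KEY REFERENCES users(id),
    last_known_address TEXT,
    favorite_color     TEXT
);
""")


def add_registered(email, username, password_hash):
    uid = conn.execute(
        "INSERT INTO users (email, user_type) VALUES (?, 'registered')", (email,)).lastrowid
    conn.execute("INSERT INTO reg_users VALUES (?, ?, ?)", (uid, username, password_hash))
    return uid


def add_guest(email, address):
    uid = conn.execute(
        "INSERT INTO users (email, user_type) VALUES (?, 'guest')", (email,)).lastrowid
    conn.execute("INSERT INTO unreg_users (user_id, last_known_address) VALUES (?, ?)",
                 (uid, address))
    return uid


def profile(uid):
    # One query covers both subtypes; the columns of the other subtype come back NULL.
    return conn.execute(
        """SELECT u.id, u.user_type, r.username, g.last_known_address
           FROM users AS u
           LEFT JOIN reg_users AS r   ON r.user_id = u.id
           LEFT JOIN unreg_users AS g ON g.user_id = u.id
           WHERE u.id = ?""", (uid,)).fetchone()
```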

Sounds like mostly a relational supertype/subtype issue. I've answered a similar question and included sample code that you should be able to adapt without much trouble. (Make sure you read the comments.)
The mildly complicating factor for you is that one subtype (guest users) could someday become a different subtype (registered users). How you'd handle that would be application-dependent. (Meaning you'd know, but probably nobody else would.)

I think I would have three tables:
A user table, which would contain:
One row for each user, no matter what type of user
The data that's present for both guests and registered users
A field that indicates whether a row corresponds to a registered user or a guest
A guest table, which would contain:
One row per guest user
The data that's specific to guests
And a registered table, which would contain:
One row per registered user
The data that's specific to registered users
Then, when referencing a user (in your orders table, for example), you'd always use the id of the user table.
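The benefit of always referencing the user table's id shows up when a guest later registers (the case raised in an earlier answer). In this sketch (Python's sqlite3 instead of MySQL; the guests, registered, and orders columns and the helper names are hypothetical), converting a guest moves the subtype row inside one transaction and leaves the orders untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    email         TEXT NOT NULL,
    is_registered INTEGER NOT NULL DEFAULT 0 CHECK (is_registered IN (0, 1))
);
CREATE TABLE guests (
    user_id          INTEGER PRIMARY KEY REFERENCES users(id),
    shipping_address TEXT
);
CREATE TABLE registered (
    user_id  INTEGER PRIMARY KEY REFERENCES users(id),
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- always the id from users
    total   REAL NOT NULL
);
""")


def add_guest(email, address):
    with conn:
        uid = conn.execute("INSERT INTO users (email) VALUES (?)", (email,)).lastrowid
        conn.execute("INSERT INTO guests VALUES (?, ?)", (uid, address))
    return uid


def place_order(uid, total):
    with conn:
        conn.execute("INSERT INTO orders (user_id, total) VALUES (?, ?)", (uid, total))


def convert_guest(uid, username):
    """Turn a guest into a registered user; their orders need no changes."""
    with conn:  # one transaction: all three statements apply, or none do
        conn.execute("DELETE FROM guests WHERE user_id = ?", (uid,))
        conn.execute("INSERT INTO registered VALUES (?, ?)", (uid, username))
        conn.execute("UPDATE users SET is_registered = 1 WHERE id = ?", (uid,))


def order_count(uid):
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)).fetchone()[0]
```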

What you are describing is a polymorphic table. It sounds scary, but it really isn't so bad.
You can keep your separate User and Guest tables. For your Orders table, you have two columns: foreign_id and foreign_type (you can name them anything). The foreign_id is the id of the User or Guest in your case, and the content of the foreign_type is going to be either user or guest:
id | foreign_id | foreign_type | other_data
-------------------------------------------------
1 | 1 | user | ...
2 | 1 | guest | ...
To select rows for a particular user or guest, just specify the foreign_type along with the ID:
SELECT * FROM orders WHERE foreign_id = 1 AND foreign_type = 'guest';
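A sketch of this polymorphic layout (Python's sqlite3 in place of MySQL; the users and guests columns, the index name, and the helper names are assumptions). An index on (foreign_type, foreign_id) serves the query above. The trade-off is that no FOREIGN KEY can point at "users or guests", so the database will accept an order for a guest that does not exist; orphaned_orders() shows one way an application might check for that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE guests (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    foreign_id   INTEGER NOT NULL,
    foreign_type TEXT NOT NULL CHECK (foreign_type IN ('user', 'guest')),
    other_data   TEXT
);
-- Composite index matching the WHERE clause of the lookup query.
CREATE INDEX idx_orders_owner ON orders (foreign_type, foreign_id);
INSERT INTO users  VALUES (1, 'Alice');
INSERT INTO guests VALUES (1, 'guest@example.com');
INSERT INTO orders (foreign_id, foreign_type, other_data) VALUES
    (1, 'user', 'order A'), (1, 'guest', 'order B');
""")


def orders_for(owner_type, owner_id):
    return [d for (d,) in conn.execute(
        "SELECT other_data FROM orders WHERE foreign_id = ? AND foreign_type = ? ORDER BY id",
        (owner_id, owner_type))]


def orphaned_orders():
    # No FOREIGN KEY can target two tables, so integrity has to be checked
    # by the application or by a query like this one.
    return [i for (i,) in conn.execute("""
        SELECT o.id FROM orders AS o
        LEFT JOIN users  AS u ON o.foreign_type = 'user'  AND u.id = o.foreign_id
        LEFT JOIN guests AS g ON o.foreign_type = 'guest' AND g.id = o.foreign_id
        WHERE u.id IS NULL AND g.id IS NULL
        ORDER BY o.id""")]
```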

The foreign key in the Orders table pointing back to the Customer entity that placed the order is typically a non-nullable column. If you have two different Customer tables (RegisteredCustomer and GuestCustomer), then you would require two separate nullable columns in the Orders table pointing back to the separate customer tables. What I would suggest is to have only one Customers table, containing only those columns that are common to registered users and guest users, and then a RegisteredUsers table which has a foreign-key relationship with the Customers table.
