I am working on a project where an object (a product for example) could have potentially hundreds of attributes. Objects may have different attributes as well. Because of this, among other more obvious reasons, it doesn't make sense to design a single table with hundreds of columns. It's just not scalable. In my mind, a key/value storage mechanism seems like the correct approach (specifically an Entity-Attribute-Value Model).
The other challenge with this data is that it needs to be overridable. To describe this requirement, imagine a company-wide retail products database that has "recommended" product attributes. Different regions want to override several attributes with their own custom values, and some franchises in each region want to add a further override specific to their store. In the legacy system, there are multiple tables (each with an excessive number of columns) where we use a combination of COALESCE (in a view) and code to find the most specific value based on the information we know (product, region, location, etc.).
My thoughts:
// An object could be a product, a car, a
// document, etc.
---------------------------------
| Table: object
---------------------------------
| - object_id
| - object_name
---------------------------------
// An attribute could be color, length, etc.
---------------------------------
| Table: attribute
---------------------------------
| - attribute_id
| - attribute_name
---------------------------------
// An owner could be a company, a region,
// a store, etc
---------------------------------
| Table: owner
---------------------------------
| - owner_id
| - parent_owner_id
| - owner_name
---------------------------------
// Object data would be a key/value specific
// to a specific object (entity), a specific
// attribute, and specific owner (override level)
---------------------------------
| Table: objectdata
---------------------------------
| - objectdata_id
| - object_id
| - attribute_id
| - owner_id
| - value
---------------------------------
Thinking this through, it satisfies requirement #1: dynamic attributes that can be scaled easily. But for #2, while it provides the data necessary to figure out the overrides, the query seems complex and may have performance issues. For example, if I am viewing a specific object from the level of an owner three levels deep, I need to get all of the attributes defined at the top-level owner (the one with no parent), then merge in the attributes from each level down until I reach the specific level.
As an added bonus issue, each attribute could be a multitude of different data types (string, int, float, timestamp, etc). Do I store them all as strings and handle all validation at the application level? Hmmm.
TL;DR: So my issue (and question) is: what is an effective data modeling pattern that lets me dynamically add and remove attributes on an object, and also gives me some sort of parent/child relationship for determining the most specific attribute values based on a set of constraints?
NOTE: The retail example above is fictional, but describes the problem much better than the real situation.
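A minimal sketch of the "most specific value wins" lookup against the objectdata/owner schema above, using SQLite and entirely hypothetical sample data: walk the owner chain from the viewing owner up to the root with a recursive CTE, then keep the value defined at the deepest level per attribute.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, parent_owner_id INTEGER, owner_name TEXT);
CREATE TABLE objectdata (object_id INTEGER, attribute_id INTEGER, owner_id INTEGER, value TEXT);
-- hypothetical sample data: company -> region -> store
INSERT INTO owner VALUES (1, NULL, 'company'), (2, 1, 'region'), (3, 2, 'store');
-- attribute 10 (color) overridden at region level; attribute 11 (length) set company-wide only
INSERT INTO objectdata VALUES (100, 10, 1, 'red'), (100, 10, 2, 'blue'), (100, 11, 1, '42');
""")

# Walk the owner chain upward from the viewing owner, tagging each level with
# a depth, then keep the deepest (most specific) value per attribute.
# Note: this relies on SQLite's documented behavior that bare columns in a
# GROUP BY query take their values from the row that supplied MAX().
rows = con.execute("""
WITH RECURSIVE chain(owner_id, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT o.parent_owner_id, c.depth - 1
    FROM owner o JOIN chain c ON o.owner_id = c.owner_id
    WHERE o.parent_owner_id IS NOT NULL
)
SELECT d.attribute_id, d.value, MAX(c.depth)
FROM objectdata d JOIN chain c ON c.owner_id = d.owner_id
WHERE d.object_id = ?
GROUP BY d.attribute_id
""", (3, 100)).fetchall()

resolved = {attr: val for attr, val, _ in rows}
print(resolved)  # region override wins for color; company default for length
```

In a DBMS without the SQLite bare-column shortcut, the same "deepest wins" step can be expressed with a window function (`ROW_NUMBER() OVER (PARTITION BY attribute_id ORDER BY depth DESC)`).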
Related
I need to refactor an old mysql database of products, which is divided into tables that store characteristics of different types of products.
That is to say, not all products have the same number of characteristics, and some of them influence the price of the product while others do not. So it isn't clear to me whether I need a pivot table to manage the relations between products and their features.
I can represent this as a hardware store where there are everything from screws to tools, going through some materials such as iron and wood.
For example, if we consider screws: under the name "screw" we have different sizes and types (for wood, for metal, millimetric thread, or self-drilling). Each size and type combination determines a price, but the color of the screws may not affect the price.
So I was thinking of the following table structure:
products:
id | name_id | feature_id | price
product_names:
id | name
product_features:
id | name | description
But it is not clear to me how to deal with the situation where a product has more than one feature and not all of them are relevant to its price.
Thank you for any suggestion.
Based on the exchange of opinions, I arrived at this concept of a diagram. I would like to know if you see it as appropriate.
I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's products:
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper-level item:
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relational schema would demand a growing sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials, which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than accidentally making an item its own parent (creating an infinite loop), and that it sounds disorganized.
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id of 10 ("component"), which may have a parent_part_id (when looked up itself in this table) of 12 ("Assembly"). It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes: it's still just two columns (plus any extra columns needed for attributes of the relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access, though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
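The recursive-CTE traversal this answer recommends can be sketched like so, reusing the part_hierarchy rows from above plus a hypothetical quantity column (SQLite here as a stand-in; the same WITH RECURSIVE syntax works in SQL Server, Postgres, and Oracle):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parts (part_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE part_hierarchy (part_id INTEGER, parent_part_id INTEGER, quantity INTEGER);
INSERT INTO parts VALUES (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
INSERT INTO part_hierarchy VALUES (1, 10, 4), (10, 12, 2);
""")

# Expand the full Bill of Materials under part 12, to any depth,
# with a single query -- no extra junction tables per level.
rows = con.execute("""
WITH RECURSIVE bom(part_id, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT h.part_id, b.depth + 1
    FROM part_hierarchy h JOIN bom b ON h.parent_part_id = b.part_id
)
SELECT p.description, b.depth
FROM bom b JOIN parts p ON p.part_id = b.part_id
ORDER BY b.depth
""", (12,)).fetchall()
print(rows)
```

The depth column is handy for indenting an exported BOM; a cycle (an item accidentally made its own ancestor) would make this query loop, which is why the "item is its own parent" case mentioned above needs a constraint or application-level check.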
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key to itself.
While this is technically correct, navigating it requires a recursive query, which not every DBMS supports. However, a simple and effective way of making access to nested parts easy is to store a "path" to each component, which looks like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id parent_id path
1 null /1
2 1 /1/2
3 2 /1/2/3
Doing this means finding the tree of subparts for any part is simply (note the '/' in the LIKE pattern, so that path '/1' does not also match a sibling like '/10'):
select b.*
from parts a
join parts b on b.path = a.path
             or b.path like concat(a.path, '/%')
where a.id = ?
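A runnable sketch of that materialized-path lookup (SQLite, hypothetical rows; an extra top-level part 10 is included to show why the pattern needs the trailing '/'):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parts (id INTEGER PRIMARY KEY, parent_id INTEGER, path TEXT);
INSERT INTO parts VALUES (1, NULL, '/1'), (2, 1, '/1/2'), (3, 2, '/1/2/3'),
                         (10, NULL, '/10');  -- would falsely match a bare '/1%' pattern
""")

# Subtree of part 1: the part itself, plus everything whose path starts
# with the part's path followed by the separator.
rows = con.execute("""
SELECT b.id
FROM parts a
JOIN parts b ON b.path = a.path OR b.path LIKE a.path || '/%'
WHERE a.id = ?
ORDER BY b.id
""", (1,)).fetchall()

subtree = [r[0] for r in rows]
print(subtree)  # part 10 is correctly excluded
```

The trade-off versus the plain adjacency list: subtree reads become a single indexable LIKE, but moving a node means rewriting the paths of all its descendants.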
Hi, I've got a small internal project I'm working on. Currently it only serves my company, but I'd like to scale it so that it could serve multiple companies. The tables I have at the moment are USERS and PROJECTS. I want to start storing company-specific information and relating it to the USERS table. For each user, I will add a new column for the company they belong to.
Now I also need to store that companies templates in the database. The templates are stored as strings like this:
"divider","events","freeform" etc.
Initially I was thinking each word should go in as a separate row, but as I write this I'm thinking perhaps I should store all templates in one entry separated by commas (as written above).
Bottom line, I'm new to database design and I have no idea how to best set this up. How many tables, what columns etc. For right now, my table structure looks like this:
PROJECTS
Project Number | Title | exacttarget_id | Author | Body | Date
USERS
Name | Email | Date Created | Password
Thanks in advance for any insights you can offer.
What I would do is create 2 tables:
I would create one table for the different companies, let's call it COMPANY:
Company_id | Title | Logo | (Whatever other data you want)
I would also create one table for the settings listed above, let's call it COMPANY_SETTINGS:
Company_id | Key | Value
This gives you the flexibility in the future to add additional settings without compromising your existing code. A simple query gets all the settings, regardless of how many your current version uses.
SELECT Key, Value FROM COMPANY_SETTINGS WHERE Company_id = :companyId
The results can then be put into an associative array for easy use throughout the project.
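A sketch of that query feeding an associative structure (SQLite standing in for MySQL; the setting names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE company_settings (company_id INTEGER, key TEXT, value TEXT);
INSERT INTO company_settings VALUES
    (1, 'templates', 'divider,events,freeform'),
    (1, 'theme', 'dark'),
    (2, 'theme', 'light');
""")

# All settings for one company, collected into a dict (the "associative array").
settings = dict(con.execute(
    "SELECT key, value FROM company_settings WHERE company_id = ?", (1,)))
print(settings["theme"])
print(settings["templates"].split(","))  # back to a list of template names
```

Storing the templates as one comma-separated value keeps the row count down, but as the question hints, one row per template name is the more normalized choice if templates ever need to be queried or constrained individually.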
I have a question about using the same FK twice in a schema. Here is the question:
|=======================================|
| Book |
|=======================================|
| Book_ID (PK)| Cover_Paper | Page_Paper|
|-------------|-------------|-----------|
|====================================|
| Paper |
|====================================|
| Paper_ID (PK)| Paper_Type | weight |
|--------------|------------|--------|
Let's say I have different types of paper, with different weights, used to print the cover and the pages.
So I need to plug the Paper_ID into the Book table as an FK twice. The problem is, I thought it was wrong for an FK column to have a name different from the referenced column. But if I change the table to use the same column name twice, it looks weird:
|==========================================|
| Book |
|==========================================|
| Book_ID (PK)| Paper_ID(FK) | Paper_ID(FK)|
|-------------|--------------|-------------|
Any help with this problem?
It's not wrong to have column names that differ from the domain name of the column. In fact, it is often necessary.
The alternative - having two columns with the same name - is bad. How would you know which column indicated cover paper and which page paper? By position? This ties the meaning of the content to the physical representation of the data. What happens if I select Book_ID and just one of the Paper_ID columns? One wouldn't know, without additional external information, what the data means. Rather, that additional information should be part of the representation, so that it's as self-descriptive as possible.
In relations where each role is filled by a unique domain, it's easy enough to just use the name of the domain as the name of the role without confusion. If a book consisted of a single type of paper, talking about the book's paper makes sense. Same for a bicycle's seat and a person's nose.
However, when a relation has more than one of the same kind of thing, we need to indicate each thing's role. Distinguishing Cover_Paper and Page_Paper like you did is the right way to do it. (It's too bad SQL DBMSs don't have separate role and domain names for each column, but I digress.)
You could call it Cover_Paper_ID and Page_Paper_ID; it's sort of an industry convention to attach ID to surrogate identifier columns, though I think it reads better without. In other relations, it's often sufficient to write just the role without the domain - e.g. in a Marriage we might have columns for Husband and Wife, instead of writing Husband_Person and Wife_Person.
Both Edgar Codd (author of A Relational Model of Data for Large Shared Data Banks) and Peter Chen (author of The Entity-Relationship Model - Toward a Unified View of Data) discuss roles in their papers. I highly recommend studying both, especially since very few online resources ever mention the topic.
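The role-naming advice above, as a sketch (SQLite syntax, hypothetical rows): two foreign keys into the same Paper table, distinguished by role, and joined via two aliases of that table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE paper (paper_id INTEGER PRIMARY KEY, paper_type TEXT, weight INTEGER);
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    cover_paper_id INTEGER REFERENCES paper(paper_id),  -- role: cover
    page_paper_id  INTEGER REFERENCES paper(paper_id)   -- role: pages
);
INSERT INTO paper VALUES (1, 'glossy', 250), (2, 'offset', 80);
INSERT INTO book VALUES (100, 1, 2);
""")

# Join the Paper table twice, once per role, using table aliases.
row = con.execute("""
SELECT cp.paper_type, pp.paper_type
FROM book b
JOIN paper cp ON cp.paper_id = b.cover_paper_id
JOIN paper pp ON pp.paper_id = b.page_paper_id
WHERE b.book_id = 100
""").fetchone()
print(row)
```

The aliases cp and pp play the same part in the query that the role-based column names play in the schema: they keep the two uses of Paper self-descriptive.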
I have a self made web application in PHP and MySQL. The many different clients using my system would like to augment entities with custom fields. Each client would like to store their own additional data, so this should be done in a dynamic way.
For example client1 would like to add the "color" property to their products, client2 want a field called "safety_level" for their products.
I want a methodology that can be applied not only for products but for users and for any other entities as well.
Here are 2 options I found the optimal, but can't decide which one is the most effective:
OPTION 1:
For every entity I make an [entityname]_customfields table in which I store the additional field values in a 1:1 relation.
e.g.:
+---------------------------------------------+
|products_custom_fields |
+---------------------------------------------+
|product_id (PK and FK related to products.id)|
|safety_level |
|some_other_fields |
+---------------------------------------------+
pro: this table can have no more records than the entity table (in this case products), which means fewer records, and it is quite easy to keep an overview of.
con: adding new fields or deleting old ones require DDL queries. I don't want to confide DDL to users...not even operators with admin permissions.
OPTION 2:
[entity]_custom_field_values will have an N:1 relation to the [entity] table. Each row contains the type of the custom field and the value itself. In this case we need another table which contains the custom field types, e.g.:
custom field values:
+----------------------------------------------------------------------+
|products_custom_field_values |
+----------------------------------------------------------------------+
|custom_field_id |
|custom_field_type (FK product_custom_field_types.custom_field_type_id)|
|value |
+----------------------------------------------------------------------+
custom field types:
+---------------------------------------------------------+
|products_custom_field_types |
+---------------------------------------------------------+
|custom_field_type_id (PK AUTO_INCREMENT) |
|product_id (FK related to products.id) |
+---------------------------------------------------------+
pro: managing fields is easy, does not require to alter table structures
con: more records, and all kinds of custom field values in one big mess... which is not necessarily wrong, because that's the point of MySQL: to extract useful data from a big mess. The question is what this does to efficiency and performance.
Note: this topic is actually covered in "SQL Antipatterns", which I strongly recommend you read.
I am a lazy person, which means that I tend to apply YAGNI to my code. So this is my approach.
So. Let's assume that there are two groups of products:
ProductFoo ProductBar
- productID - productID
- name - name
- price - price
- color - supply
- weight - manufacturerID
- safety
In this case there are three common elements, which go in the main Products table. The custom parameters would be stored using table inheritance (it's a thing, google it). So basically you would end up with three tables: Products, ProductsFoo and ProductsBar, where the Products table has a "type" field and both "child" tables have a productID foreign key pointing to the parent table.
That's if you know at the development time, what "custom fields" each client will want.
Now, let's assume clients are being difficult and want to make up custom fields whenever they feel like it.
In this case I would simply create a Products.data field, which contains JSON with all the custom attributes for each product, and only "extract" special attributes to an inheriting table when a client wants to search by that custom attribute (there is no sane way to index arbitrary JSON keys, if clients want to search by their new "wallywanker" attribute).
You end up with the same basic structure, but the "sub-tables" only contain the attributes that are expected to be searchable.
I hope this made sense.
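A sketch of the Products.data JSON approach described above (SQLite's json_extract standing in for MySQL's JSON functions; product rows and field names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, data TEXT);
INSERT INTO products VALUES
    (1, 'helmet', 19.9, '{"color": "red", "safety_level": 3}'),
    (2, 'ladder', 45.0, '{"color": "grey"}');
""")

# Read a per-client custom attribute straight out of the JSON blob;
# products that lack the attribute simply yield NULL.
rows = con.execute("""
SELECT name, json_extract(data, '$.safety_level')
FROM products
ORDER BY id
""").fetchall()
print(rows)
```

In MySQL, the "extract when searchable" step can also be done in place with an indexed generated column over the JSON path, rather than a separate inheriting table.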
If it is a company project, follow the standards used on previous projects.
Have a look at conventions such as Hungarian notation; that would make more sense than repeating a prefix. Also, it is more likely that your model name is your table name.
Also if you are planning to use ORM they might have some best practices as well.
https://en.wikipedia.org/wiki/Hungarian_notation