I am just learning how to implement the Nested Set Model but still have confusion with a certain aspect of it involving items that may be part of multiple categories. Given the example below that was pulled from HERE and mirrors many other examples I have come across...
How do you avoid duplication in the DB when you add Apples since they are multi-colored (i.e. Red, Yellow, Green)?
You do not avoid duplications and the apple (or a reference to the apple) will be placed twice in your tree otherwise it won't be a tree but rather a graph. Your question is equally applicable if you build a... Swing JTree or an HTML tree ;).
The nested set model is just an efficient way to push and traverse a tree structure in a relational DB.It is not a data structure itself. It's more popular among MySQL users since MySQL lacks functionality for processing tree structures (e.g. like the one that Oracle provides).
Cheers!
Nested set model is a structure for 1:N (one-to-many) relationships, you want to use M:N (many to many) relationship (many items can have apple as parent, but can have more than one parent).
See this article
Wikipedia
But you should be aware, that hierarchical M:N relationships can get quite complex really fast!
Thinking out loud here, but perhaps it would be helpful to view some attributes (like Red, Yellow and Green) as 'tags' instead of 'categories' and handle them with separate logic. That would let you keep the Nested Set model and avoid unnecessary duplication. Plus, it would allow you to keep your categories simpler.
It's all in how you think about the information. Categories are just another way of representing attributes. I understand your example was just for illustrative purposes, but if you're going to categorize fruit by color, why would you not also categorize meat the same way, i.e., white meat and red meat? Most likely you would not. So my point is it's probably not necessary to categorize fruit by color, either.
Instead, some attributes are better represented in other ways. In fact, in its simplest form, it could be recorded as a column in the 'food' table labeled 'color'. Or, if it's a very common attribute and you find yourself duplicating the value significantly, it could be split off to a separate table named 'color' and mapped to each food item from a third table. Of course, the more abstract approach would be to generalize the table as 'tags' and include each color as an individual tag that can then be mapped to any food item. Then you can map any number of tags (colors) to any number of food items, giving you a true many-to-many relationship and freeing up your category designations to be more generalized as well.
I know there's ongoing debate about whether tags are categories or categories are tags, etc., but this appears to be one instance in which they could be complimentary and create a more abstract and robust system that's easier to manage.
Old thread, but I found a better answer to this problem.
Since apple can have different color, your structure is a graph,not a tree. The nested set model is not the right structure for that.
Since you mention in a comment that you're using Mysql, a better solution is to use the Open Query Graph engine (http://openquery.com/graph/doc) which is a mysql plugin that lets you create a special table where you put the relationships, basically parentId and childId.
The magic is that you query this table with a special column latch depending of the value passed in the query will tell the OQGRAPH engine which command to execute. See the docs for details.
Related
I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?
You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.
#StoneHeart
I would go here with EAV and MVC all the way.
#Bill Karvin
Here are some of the disadvantages of
EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g.
for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion don't belong in a database at all because none of databases are capable of handling those interactions and requirements on a proper level as a programming language of an application does.
In my opinion using a database in this way is like using a rock to hammer a nail. You can do it with a rock but aren't you suppose to use a hammer which is more precise and specifically designed for this sort of activity ?
Fetching results in a conventional tabular layout is complex and
expensive, because to get attributes
from multiple rows you need to do JOIN
for each attribute.
This problem can be solved by making few queries on partial data and processing them into tabular layout with your application. Even if you have 600GB of product data you can process it in batches if you require data from every single row in this table.
Going further If you would like to improve the performance of the queries you can select certain operations like for e.g. reporting or global text search and prepare for them index tables which would store required data and would be regenerated periodically, lets say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage because it gets cheaper and cheaper every day.
If you would still be concerned with performance of operations done by the application, you can always use Erlang, C++, Go Language to pre-process the data and later on just process the optimised data further in your main app.
If I use Class Table Inheritance meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
Which I like the best of Bill Karwin's Suggestions.. I can kind of foresee one drawback, which I will try to explain how to keep from becoming a problem.
What contingency plan should I have in place when an attribute that is only common to 1 type, then becomes common to 2, then 3, etc?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry Home theater systems which also have a power_consumption property. OK its just one other product so I'll add this field to the stereo_type_table as well since that is probably easiest at this point. But over time as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
Now If I was always using the same GetProductData class to interact with the database to pull the product info; then if any changes in code now need refactoring, they should be to that Class only.
You can have a Product table and a separate ProductAdditionInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products you could have it be a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.
I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?
You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.
#StoneHeart
I would go here with EAV and MVC all the way.
#Bill Karvin
Here are some of the disadvantages of
EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g.
for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion don't belong in a database at all because none of databases are capable of handling those interactions and requirements on a proper level as a programming language of an application does.
In my opinion using a database in this way is like using a rock to hammer a nail. You can do it with a rock but aren't you suppose to use a hammer which is more precise and specifically designed for this sort of activity ?
Fetching results in a conventional tabular layout is complex and
expensive, because to get attributes
from multiple rows you need to do JOIN
for each attribute.
This problem can be solved by making few queries on partial data and processing them into tabular layout with your application. Even if you have 600GB of product data you can process it in batches if you require data from every single row in this table.
Going further If you would like to improve the performance of the queries you can select certain operations like for e.g. reporting or global text search and prepare for them index tables which would store required data and would be regenerated periodically, lets say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage because it gets cheaper and cheaper every day.
If you would still be concerned with performance of operations done by the application, you can always use Erlang, C++, Go Language to pre-process the data and later on just process the optimised data further in your main app.
If I use Class Table Inheritance meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
Which I like the best of Bill Karwin's Suggestions.. I can kind of foresee one drawback, which I will try to explain how to keep from becoming a problem.
What contingency plan should I have in place when an attribute that is only common to 1 type, then becomes common to 2, then 3, etc?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry Home theater systems which also have a power_consumption property. OK its just one other product so I'll add this field to the stereo_type_table as well since that is probably easiest at this point. But over time as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
Now If I was always using the same GetProductData class to interact with the database to pull the product info; then if any changes in code now need refactoring, they should be to that Class only.
You can have a Product table and a separate ProductAdditionInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products you could have it be a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.
I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?
You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.
#StoneHeart
I would go here with EAV and MVC all the way.
#Bill Karvin
Here are some of the disadvantages of
EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g.
for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion don't belong in a database at all because none of databases are capable of handling those interactions and requirements on a proper level as a programming language of an application does.
In my opinion using a database in this way is like using a rock to hammer a nail. You can do it with a rock but aren't you suppose to use a hammer which is more precise and specifically designed for this sort of activity ?
Fetching results in a conventional tabular layout is complex and
expensive, because to get attributes
from multiple rows you need to do JOIN
for each attribute.
This problem can be solved by making few queries on partial data and processing them into tabular layout with your application. Even if you have 600GB of product data you can process it in batches if you require data from every single row in this table.
Going further If you would like to improve the performance of the queries you can select certain operations like for e.g. reporting or global text search and prepare for them index tables which would store required data and would be regenerated periodically, lets say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage because it gets cheaper and cheaper every day.
If you would still be concerned with performance of operations done by the application, you can always use Erlang, C++, Go Language to pre-process the data and later on just process the optimised data further in your main app.
If I use Class Table Inheritance meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
Which I like the best of Bill Karwin's Suggestions.. I can kind of foresee one drawback, which I will try to explain how to keep from becoming a problem.
What contingency plan should I have in place when an attribute that is only common to 1 type, then becomes common to 2, then 3, etc?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry Home theater systems which also have a power_consumption property. OK its just one other product so I'll add this field to the stereo_type_table as well since that is probably easiest at this point. But over time as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
Now If I was always using the same GetProductData class to interact with the database to pull the product info; then if any changes in code now need refactoring, they should be to that Class only.
You can have a Product table and a separate ProductAdditionInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products you could have it be a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.
I do not have much experience in table design. My goal is to create one or more product tables that meet the requirements below:
Support many kinds of products (TV, Phone, PC, ...). Each kind of product has a different set of parameters, like:
Phone will have Color, Size, Weight, OS...
PC will have CPU, HDD, RAM...
The set of parameters must be dynamic. You can add or edit any parameter you like.
How can I meet these requirements without a separate table for each kind of product?
You have at least these five options for modeling the type hierarchy you describe:
Single Table Inheritance: one table for all Product types, with enough columns to store all attributes of all types. This means a lot of columns, most of which are NULL on any given row.
Class Table Inheritance: one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
Concrete Table Inheritance: no table for common Products attributes. Instead, one table per product type, storing both common product attributes, and product-specific attributes.
Serialized LOB: One table for Products, storing attributes common to all product types. One extra column stores a BLOB of semi-structured data, in XML, YAML, JSON, or some other format. This BLOB allows you to store the attributes specific to each product type. You can use fancy Design Patterns to describe this, such as Facade and Memento. But regardless you have a blob of attributes that can't be easily queried within SQL; you have to fetch the whole blob back to the application and sort it out there.
Entity-Attribute-Value: One table for Products, and one table that pivots attributes to rows, instead of columns. EAV is not a valid design with respect to the relational paradigm, but many people use it anyway. This is the "Properties Pattern" mentioned by another answer. See other questions with the eav tag on StackOverflow for some of the pitfalls.
I have written more about this in a presentation, Extensible Data Modeling.
Additional thoughts about EAV: Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do JOIN for each attribute.
The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex (or worse) than it would have been to solve the original problem in a more conventional way.
And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.
I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
Update 2019: The more I see people using JSON as a solution for the "many custom attributes" problem, the less I like that solution. It makes queries too complex, even when using special JSON functions to support them. It takes a lot more storage space to store JSON documents, versus storing in normal rows and columns.
Basically, none of these solutions are easy or efficient in a relational database. The whole idea of having "variable attributes" is fundamentally at odds with relational theory.
What it comes down to is that you have to choose one of the solutions based on which is the least bad for your app. Therefore you need to know how you're going to query the data before you choose a database design. There's no way to choose one solution that is "best" because any of the solutions might be best for a given application.
#StoneHeart
I would go here with EAV and MVC all the way.
#Bill Karvin
Here are some of the disadvantages of
EAV:
No way to make a column mandatory (equivalent of NOT NULL).
No way to use SQL data types to validate entries.
No way to ensure that attribute names are spelled consistently.
No way to put a foreign key on the values of any given attribute, e.g.
for a lookup table.
All those things that you have mentioned here:
data validation
attribute names spelling validation
mandatory columns/fields
handling the destruction of dependent attributes
in my opinion don't belong in a database at all because none of databases are capable of handling those interactions and requirements on a proper level as a programming language of an application does.
In my opinion using a database in this way is like using a rock to hammer a nail. You can do it with a rock but aren't you suppose to use a hammer which is more precise and specifically designed for this sort of activity ?
Fetching results in a conventional tabular layout is complex and
expensive, because to get attributes
from multiple rows you need to do JOIN
for each attribute.
This problem can be solved by making few queries on partial data and processing them into tabular layout with your application. Even if you have 600GB of product data you can process it in batches if you require data from every single row in this table.
Going further If you would like to improve the performance of the queries you can select certain operations like for e.g. reporting or global text search and prepare for them index tables which would store required data and would be regenerated periodically, lets say every 30 minutes.
You don't even need to be concerned with the cost of extra data storage because it gets cheaper and cheaper every day.
If you would still be concerned with performance of operations done by the application, you can always use Erlang, C++, Go Language to pre-process the data and later on just process the optimised data further in your main app.
If I use Class Table Inheritance meaning:
one table for Products, storing attributes common to all product types. Then one table per product type, storing attributes specific to that product type.
-Bill Karwin
Which I like the best of Bill Karwin's Suggestions.. I can kind of foresee one drawback, which I will try to explain how to keep from becoming a problem.
What contingency plan should I have in place when an attribute that is only common to 1 type, then becomes common to 2, then 3, etc?
For example: (this is just an example, not my real issue)
If we sell furniture, we might sell chairs, lamps, sofas, TVs, etc. The TV type might be the only type we carry that has a power consumption. So I would put the power_consumption attribute on the tv_type_table. But then we start to carry Home theater systems which also have a power_consumption property. OK its just one other product so I'll add this field to the stereo_type_table as well since that is probably easiest at this point. But over time as we start to carry more and more electronics, we realize that power_consumption is broad enough that it should be in the main_product_table. What should I do now?
Add the field to the main_product_table. Write a script to loop through the electronics and put the correct value from each type_table to the main_product_table. Then drop that column from each type_table.
Now If I was always using the same GetProductData class to interact with the database to pull the product info; then if any changes in code now need refactoring, they should be to that Class only.
You can have a Product table and a separate ProductAdditionInfo table with 3 columns: product ID, additional info name, additional info value. If color is used by many but not all kinds of Products you could have it be a nullable column in the Product table, or just put it in ProductAdditionalInfo.
This approach is not a traditional technique for a relational database, but I have seen it used a lot in practice. It can be flexible and have good performance.
Steve Yegge calls this the Properties pattern and wrote a long post about using it.
I have a number of distinct items stored in different MySQL tables, which I'd like to put in a tree hierarchy. Using the adjacency list model, I can add a parent_id field to each table and link the tables using a foreign key relationship.
However, I'd like to use a nested sets/modified preorder tree traversal model. The data will be used in a environment that's heavily biased towards reads, and the kind of queries I expect to run favour this approach.
The problem is that all the information I have on nested sets assumes that you only have one type of item, stored in a single table. The ways round this that I can think of are:
Having multiple foreign key fields in the tree, one for each table/item type.
Storing the name of the item table in the tree structure as well as the item ID.
Both approaches are inelegant to say the least, so is there a better way of doing this?
RDBMS are not a good match to storing hierarchies to begin with, and your use case makes this even worse. I think a little more fine tuned but still ugly variations of your own suggestions are what you are going to get using a RDBMS. IMHO other data models would provide better solutions to your problem, like graph databases or maybe document databases. The article Should you go Beyond Relational Databases? gives a nice introduction to this kind of stuff.
You have have several types of tree, and a single table which contains the tree information (i.e. the left/right values) for all tree types?
If you have have several types of tree, why not a different table for each type?