modeling many to many unary relationship and 1:M unary relationship - mysql

I'm getting back into database design and I realize that I have huge gaps in my knowledge.
I have a table that contains categories. Each category can have many subcategories and each subcategory can belong to many super-categories.
I want to create a folder with a category name which will contain all the subcategory folders (a visual object like Windows folders).
So I need to perform quick searches of the subcategories.
I wonder what are the benefits of using 1:M or M:N relationship in this case?
And how to implement each design?
I have created an ERD model which is a 1:M unary relationship. (The diagram also contains an expense table which stores all the expense values, but it is irrelevant in this case.)
Is this design correct?
Will a many-to-many unary relationship allow for faster searches of super-categories, and is it the best design by default?
I would prefer an answer which contains an ERD

If I understand you correctly, a single sub-category can have at most one (direct) super-category, in which case you don't need a separate table. Something like this should be enough:
Obviously, you'd need a recursive query to get the sub-categories from all levels, but it should be fairly efficient provided you put an index on PARENT_ID.
Going in the opposite direction (and getting all ancestors) would also require a recursive query. Since this would entail searching on PK (which is automatically indexed), this should be reasonably efficient as well.
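A minimal sketch of both recursive queries, assuming a table named category with columns category_id, parent_id and name (these names and the sample rows are made up for illustration). It uses Python's sqlite3 module so it can run anywhere; MySQL 8 accepts the same WITH RECURSIVE syntax.

```python
import sqlite3

# SQLite stands in for MySQL 8 here; both accept WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        parent_id   INTEGER REFERENCES category(category_id),
        name        TEXT NOT NULL
    );
    -- The index that keeps the "children of X" lookup cheap.
    CREATE INDEX idx_category_parent ON category(parent_id);
    INSERT INTO category VALUES
        (1, NULL, 'Living'),
        (2, 1,    'Food'),
        (3, 2,    'Groceries'),
        (4, 2,    'Restaurants'),
        (5, 1,    'Transport');
""")

def descendants(conn, category_id):
    """Names of all sub-categories at every level below category_id."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(category_id, name) AS (
            SELECT category_id, name FROM category WHERE parent_id = ?
            UNION ALL
            SELECT c.category_id, c.name
            FROM category c JOIN subtree s ON c.parent_id = s.category_id
        )
        SELECT name FROM subtree ORDER BY category_id
    """, (category_id,))
    return [r[0] for r in rows]

def ancestors(conn, category_id):
    """Names of all super-categories, nearest parent first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(category_id, parent_id, name, depth) AS (
            SELECT p.category_id, p.parent_id, p.name, 1
            FROM category c JOIN category p ON p.category_id = c.parent_id
            WHERE c.category_id = ?
            UNION ALL
            SELECT p.category_id, p.parent_id, p.name, ch.depth + 1
            FROM chain ch JOIN category p ON p.category_id = ch.parent_id
        )
        SELECT name FROM chain ORDER BY depth
    """, (category_id,))
    return [r[0] for r in rows]
```

The downward walk joins on parent_id (hence the index), while the upward walk joins on the primary key.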
For some more ideas and different performance tradeoffs, take a look at this slide-show.

In some cases the easiest way to maintain a multilevel hierarchy in a relational database is the Nested Set Model, sometimes also called "modified preorder tree traversal" (MPTT).
Basically the tree nodes store not only the parent id but also the ids of the left-most and right-most leaf:
spending_category
-----------------
parent_id int
left_id int
right_id int
name char
The major benefit from doing this is that now you are able to get an entire subtree of a node with a single query: the ids of subtree nodes are between left_id and right_id. There are many variations; others store the depth of the node in addition to or instead of the parent node id.
A drawback is that left_id and right_id have to be updated when nodes are inserted or deleted, which means this approach is useful only for trees of moderate size.
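To make the single-query subtree lookup concrete, here is a small sketch. It assumes the common numbering where left_id and right_id come from a preorder walk of the tree (one of the variations mentioned above); the id column and the sample rows are invented for the example, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spending_category (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,
        left_id   INTEGER NOT NULL,
        right_id  INTEGER NOT NULL,
        name      TEXT NOT NULL
    );
    -- Preorder numbering (assumed):
    -- Living(1,10) > Food(2,7) > Groceries(3,4), Restaurants(5,6); Transport(8,9)
    INSERT INTO spending_category VALUES
        (1, NULL, 1, 10, 'Living'),
        (2, 1,    2, 7,  'Food'),
        (3, 2,    3, 4,  'Groceries'),
        (4, 2,    5, 6,  'Restaurants'),
        (5, 1,    8, 9,  'Transport');
""")

def subtree(conn, name):
    """The named node plus everything below it, in tree order, with one query."""
    rows = conn.execute("""
        SELECT child.name
        FROM spending_category AS node
        JOIN spending_category AS child
          ON child.left_id BETWEEN node.left_id AND node.right_id
        WHERE node.name = ?
        ORDER BY child.left_id
    """, (name,))
    return [r[0] for r in rows]
```

No recursion is needed: membership in the subtree is a plain range test on left_id.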
The Wikipedia article and the slideshow mentioned by Branko explain the technique better than I can. Also check out this list of resources if you want to know more about different ways of storing hierarchical data in a relational database.

Related

Best method for storing hierarchy of organisations using eloquent

I need to store organisation ownership hierarchy in a Laravel backend. Each node in the hierarchy can be one of a number of types, and each relationship needs to carry the amount of ownership (and potentially more metadata relating to the relationship between nodes). The structure can be arbitrarily deep, and it must be possible to attach a subtree an arbitrary number of times (see C1 below, which appears twice). Below is a sketch of the kind of hierarchy I need...
I am using MySQL 8 so I have access to CTEs for recursion. I have looked into the adjacency-list package (staudenmeir/laravel-adjacency-list), which uses CTEs and looks good, but it uses self-referencing tables. I think this means that I cannot store relationship data, and I don't think I can get the repeated subtree structure you see above.
I am currently exploring many-to-many relationships, with a custom pivot table to store the "relationship weighting". But I am unsure if this is a sensible approach, and perhaps I'm missing some useful design pattern for this.
I am aware that this is a nebulous question, but while I'm trying to crack this myself using Eloquent relationships, I thought I might get a discussion going about design patterns for this type of work.

Hierarchical Data - Nested Set Model: MySql

I am just learning how to implement the Nested Set Model but still have confusion with a certain aspect of it involving items that may be part of multiple categories. Given the example below that was pulled from HERE and mirrors many other examples I have come across...
How do you avoid duplication in the DB when you add Apples since they are multi-colored (i.e. Red, Yellow, Green)?
You do not avoid duplication: the apple (or a reference to the apple) will be placed more than once in your tree; otherwise it won't be a tree but rather a graph. Your question is equally applicable if you build a... Swing JTree or an HTML tree ;).
The nested set model is just an efficient way to push and traverse a tree structure in a relational DB. It is not a data structure itself. It's more popular among MySQL users since MySQL lacks functionality for processing tree structures (e.g. like the one that Oracle provides).
Cheers!
The nested set model is a structure for 1:N (one-to-many) relationships; you want to use an M:N (many-to-many) relationship (many items can have apple as parent, but can have more than one parent).
See this article
Wikipedia
But you should be aware, that hierarchical M:N relationships can get quite complex really fast!
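For illustration, an M:N hierarchy is usually stored as an edge table next to the node table. The sketch below assumes two hypothetical tables, item and item_parent, with made-up sample rows; it runs on SQLite through Python's sqlite3, and the same SQL should work on MySQL 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        item_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    -- One row per parent/child edge, so a child may have many parents.
    CREATE TABLE item_parent (
        parent_id INTEGER NOT NULL REFERENCES item(item_id),
        child_id  INTEGER NOT NULL REFERENCES item(item_id),
        PRIMARY KEY (parent_id, child_id)
    );
    INSERT INTO item VALUES
        (1, 'Fruit'), (2, 'Red'), (3, 'Yellow'),
        (4, 'Green'), (5, 'Apple'), (6, 'Banana');
    INSERT INTO item_parent VALUES
        (1, 2), (1, 3), (1, 4),      -- colors under Fruit
        (2, 5), (3, 5), (4, 5),      -- Apple under three colors
        (3, 6);                      -- Banana under Yellow
""")

def parents(conn, name):
    """Direct parents of the named item."""
    rows = conn.execute("""
        SELECT p.name
        FROM item c
        JOIN item_parent ip ON ip.child_id = c.item_id
        JOIN item p ON p.item_id = ip.parent_id
        WHERE c.name = ?
        ORDER BY p.name
    """, (name,))
    return [r[0] for r in rows]

def descendants(conn, name):
    """Everything reachable below the named item, each item listed once."""
    rows = conn.execute("""
        WITH RECURSIVE below(item_id) AS (
            SELECT ip.child_id
            FROM item_parent ip JOIN item i ON i.item_id = ip.parent_id
            WHERE i.name = ?
            UNION            -- UNION (not UNION ALL) drops repeated visits
            SELECT ip.child_id
            FROM item_parent ip JOIN below b ON ip.parent_id = b.item_id
        )
        SELECT i.name FROM below b JOIN item i ON i.item_id = b.item_id
        ORDER BY i.name
    """, (name,))
    return [r[0] for r in rows]
```

Apple is stored once in item even though it hangs under three parents. The price is that every traversal becomes a graph walk, which is where the complexity comes from.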
Thinking out loud here, but perhaps it would be helpful to view some attributes (like Red, Yellow and Green) as 'tags' instead of 'categories' and handle them with separate logic. That would let you keep the Nested Set model and avoid unnecessary duplication. Plus, it would allow you to keep your categories simpler.
It's all in how you think about the information. Categories are just another way of representing attributes. I understand your example was just for illustrative purposes, but if you're going to categorize fruit by color, why would you not also categorize meat the same way, i.e., white meat and red meat? Most likely you would not. So my point is it's probably not necessary to categorize fruit by color, either.
Instead, some attributes are better represented in other ways. In fact, in its simplest form, it could be recorded as a column in the 'food' table labeled 'color'. Or, if it's a very common attribute and you find yourself duplicating the value significantly, it could be split off to a separate table named 'color' and mapped to each food item from a third table. Of course, the more abstract approach would be to generalize the table as 'tags' and include each color as an individual tag that can then be mapped to any food item. Then you can map any number of tags (colors) to any number of food items, giving you a true many-to-many relationship and freeing up your category designations to be more generalized as well.
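A rough sketch of that tag mapping, assuming three hypothetical tables (food, tag, food_tag) and invented sample data. It is written against SQLite via Python's sqlite3 for convenience; nothing in it is MySQL-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food (food_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE tag  (tag_id  INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- The third table: maps any number of tags to any number of foods.
    CREATE TABLE food_tag (
        food_id INTEGER NOT NULL REFERENCES food(food_id),
        tag_id  INTEGER NOT NULL REFERENCES tag(tag_id),
        PRIMARY KEY (food_id, tag_id)
    );
    INSERT INTO food VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Beef');
    INSERT INTO tag  VALUES (1, 'red'), (2, 'yellow'), (3, 'green');
    INSERT INTO food_tag VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 1);
""")

def foods_with_tag(conn, tag_name):
    """All foods carrying the given tag."""
    rows = conn.execute("""
        SELECT f.name
        FROM food f
        JOIN food_tag ft ON ft.food_id = f.food_id
        JOIN tag t ON t.tag_id = ft.tag_id
        WHERE t.name = ?
        ORDER BY f.name
    """, (tag_name,))
    return [r[0] for r in rows]

def tags_of(conn, food_name):
    """All tags attached to the given food."""
    rows = conn.execute("""
        SELECT t.name
        FROM tag t
        JOIN food_tag ft ON ft.tag_id = t.tag_id
        JOIN food f ON f.food_id = ft.food_id
        WHERE f.name = ?
        ORDER BY t.name
    """, (food_name,))
    return [r[0] for r in rows]
```

The category tree itself is untouched by this, so it can stay a plain nested set.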
I know there's ongoing debate about whether tags are categories or categories are tags, etc., but this appears to be one instance in which they could be complementary and create a more abstract and robust system that's easier to manage.
Old thread, but I found a better answer to this problem.
Since an apple can have different colors, your structure is a graph, not a tree. The nested set model is not the right structure for that.
Since you mention in a comment that you're using MySQL, a better solution is to use the Open Query Graph engine (http://openquery.com/graph/doc), which is a MySQL plugin that lets you create a special table where you put the relationships, basically parentId and childId.
The magic is that you query this table with a special column, latch, whose value in the query tells the OQGRAPH engine which command to execute. See the docs for details.

Database Modeling: How to categorize products like Amazon?

Assume I had a number of products (from a few thousand to hundreds of thousands) that needed to be categorized in a hierarchical manner. How would I model such a solution in a database?
Would a simple parent-child table like this work:
product_category
- id
- parent_id
- category_name
Then in my products table, I would just do this:
product
- id
- product_category_id
- name
- description
- price
My concern is that this won't scale. By the way, I'm using MySQL for now.
Of course it will scale. That will work just fine; it is a commonly used structure.
Include a level_no. That will assist in the code, but more important, it is required to exclude duplicates.
If you want a really tight structure, you need something like the Unix concept of inodes.
You may have difficulty getting your head around the code required to produce the hierarchy, say from a product, but that is a separate issue.
And please change:
(product_category) id to product_category_id
(product) id to product_id
parent_id to parent_product_category_id
Responses to Comments
level_no. Have a look at this Data Model; it is for a Directory Tree structure (e.g. the FileManager Explorer window):
Directory Data Model
See if you can make sense of it, that's the Unix inode concept. The FileNames have to be unique within the Node, hence the second Index. That is actually complete, but some developers these days will have a hissy fit writing the code required to navigate the hierarchy, the levels. Those developers need a level_no to identify what level in the hierarchy they are dealing with.
Recommended changes. Yes, it is called Good Naming Conventions. I am rigid about it, and I publish it, so it is a Naming Standard. There are reasons for it, which will become clear to you when you write some SQL with 3 or 4 levels of joins, especially when you go to the same parent two different ways. If you search SO, you will find many questions about this; always the same answer. It will also be highlighted in the next model I write for you.
I used to struggle with the same problem 10 years ago. Here's my personal solution to this problem. But before I start explaining, I would like to mention its pros and cons.
Pros:
You can select subbranches of a given node within any number of desired depths, with the lowest imaginable cost.
The same can be done to select parent nodes.
No RDBMS-specific feature is needed, so the same technique can be implemented in any of them.
It is all implemented using a single field.
Cons:
You should be able to define a maximum depth for your tree. You also need to define the maximum number of direct children for the nodes.
Restructuring the tree is more expensive than traversing it, but not as expensive as in the Nested Set Model. Adding a new branch is a matter of finding the right value for the field. And in order to move a branch into a new parent you need to update that node and all its children (direct and indirect). The good news is that deleting a node and its children is as easy as traversing it (which is absolutely nothing).
The technique:
Consider the following table as your tree holder:
CREATE TABLE IF NOT EXISTS `product_category` (
`product_category_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
`category_code` varchar(62) NOT NULL,
PRIMARY KEY (`product_category_id`),
UNIQUE KEY `uni_category_code` (`category_code`)
) DEFAULT CHARSET=utf8 ;
All the magic is done in the category_code field. You need to encode your branch address into a text value as follows:
**node_name -> category_code**
Root -> 01
First child -> 01:01
Second child -> 01:02
First grandchild -> 01:01:01
First child of second child -> 01:02:01
In the above example, each node can have up to 99 direct children (assuming we are thinking in decimal). And since category_code is of type varchar(62), and every level after the root adds three characters (a colon plus two digits), we can have up to (62-2)/3 = 20 levels below the root. It's a trade-off between the depth you want, the number of direct children each node can have, and the size of your field. Scientifically speaking, this is an implementation of a complete tree in which unused branches are not actually created but reserved.
The good parts:
Now imagine you want to select nodes under 01:02. You can do this using a single query:
SELECT *
FROM product_category
WHERE
category_code LIKE '01:02:%'
Selecting direct nodes under the 01:02:
SELECT *
FROM product_category
WHERE
category_code LIKE '01:02:__'
Selecting all the ancestors of 01:02:
SELECT *
FROM product_category
WHERE
'01:02' LIKE CONCAT(category_code, ':%')
The bad parts:
Inserting a new node into the tree is a matter of finding the right category_code. This can be done using a stored procedure or even in a programming language like PHP.
Since the tree is limited in the number of direct children and depth, an insert can fail. But I believe in most practical cases we can assume such a limitation.
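As a sketch of how the insert could find the right category_code, here is one possible approach in Python (standing in for the stored procedure or PHP code mentioned above). The helper names add_child and ancestors are made up, SQLite replaces MySQL so the example can run standalone, and || is used where MySQL would use CONCAT.

```python
import sqlite3

WIDTH = 2      # digits per level, so at most 99 direct children per node
MAX_LEN = 62   # mirrors varchar(62) from the table definition

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_category (
        product_category_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        category_code TEXT NOT NULL UNIQUE
    )
""")
conn.execute("INSERT INTO product_category (name, category_code) VALUES ('Root', '01')")

def add_child(conn, parent_code, name):
    """Insert a node under parent_code and return its new category_code."""
    # Direct children are exactly parent_code + ':' + WIDTH characters.
    pattern = parent_code + ":" + "_" * WIDTH
    (last,) = conn.execute(
        "SELECT MAX(category_code) FROM product_category WHERE category_code LIKE ?",
        (pattern,),
    ).fetchone()
    # Codes are zero-padded, so the textual MAX is also the numeric maximum.
    # This simple version never reuses gaps left by deleted siblings.
    next_index = 1 if last is None else int(last[-WIDTH:]) + 1
    if next_index > 10 ** WIDTH - 1:
        raise ValueError("parent already has the maximum number of children")
    code = f"{parent_code}:{next_index:0{WIDTH}d}"
    if len(code) > MAX_LEN:
        raise ValueError("maximum depth exceeded")
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO product_category (name, category_code) VALUES (?, ?)",
            (name, code),
        )
    return code

def ancestors(conn, code):
    """All ancestor codes of the given code, root first."""
    rows = conn.execute(
        "SELECT category_code FROM product_category "
        "WHERE ? LIKE category_code || ':%' ORDER BY category_code",
        (code,),
    )
    return [r[0] for r in rows]
```

Both failure cases mentioned above (too many children, too deep) show up here as explicit errors rather than silently truncated codes.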
Cheers.
Your solution uses the adjacency list model of a hierarchy. It's by far the most common. It will scale OK up to thousands of products. The problem is that it takes either a recursive query or product-specific extensions to SQL to deal with an indefinitely deep hierarchy.
There are other models of a hierarchy. In particular, there's the nested set model. The nested set model is good for retrieving the path of any node in a single query. It's also good for retrieving any desired sub tree. It's more work to keep it up to date. A lot more work.
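To give a feel for both sides of that trade-off, here is a small sketch of the path query and of the renumbering an insert requires. The lft/rgt column names, the sample rows, and the helper names are assumptions made for the example; SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_category (
        product_category_id INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL,
        lft INTEGER NOT NULL,
        rgt INTEGER NOT NULL
    );
    INSERT INTO product_category (category_name, lft, rgt) VALUES
        ('Root', 1, 6), ('Books', 2, 5), ('Fiction', 3, 4);
""")

def path_to(conn, name):
    """Path from the root down to the named node, in a single query."""
    rows = conn.execute("""
        SELECT anc.category_name
        FROM product_category AS node
        JOIN product_category AS anc
          ON node.lft BETWEEN anc.lft AND anc.rgt
        WHERE node.category_name = ?
        ORDER BY anc.lft
    """, (name,))
    return [r[0] for r in rows]

def add_child(conn, parent_name, child_name):
    """Append a leaf as the last child of parent_name."""
    (parent_rgt,) = conn.execute(
        "SELECT rgt FROM product_category WHERE category_name = ?",
        (parent_name,),
    ).fetchone()
    with conn:  # one transaction: the tree is never seen half-renumbered
        # Open a gap of two numbers; this touches every row to the right.
        conn.execute(
            "UPDATE product_category SET rgt = rgt + 2 WHERE rgt >= ?", (parent_rgt,))
        conn.execute(
            "UPDATE product_category SET lft = lft + 2 WHERE lft > ?", (parent_rgt,))
        conn.execute(
            "INSERT INTO product_category (category_name, lft, rgt) VALUES (?, ?, ?)",
            (child_name, parent_rgt, parent_rgt + 1),
        )
```

Reads are one range join; a single insert can rewrite a large share of the table, which is the extra work being warned about.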
You may want to briefly explore it before you bite off more than you want to chew.
What are you going to do with the hierarchy?
I think your big issue is that this is a deficiency in MySQL. For most RDBMSs which support WITH and WITH RECURSIVE, you should require only one scan per level. This makes deep hierarchies a bit problematic but usually not too bad.
I think to make this work well you will have to code a fairly extensive stored procedure, or you will have to go to another tree model, or you will have to move to a different RDBMS. For example this is easy to do with PostgreSQL and WITH RECURSIVE and this offers a lot better scalability than many other approaches.

Can a binary tree or tree be always represented in a Database as 1 table and self-referencing?

I hadn't noticed this rule before, but it seems that a binary tree or any tree (each node can have many children, but children cannot point back to any parent) can be represented as one table in a database, with each row having an ID for itself and a parentID that points back to the parent node.
That is in fact the classical Employee - Manager diagram: one boss can have many people under him... and each person can have n people under him, etc. This is a tree structure and is represented in database books as a common example as a single table Employee.
The answer to your question is 'yes'.
Simon's warning about your trees becoming a cyclic graph is correct too.
All the stuff that has been said about "You have to ensure by hand that this won't happen, i.e. the DBMS won't do that for you automatically, because you will not break any integrity or reference rules.", is WRONG.
This remark and the corresponding comments hold true as long as you only consider SQL systems.
There exist systems which CAN do this for you in a pure declarative way, that is without you having to write *any* code whatsoever. That system is SIRA_PRISE (http://shark.armchair.mb.ca/~erwin).
Yes, you can represent hierarchical structures by self-referencing the table. Just be aware of such situations:
Employee Supervisor
1 2
2 1
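One way to guard against that situation in application code is to walk up the chain before saving a new supervisor. A minimal sketch, assuming the table has been loaded into a dict that maps each employee to their supervisor (the function name and the dict representation are made up for the example):

```python
def would_create_cycle(supervisor_of, employee, new_supervisor):
    """Return True if making new_supervisor the boss of employee closes a loop.

    supervisor_of maps employee id -> supervisor id (missing key = no supervisor).
    """
    seen = set()
    current = new_supervisor
    # Walk upward from the proposed supervisor; reaching the employee means
    # the employee would end up (indirectly) supervising themselves.
    while current is not None and current not in seen:
        if current == employee:
            return True
        seen.add(current)  # protects against loops already present in the data
        current = supervisor_of.get(current)
    return False


# Employee 1 already reports to 2; now someone tries to make 2 report to 1.
supervisor_of = {1: 2}
cycle = would_create_cycle(supervisor_of, employee=2, new_supervisor=1)
```

Here cycle comes out True, so the second row from the table above would be rejected before it is written.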
Yes, that is correct. Here's a good reference
Just be aware that you generally need a loop in order to unroll the tree (e.g. find transitive relationships)

Multiple tables in nested sets hierarchy

I have a number of distinct items stored in different MySQL tables, which I'd like to put in a tree hierarchy. Using the adjacency list model, I can add a parent_id field to each table and link the tables using a foreign key relationship.
However, I'd like to use a nested sets/modified preorder tree traversal model. The data will be used in a environment that's heavily biased towards reads, and the kind of queries I expect to run favour this approach.
The problem is that all the information I have on nested sets assumes that you only have one type of item, stored in a single table. The ways round this that I can think of are:
Having multiple foreign key fields in the tree, one for each table/item type.
Storing the name of the item table in the tree structure as well as the item ID.
Both approaches are inelegant to say the least, so is there a better way of doing this?
RDBMS are not a good match to storing hierarchies to begin with, and your use case makes this even worse. I think a little more fine tuned but still ugly variations of your own suggestions are what you are going to get using a RDBMS. IMHO other data models would provide better solutions to your problem, like graph databases or maybe document databases. The article Should you go Beyond Relational Databases? gives a nice introduction to this kind of stuff.
You have several types of tree, and a single table which contains the tree information (i.e. the left/right values) for all tree types?
If you have several types of tree, why not a different table for each type?