Depth of Nested Set - mysql

I have a large mysql table with parent-child relationships stored in the nested set model (Left and right values).
It makes it EASY to find all the children of a given item.
Now, how do I find the DEPTH of a certain item?
Example of the row:
Parent_ID, Taxon_ID, Taxon_Name, lft, rgt
For some row (taxon_id), I want to know how far it is from the root node.
Note that, in the way I have the data structured, each terminal node (a node with no children of its own) has lft = rgt. I know many of the examples posted online have rgt = lft + 1, but we decided not to do that for the sake of ease.
Summary:
Nested set model, need to find the depth (number of nodes to get to the root) of a given node.

I figured it out.
Essentially, you have to count all the nodes that contain the node you are looking for. For example, I was looking at one node that has lft=rgt=7330 and I wanted its depth. I just needed to run:
Select count(*)
from table
where lft<7330
AND rgt>7330
You may want to add 1 to the result before you use it, because it really tells you the number of generations preceding the node rather than its actual level. But it works and it's fast!
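For anyone who wants to try the counting query, here is a minimal sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL, and the table name taxa, the sample rows, and the helper ancestor_count are assumptions made up for illustration. Leaves use lft = rgt, as in the question. The self-join compares against both lft and rgt of the target node, so it should also work for nodes that are not leaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxa (taxon_id INTEGER PRIMARY KEY, taxon_name TEXT, "
    "lft INTEGER, rgt INTEGER)"
)
# Hypothetical sample tree; leaves have lft = rgt, as described in the question.
conn.executemany(
    "INSERT INTO taxa VALUES (?, ?, ?, ?)",
    [
        (1, "Life", 1, 8),
        (2, "Animals", 2, 6),
        (3, "Mammals", 3, 5),
        (4, "Cat", 4, 4),
        (5, "Plants", 7, 7),
    ],
)


def ancestor_count(conn, taxon_id):
    """Count the nodes whose (lft, rgt) interval strictly contains the target node."""
    row = conn.execute(
        "SELECT COUNT(*) FROM taxa AS a "
        "JOIN taxa AS n ON a.lft < n.lft AND a.rgt > n.rgt "
        "WHERE n.taxon_id = ?",
        (taxon_id,),
    ).fetchone()
    return row[0]


print(ancestor_count(conn, 4))  # number of generations above "Cat"
```

Add 1 to the returned value if you want the level with the root counted as level 1.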

MySQL versions before 8.0 do not support recursive queries. I believe PostgreSQL offers limited support, but it would be inefficient and messy. There is no reason, however, why you couldn't execute queries in a recursive fashion (i.e., programmatically) to arrive at the desired result.
If "how deep is this node?" is a question you need answered often, you might consider adjusting the schema of your table so that each node stores and maintains its depth. Then you can just read that value instead of calculating it with awkward recursion. (Maintenance of the depth values may become tedious if you're shuffling the table, but assuming you are doing far fewer writes than reads, this is a more efficient approach.)

Related

Insert/update efficient solution to store hierarchical data in MySQL?

I understand there are several patterns for storing hierarchical data in a relational database such as using adjacency lists, nested sets, etc.
However, the drawback with something like a nested set is that if you frequently have to update nodes by adding/removing children, there's a high cost to then update the rest of the table.
What's a solution for a scenario such as the following example:
(Parent1)
/ | \
(Child1) (Child2) (Child3)
/ |
[Child1a, Child1b][Child2a]
where it will be a frequent requirement to update to:
(Parent1)
/ | \
(Child1) (Child4) (Child5)
/ | \
[Child1a, Child1b][Child4a] [Child5a]
etc.
My data will be nested at most 3 levels deep, but the idea is that the solution should support many of these little trees stored in the table, and children can be updated/modified in a performant manner.
The least expensive method of storing hierarchical data in terms of storage and complexity of updates is Adjacency List.
Defining a child's parent updates exactly 1 row
Moving a child to a new parent updates exactly 1 row
Removing a subtree of N nodes is a deletion of N rows
Adding a subtree of N nodes is an insertion of N rows
The other techniques like Nested Sets or Path Enumeration or Closure Table require more complex updates, but the tradeoff is that those techniques support arbitrary-depth operations without needing recursive query syntax.
If you can guarantee that the tree is never deeper than three levels, you can do many operations with Adjacency List with a couple of simple outer joins.
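As a rough sketch of the "couple of simple outer joins" idea, the following uses SQLite through Python's sqlite3 module as a stand-in for MySQL. The table name node, its columns, and the helper three_level_paths are assumptions for illustration. It also shows that moving a child to a new parent touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
# Hypothetical rows mirroring the first diagram in the question.
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "Parent1"),
        (2, 1, "Child1"),
        (3, 1, "Child2"),
        (4, 1, "Child3"),
        (5, 2, "Child1a"),
        (6, 2, "Child1b"),
        (7, 3, "Child2a"),
    ],
)


def three_level_paths(conn):
    """Return (level 1, level 2, level 3) names using two outer self-joins."""
    return conn.execute(
        "SELECT l1.name, l2.name, l3.name "
        "FROM node AS l1 "
        "LEFT JOIN node AS l2 ON l2.parent_id = l1.id "
        "LEFT JOIN node AS l3 ON l3.parent_id = l2.id "
        "WHERE l1.parent_id IS NULL "
        "ORDER BY l2.id, l3.id"
    ).fetchall()


before = three_level_paths(conn)

# Moving Child2a under Child3 is a single-row update.
moved = conn.execute(
    "UPDATE node SET parent_id = ? WHERE id = ?", (4, 7)
).rowcount

after = three_level_paths(conn)
```

The joins only cover three levels, so this relies on the depth guarantee mentioned above.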
Note that MySQL 8.0 is implementing recursive query syntax, so the workaround techniques may become less necessary in the future.

Storing Unbalanced Tree in Database

I'm working on a project where I need to store a Tree structure in a database, in the past I already dealt with the same scenario and I used a particular solution (explained below).
I know that there is no BEST solution, and usually the best solution is the one that offers the most advantages, but there is undoubtedly a worst one, and I'd like to avoid it...
As I was saying I need to:
store an unbalanced tree structure
any node can have an "unlimited" number of children
have the ability to easily obtain all the children (recursively) of one node
have the ability to easily "rebuild" the tree structure
The solution I used in the past consisted of using a VARCHAR(X * Y) primary key where:
X is the "hypothetical" maximum level possible
Y is the number of digits in the "hypothetical" maximum number of direct children of one node...
i.e.
If I have:
- at maximum 3 levels then X = 3
- at maximum 20 direct children per node, Y = 2 (20 has two characters - it is possible then to store up to 99 children)
The PRIMARY KEY column will be created as VARCHAR(6)
The ID is a composite combination of PARENT ID + NODE_ID
NODE ID is an incremental numerical value padded with zeros on the left side.
the node in the first level will be then stored as:
[01,02,03,04,...,99]
the nodes in the second level will be stored as:
[0101, 0102, 0103, ..., 0201, 0202, 0203, ... , 9901, 9999]
the nodes in the third level will be stored as:
[010101, 010102, 010103, ..., 020101, 020102, 020301, ... , 990101, 999999]
and so on...
PROs:
It's easy to rebuild the tree
It's super easy to obtain the children list of one particular node (i.e. select ... where id like '0101%')
Only one column for both the identifier and the parent link.
CONs:
It's mandatory to define a MAX number of CHILDREN/LEVELS
If the X and Y values are large, the id key will be far too long
VARCHAR type as primary key
Changing the tree structure (moving one node from one parent to another) will be difficult (if not impossible) and time-consuming, because the ids for the node and all its children have to be re-created.
Preorder Tree Traversal
I did some research and the best solution I found to my main problems (obtaining all the children of one node, etc.), is to use the Preorder Tree Traversal solution
(for the sake of brevity I will post a link where the solution is explained: HERE )
Whilst this solution is better in almost every aspect, it has a HUGE downside: any change in the structure (adding, removing, or changing the parent of a node) requires RECREATING the entire left/right indexes, and this operation is time and resource consuming.
Conclusion
Having said so, any suggestion is very much appreciated.
Which solution do you think best meets the needs explained at the beginning?

Count number of children in limited depth tree

I have a tree structure in MySQL using a parent_id field for each node. The structure is quite large but only about 8 levels deep. Each node also has a child_counter field.
I'm looking for a performant way to calculate (and update) the number of children each node has (by children I am including children of children etc.), preferably with not too many SQL calls. I don't expect this to be super fast, just good enough.
I'm hoping there might be some way to do a mass update as the algorithm iterates.
I've implemented something like this before, but not quite. Are you able to store a "depth" integer on each node, to mark its depth in the tree? If so, you could do this with 8 queries, which if properly indexed would be quite reasonable. Something like this, I think (I don't have MySQL handy). With every child_counter starting at 0, you would run the query once per level, from the deepest level (i.e. 8) upwards. The totals come from a derived table because MySQL does not let a subquery read the table being updated:
UPDATE Node AS n
JOIN (SELECT parent_id, COUNT(*) + SUM(child_counter) AS total
      FROM Node WHERE depth = x + 1 GROUP BY parent_id) AS c
  ON c.parent_id = n.id
SET n.child_counter = c.total
WHERE n.depth = x
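Here is a small sketch of the level-by-level idea that can actually be run. It uses SQLite through Python's sqlite3 module instead of MySQL, so the UPDATE is written with a correlated subquery that SQLite accepts. The table layout, the sample rows, and the root being at depth 0 are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "depth INTEGER, child_counter INTEGER NOT NULL DEFAULT 0)"
)
# Hypothetical tree: 1 -> (2, 3), 2 -> (4, 5), 4 -> 6
conn.executemany(
    "INSERT INTO node (id, parent_id, depth) VALUES (?, ?, ?)",
    [(1, None, 0), (2, 1, 1), (3, 1, 1), (4, 2, 2), (5, 2, 2), (6, 4, 3)],
)

max_depth = conn.execute("SELECT MAX(depth) FROM node").fetchone()[0]

# One UPDATE per level, starting just above the deepest level and moving up.
# A node's total is its number of direct children plus their own totals,
# which are already final because deeper levels were processed first.
for depth in range(max_depth - 1, -1, -1):
    conn.execute(
        "UPDATE node SET child_counter = ("
        "  SELECT COUNT(*) + COALESCE(SUM(c.child_counter), 0)"
        "  FROM node AS c WHERE c.parent_id = node.id"
        ") WHERE depth = ?",
        (depth,),
    )

counts = dict(conn.execute("SELECT id, child_counter FROM node"))
```

The number of statements grows with the depth of the tree, not with the number of rows, which is the point of storing depth.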
Here is a good article: Managing Hierarchical Data in MySQL
You're currently using an Adjacency List Model with an additional count column. But what you are trying to get from the hierarchical data looks like it could be modeled much more easily with the Nested Set Model.

Database Modeling: How to categorize products like Amazon?

Assume I had a number of products (from a few thousand to hundreds of thousands) that needed to be categorized in a hierarchical manner. How would I model such a solution in a database?
Would a simple parent-child table like this work:
product_category
- id
- parent_id
- category_name
Then in my products table, I would just do this:
product
- id
- product_category_id
- name
- description
- price
My concern is that this won't scale. By the way, I'm using MySQL for now.
Of course it will scale. That will work just fine; it is a commonly used structure.
Include a level_no. That will assist in the code, but more important, it is required to exclude duplicates.
If you want a really tight structure, you need something like the Unix concept of inodes.
You may have difficulty getting your head around the code required to produce the hierarchy, say from a product, but that is a separate issue.
And please change
(product_category) id to product_category_id
(product) id to product_id
parent_id to parent_product_category_id
Responses to Comments
level_no. Have a look at this Data Model, it is for a Directory Tree structure (e.g. the FileManager Explorer window):
Directory Data Model
See if you can make sense of it, that's the Unix inode concept. The FileNames have to be unique within the Node, hence the second Index. That is actually complete, but some developers these days will have a hissy fit writing the code required to navigate the hierarchy, the levels. Those developers need a level_no to identify what level in the hierarchy they are dealing with.
Recommended changes. Yes, it is called Good Naming Conventions. I am rigid about it, and I publish it, so it is a Naming Standard. There are reasons for it, which will become clear to you when you write some SQL with 3 or 4 levels of joins, especially when you go to the same parent two different ways. If you search SO, you will find many questions about this; always the same answer. It will also be highlighted in the next model I write for you.
I used to struggle with the same problem 10 years ago. Here's my personal solution to this problem. But before I start explaining, I would like to mention its pros and cons.
Pros:
You can select subbranches of a given node within any number of desired depths, with the lowest imaginable cost.
The same can be done to select parent nodes.
No RDBMS specific feature is needed. So the same technique can be implemented in any of them.
It is all implemented using a single field.
Cons:
You should be able to define a maximum depth for your tree. You also need to define the maximum number of direct children for the nodes.
Restructuring the tree is more expensive than traversing it. But not as expensive as in the Nested Set Model. Adding a new branch is a matter of finding the right value for the field. And in order to move a branch to a new parent you need to update that node and all its children (direct and indirect). The good news is that deleting a node and its children is as easy as traversing it (which is absolutely nothing).
The technique:
Consider the following table as your tree holder:
CREATE TABLE IF NOT EXISTS `product_category` (
`product_category_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
`category_code` varchar(62) NOT NULL,
PRIMARY KEY (`product_category_id`),
UNIQUE KEY `uni_category_code` (`category_code`)
) DEFAULT CHARSET=utf8 ;
All the magic is done in the category_code field. You need to encode your branch address into a text value as follows:
**node_name -> category_code**
Root -> 01
First child -> 01:01
Second child -> 01:02
First grandchild -> 01:01:01
First child of second child -> 01:02:01
In the above example, each node can have up to 99 direct children (assuming we are thinking in decimal). And since category_code is of type varchar(62), the tree can go up to (62-2)/3 = 20 levels below the root. It's a trade-off between the depth you want, the number of direct children each node can have, and the size of your field. Scientifically speaking, this is an implementation of a complete tree in which unused branches are not actually created but reserved.
The good parts:
Now imagine you want to select nodes under 01:02. You can do this using a single query:
SELECT *
FROM product_category
WHERE
category_code LIKE '01:02:%'
Selecting the direct children of 01:02:
SELECT *
FROM product_category
WHERE
category_code LIKE '01:02:__'
Selecting all the ancestors of 01:02:
SELECT *
FROM product_category
WHERE
'01:02' LIKE CONCAT(category_code, ':%')
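To see the three LIKE patterns side by side, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than MySQL. Because of that, string concatenation uses the || operator in place of CONCAT. The sample rows (including the extra code 01:02:01:01) and the helper codes are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_category ("
    "product_category_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "category_code TEXT NOT NULL UNIQUE)"
)
# Hypothetical sample data following the encoding described above.
conn.executemany(
    "INSERT INTO product_category (name, category_code) VALUES (?, ?)",
    [
        ("Root", "01"),
        ("First child", "01:01"),
        ("Second child", "01:02"),
        ("First grandchild", "01:01:01"),
        ("First child of second child", "01:02:01"),
        ("Great-grandchild", "01:02:01:01"),
    ],
)


def codes(sql, param):
    """Run a one-parameter query and return the matching category codes."""
    return [row[0] for row in conn.execute(sql, (param,))]


# Every node below 01:02, at any depth.
descendants = codes(
    "SELECT category_code FROM product_category "
    "WHERE category_code LIKE ? || ':%' ORDER BY category_code",
    "01:02",
)

# Only the direct children of 01:02 (exactly one more two-character segment).
children = codes(
    "SELECT category_code FROM product_category "
    "WHERE category_code LIKE ? || ':__' ORDER BY category_code",
    "01:02",
)

# Every ancestor of 01:02:01 (the pattern is built from the stored code).
ancestors = codes(
    "SELECT category_code FROM product_category "
    "WHERE ? LIKE category_code || ':%' ORDER BY category_code",
    "01:02:01",
)
```

Note that the first two patterns have no leading wildcard, so an index on category_code can help, while the ancestor query builds its pattern from every row.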
The bad parts:
Inserting a new node into the tree is a matter of finding the right category_code. This can be done using a stored procedure or even in a programming language like PHP.
Since the tree is limited in the number of direct children and depth, an insert can fail. But I believe in most practical cases we can assume such a limitation.
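One possible way to find the right category_code on the application side is sketched below in Python. The function name next_child_code and its parameters are hypothetical, and it assumes the existing codes have already been loaded into a list. It picks the lowest unused two-digit slot under the parent and fails when the parent is full or the code would not fit in the column.

```python
def next_child_code(parent_code, existing_codes, max_children=99, max_length=62):
    """Return the first free child code under parent_code.

    Raises ValueError when the parent already has max_children children
    or when the new code would exceed max_length characters.
    """
    prefix = parent_code + ":"
    if len(prefix) + 2 > max_length:
        raise ValueError("maximum depth reached")

    # Two-character segments already used by direct children of the parent.
    used = {
        code[len(prefix):]
        for code in existing_codes
        if code.startswith(prefix) and len(code) == len(prefix) + 2
    }

    for number in range(1, max_children + 1):
        segment = f"{number:02d}"
        if segment not in used:
            return prefix + segment
    raise ValueError("parent has no free child slots")


existing = ["01", "01:01", "01:02", "01:01:01", "01:02:01"]
print(next_child_code("01", existing))  # next free slot directly under the root
```

In a real application the lookup and the insert would need to happen in one transaction, since the UNIQUE key on category_code is the only thing preventing two writers from picking the same slot.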
Cheers.
Your solution uses the adjacency list model of a hierarchy. It's by far the most common. It will scale ok up to thousands of products. The problem is that it takes either a recursive query or product specific extensions to SQL to deal with an indefinitely deep hierarchy.
There are other models of a hierarchy. In particular, there's the nested set model. The nested set model is good for retrieving the path of any node in a single query. It's also good for retrieving any desired sub tree. It's more work to keep it up to date. A lot more work.
You may want to briefly explore it before you bite off more than you want to chew.
What are you going to do with the hierarchy?
I think your big issue is that this is a deficiency in MySQL. For most RDBMSs which support WITH and WITH RECURSIVE, you should require only one scan per level. This makes deep hierarchies a bit problematic but usually not too bad.
I think to make this work well you will have to code a fairly extensive stored procedure, or you will have to go to another tree model, or you will have to move to a different RDBMS. For example this is easy to do with PostgreSQL and WITH RECURSIVE and this offers a lot better scalability than many other approaches.
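For reference, a WITH RECURSIVE query over the simple parent-child table from the question could look roughly like the sketch below. It is run here against SQLite through Python's sqlite3 module, which supports recursive common table expressions; PostgreSQL and MySQL 8.0 and later accept the same general syntax. The sample categories and the helper subtree are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_category ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, category_name TEXT)"
)
# Hypothetical categories.
conn.executemany(
    "INSERT INTO product_category VALUES (?, ?, ?)",
    [
        (1, None, "Electronics"),
        (2, 1, "Computers"),
        (3, 2, "Laptops"),
        (4, 1, "Phones"),
        (5, None, "Books"),
    ],
)


def subtree(conn, root_id):
    """Return (category_name, level_no) for root_id and all of its descendants."""
    return conn.execute(
        "WITH RECURSIVE tree(id, category_name, level_no) AS ("
        "  SELECT id, category_name, 0 FROM product_category WHERE id = ?"
        "  UNION ALL"
        "  SELECT c.id, c.category_name, t.level_no + 1"
        "  FROM product_category AS c JOIN tree AS t ON c.parent_id = t.id"
        ") "
        "SELECT category_name, level_no FROM tree ORDER BY level_no, id",
        (root_id,),
    ).fetchall()


electronics = subtree(conn, 1)
```

The level_no column is computed on the fly here, so it does not have to be stored and maintained in the table.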

Is it possible to query a tree structure table in MySQL in a single query, to any depth?

I'm thinking the answer is no, but I'd love it if anybody had any insight into how to crawl a tree structure to any depth in SQL (MySQL) with a single query.
More specifically, given a tree structured table (id, data, data, parent_id), and one row in the table, is it possible to get all descendants (child/grandchild/etc), or for that matter all ancestors (parent/grandparent/etc) without knowing how far down or up it will go, using a single query?
Or is some kind of recursion required, where I keep querying deeper until there are no new results?
Specifically, I'm using Ruby and Rails, but I'm guessing that's not very relevant.
Yes, this is possible. It's called Modified Preorder Tree Traversal, as best described here:
Joe Celko's Trees and Hierarchies in SQL for Smarties
A working example (in PHP) is provided here
http://www.sitepoint.com/article/hierarchical-data-database/2/
Here are several resources:
http://forums.mysql.com/read.php?10,32818,32818#msg-32818
Managing Hierarchical Data in MySQL
http://lists.mysql.com/mysql/201896
Basically, you'll need to do some sort of cursor in a stored procedure or query or build an adjacency table. I'd avoid recursion outside of the db: depending on how deep your tree is, that could get really slow/sketchy.
Daniel Beardsley's answer is not that bad a solution at all when the main questions you are asking are 'what are all my children' and 'what are all my parents'.
In response to Alex Weinstein, this method actually results in fewer updates to nodes on a parent movement than in the Celko technique. In Celko's technique, if a level 2 node on the far left moves to under a level 1 node on the far right, then pretty much every node in the tree needs updating, rather than just the node's children.
What I would say however is that Daniel possibly stores the path back to root the wrong way around.
I would store them so that the query would be
SELECT * FROM table WHERE ancestors LIKE "1,2,6,%"
This means that MySQL can make use of an index on the 'ancestors' column, which it would not be able to do with a leading %. (The trailing comma stops ids such as 60 from matching.)
I came across this problem before and had one wacky idea. You could store a field in each record that is a concatenated string of its direct ancestors' ids all the way back to the root.
Imagine you had records like this (indentation implies hierarchy and the numbers are id, ancestors).
1, "1"
    2, "2,1"
        5, "5,2,1"
        6, "6,2,1"
            7, "7,6,2,1"
            11, "11,6,2,1"
    3, "3,1"
        8, "8,3,1"
        9, "9,3,1"
        10, "10,3,1"
Then to select the descendants of id:6, just do this
SELECT * FROM table WHERE ancestors LIKE "%,6,2,1"
Keeping the ancestors column up to date might be more trouble than it's worth to you, but it's a feasible solution in any DB.
Celko's technique (nested sets) is pretty good. I also have used an adjacency table with fields "ancestor" and "descendant" and "distance" (e.g. direct children/parents have a distance of 1, grandchildren/grandparents have a distance of 2, etc).
This needs to be maintained, but is fairly easy to do for inserts: you use a transaction, then put the direct link (parent, child, distance=1) into the table, then INSERT IGNORE a SELECTion of existing parent&children by adding distances (I can pull up the SQL when I have a chance), which wants an index on each of the 3 fields for performance. Where this approach gets ugly is for deletions... you basically have to mark all the items that have been affected and then rebuild them. But an advantage of this is that it can handle arbitrary acyclic graphs, whereas the nested set model can only do straight hierarchies (e.g. each item except the root has one and only one parent).
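A rough sketch of that insert step follows, using SQLite through Python's sqlite3 module, so INSERT OR IGNORE stands in for MySQL's INSERT IGNORE. The table name links and the helper add_link are hypothetical, and the sketch only covers the simple case where the child being attached has no descendants of its own yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links ("
    "ancestor INTEGER NOT NULL, descendant INTEGER NOT NULL, "
    "distance INTEGER NOT NULL, PRIMARY KEY (ancestor, descendant))"
)
conn.execute("CREATE INDEX links_descendant ON links (descendant)")


def add_link(conn, parent, child):
    """Attach child under parent (assumes child has no descendants yet)."""
    with conn:  # one transaction: commit on success, roll back on error
        # The direct link has distance 1.
        conn.execute(
            "INSERT OR IGNORE INTO links VALUES (?, ?, 1)", (parent, child)
        )
        # Every ancestor of the parent becomes an ancestor of the child,
        # one step further away.
        conn.execute(
            "INSERT OR IGNORE INTO links "
            "SELECT ancestor, ?, distance + 1 FROM links WHERE descendant = ?",
            (child, parent),
        )


add_link(conn, 1, 2)
add_link(conn, 2, 6)
add_link(conn, 6, 7)

descendants_of_2 = conn.execute(
    "SELECT descendant, distance FROM links WHERE ancestor = ? ORDER BY distance",
    (2,),
).fetchall()
ancestors_of_7 = conn.execute(
    "SELECT ancestor, distance FROM links WHERE descendant = ? ORDER BY distance",
    (7,),
).fetchall()
```

Both lookups are single indexed queries with no recursion, which is what the extra rows buy you.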
SQL isn't a Turing Complete language, which means you're not going to be able to perform this sort of looping. You can do some very clever things with SQL and tree structures, but I can't think of a way to describe a row which has a certain id "in its hierarchy" for a hierarchy of arbitrary depth.
Your best bet is something along the lines of what @Dan suggested, which is to just work your way through the tree in some other, more capable language. You can actually generate a query string in a general-purpose language using a loop, where the query is just some convoluted series of joins (or sub-queries) which reflects the depth of the hierarchy you are looking for. That would be more efficient than looping and multiple queries.
This can definitely be done and it isn't that complicated for SQL. I've answered this question and provided a working example using mysql procedural code here:
MySQL: How to find leaves in specific node
Booth: If you are satisfied, you should mark one of the answers as accepted.
I used the "With Emulator" routine described in https://stackoverflow.com/questions/27013093/recursive-query-emulation-in-mysql (provided by https://stackoverflow.com/users/1726419/yossico). So far, I've gotten very good results (performance-wise), but I don't have an abundance of data or a large number of descendants to search through/for.
You're almost definitely going to want to employ some recursion for that. And if you're doing that, then it would be trivial (in fact easier) to get the entire tree rather than bits of it to a fixed depth.
In really rough pseudo-code you'll want something along these lines:
getChildren(parent){
    children = query(SELECT * FROM table WHERE parent_id = parent.id)
    return children
}
printTree(root){
    print root
    children = getChildren(root)
    for child in children {
        printTree(child)
    }
}
Although in practice you'd rarely want to do something like this. It will be rather inefficient since it's making one request for every row in the table, so it'll only be sensible for either small tables, or trees that aren't nested too deeply. To be honest, in either case you probably want to limit the depth.
However, given the popularity of these kinds of data structure, there may very well be some MySQL stuff to help you with this, specifically to cut down on the number of queries you need to make.
Edit: Having thought about it, it makes very little sense to make all these queries. If you're reading the entire table anyway, then you can just slurp the whole thing into RAM - assuming it's small enough!
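A minimal sketch of the "slurp it into RAM" idea: read every (id, parent_id) pair with one query, group the rows by parent, and walk the tree in memory. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the table name tree, the sample ids, and the helper descendants are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Hypothetical sample tree.
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [
        (1, None), (2, 1), (3, 1), (5, 2), (6, 2),
        (7, 6), (8, 3), (9, 3), (10, 3), (11, 6),
    ],
)

# A single query loads the whole table; everything after this is in memory.
children = defaultdict(list)
for node_id, parent_id in conn.execute(
    "SELECT id, parent_id FROM tree ORDER BY id"
):
    children[parent_id].append(node_id)


def descendants(children, node_id):
    """Yield all descendants of node_id in depth-first (preorder) order."""
    for child in children.get(node_id, []):
        yield child
        yield from descendants(children, child)


print(list(descendants(children, 6)))
```

This trades memory for queries, so it only makes sense while the table comfortably fits in RAM.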