Insert/update efficient solution to store hierarchical data in MySQL? - mysql

I understand there are several patterns for storing hierarchical data in a relational database such as using adjacency lists, nested sets, etc.
However, the drawback with something like a nested set is that if you frequently have to update nodes by adding/removing children, there's a high cost to then update the rest of the table.
What's a solution for a scenario such as the following example:
                 (Parent1)
               /     |     \
       (Child1)  (Child2)  (Child3)
          /          |
[Child1a, Child1b] [Child2a]
where it will be a frequent requirement to update to:
                 (Parent1)
               /     |     \
       (Child1)  (Child4)  (Child5)
          /          |          \
[Child1a, Child1b] [Child4a]  [Child5a]
etc.
My data will be nested at most 3 levels deep, but the idea is that the solution should support many of these little trees stored in the table, and children can be updated/modified in a performant manner.

The least expensive method of storing hierarchical data in terms of storage and complexity of updates is Adjacency List.
Defining a child's parent updates exactly 1 row
Moving a child to a new parent updates exactly 1 row
Removing a subtree of N nodes is a deletion of N rows
Adding a subtree of N nodes is an insertion of N rows
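As a rough illustration of the row counts listed above, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL so it can run anywhere; the table name `node` and the columns `id`, `parent_id`, and `name` are assumptions, not something from the question.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER,"  # NULL marks the root of a tree
    " name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "Parent1"),
        (2, 1, "Child1"),
        (3, 1, "Child2"),
        (4, 1, "Child3"),
        (5, 2, "Child1a"),
        (6, 2, "Child1b"),
        (7, 3, "Child2a"),
    ],
)

# Moving Child2a from Child2 to Child3 updates exactly one row.
moved = conn.execute("UPDATE node SET parent_id = 4 WHERE id = 7").rowcount

# Child2 is now a subtree of one node, so removing it deletes one row.
removed = conn.execute("DELETE FROM node WHERE id = 3").rowcount

# Adding a subtree of two nodes (Child4 with Child4a) inserts two rows.
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [(8, 1, "Child4"), (9, 8, "Child4a")],
)
total = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

No row other than the ones being moved, deleted, or inserted is touched, which is the property the question asks for.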
The other techniques like Nested Sets or Path Enumeration or Closure Table require more complex updates, but the tradeoff is that those techniques support arbitrary-depth operations without needing recursive query syntax.
If you can guarantee that the tree is never deeper than three levels, you can do many operations with Adjacency List with a couple of simple outer joins.
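A sketch of what such outer joins could look like for a tree of at most three levels follows. SQLite (via Python's sqlite3 module) is used only to keep the example runnable; the `node` table and its columns are hypothetical, and the same SELECT should work unchanged in MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "Parent1"),
        (2, 1, "Child1"),
        (3, 1, "Child2"),
        (4, 1, "Child3"),
        (5, 2, "Child1a"),
        (6, 2, "Child1b"),
        (7, 3, "Child2a"),
    ],
)

# One self-join per level below the root: two joins cover three levels.
rows = conn.execute(
    """
    SELECT root.name, child.name, grandchild.name
    FROM node AS root
    LEFT OUTER JOIN node AS child ON child.parent_id = root.id
    LEFT OUTER JOIN node AS grandchild ON grandchild.parent_id = child.id
    WHERE root.parent_id IS NULL
    ORDER BY child.id, grandchild.id
    """
).fetchall()
```

Each result row is one root-to-leaf path; a child without children of its own (Child3 here) comes back with NULL in the grandchild column.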
Note that MySQL 8.0 and later support recursive query syntax (WITH RECURSIVE common table expressions), so the workaround techniques are less necessary there.
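For illustration, a recursive query over an adjacency list might look like the sketch below. WITH RECURSIVE is standard SQL; it is run here through SQLite (Python's sqlite3 module) only so the example is self-contained, and the `node` table, its columns, and the CTE name `subtree` are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL 8.0; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "Parent1"),
        (2, 1, "Child1"),
        (3, 1, "Child2"),
        (4, 1, "Child3"),
        (5, 2, "Child1a"),
        (6, 2, "Child1b"),
        (7, 3, "Child2a"),
    ],
)

# Start at the given node, then repeatedly join in the children of the
# rows found so far, tracking how many levels down each row sits.
subtree = conn.execute(
    """
    WITH RECURSIVE subtree (id, name, depth) AS (
        SELECT id, name, 0 FROM node WHERE id = ?
        UNION ALL
        SELECT n.id, n.name, s.depth + 1
        FROM node AS n
        JOIN subtree AS s ON n.parent_id = s.id
    )
    SELECT id, name, depth FROM subtree ORDER BY depth, id
    """,
    (1,),
).fetchall()
```

Unlike the fixed number of outer joins, this form does not need to know the maximum depth in advance.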

Related

How to get multi-levels with a single query? [duplicate]

Given the following table
id  parentID  name   image
0   0                default.jpg
1   0         Jason
2   1         Beth   b.jpg
3   0         Layla  l.jpg
4   2         Hal
5   4         Ben
I am wanting to do the following:
If I search for Ben, I would like to find the image, if there is no image, I would like to find the parent's image, if that does not exist, I would like to go to the grandparent's image... up until we hit the default image.
What is the most efficient way to do this? I know SQL isn't really designed for hierarchical values, but this is what I need to do.
Cheers!
MySQL versions before 8.0 lack recursive queries, which are part of standard SQL. Many other brands of database support this feature, including PostgreSQL (see http://www.postgresql.org/docs/8.4/static/queries-with.html).
There are several techniques for handling hierarchical data in MySQL.
Simplest would be to add a column to note the hierarchy that a given photo belongs to. Then you can search for the photos that belong to the same hierarchy, fetch them all back to your application and figure out the ones you need there. This is slightly wasteful in terms of bandwidth, requires you to write more application code, and it's not good if your trees have many nodes.
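As a sketch of the "figure it out in the application" step, assume all rows of one hierarchy have already been fetched. The Python below walks from a node up through its parents until it finds an image. The rows are copied from the question's table; the function name `find_image` is made up for the example, and reading row 0 as its own parent is an assumption about the flattened table.

```python
# Rows as (id, parentID, name, image), copied from the question's table.
# None stands for a missing name or image.
rows = [
    (0, 0, None, "default.jpg"),
    (1, 0, "Jason", None),
    (2, 1, "Beth", "b.jpg"),
    (3, 0, "Layla", "l.jpg"),
    (4, 2, "Hal", None),
    (5, 4, "Ben", None),
]


def find_image(rows, name):
    """Return the nearest image at or above the named node, or None."""
    by_id = {row[0]: row for row in rows}
    node = next(row for row in rows if row[2] == name)
    while True:
        node_id, parent_id, _name, image = node
        if image is not None:
            return image
        if parent_id == node_id or parent_id not in by_id:
            return None  # reached a root without finding any image
        node = by_id[parent_id]
```

For example, `find_image(rows, "Ben")` climbs Ben, Hal, Beth and stops at Beth's image, while a lookup for Jason falls through to the default image on row 0.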
There are also a few clever techniques to store hierarchical data so you can query them:
Path Enumeration stores the list of ancestors with each node. For instance, photo 5 in your example would store "0-1-2-4-5". You can search for ancestors by searching for nodes whose path concatenated with "%" is a match for 5's path with a LIKE predicate.
Nested Sets is a complex but clever technique popularized by Joe Celko in his articles and his book "Trees and Hierarchies in SQL for Smarties." There are numerous online blogs and articles about it too. It's easy to query trees, but hard to query immediate children or parents and hard to insert or delete nodes.
Closure Table involves storing every ancestor/descendant relationship in a separate table. It's easy to query trees, easy to insert and delete, and easy to query immediate parents or children if you add a pathlength column.
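To make the Closure Table option concrete, here is a minimal sketch applied to the question's data. It uses Python's sqlite3 module in place of MySQL; the table names `photo` and `photo_path`, the column names, and the helper `nearest_image` are all assumptions made for the example.

```python
import sqlite3

# SQLite stands in for MySQL; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photo (id INTEGER PRIMARY KEY, name TEXT, image TEXT);
    CREATE TABLE photo_path (
        ancestor   INTEGER NOT NULL,
        descendant INTEGER NOT NULL,
        pathlength INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
    """
)
photos = [
    (0, None, "default.jpg"),
    (1, "Jason", None),
    (2, "Beth", "b.jpg"),
    (3, "Layla", "l.jpg"),
    (4, "Hal", None),
    (5, "Ben", None),
]
parents = {1: 0, 2: 1, 3: 0, 4: 2, 5: 4}  # child id -> parent id
conn.executemany("INSERT INTO photo VALUES (?, ?, ?)", photos)

# One closure row per (ancestor, descendant) pair, including each node
# paired with itself at pathlength 0.
for node_id, _name, _image in photos:
    ancestor, length = node_id, 0
    while ancestor is not None:
        conn.execute(
            "INSERT INTO photo_path VALUES (?, ?, ?)", (ancestor, node_id, length)
        )
        ancestor, length = parents.get(ancestor), length + 1


def nearest_image(name):
    """Image of the closest ancestor-or-self that has one."""
    row = conn.execute(
        """
        SELECT a.image
        FROM photo AS d
        JOIN photo_path AS p ON p.descendant = d.id
        JOIN photo AS a ON a.id = p.ancestor
        WHERE d.name = ? AND a.image IS NOT NULL
        ORDER BY p.pathlength
        LIMIT 1
        """,
        (name,),
    ).fetchone()
    return row[0] if row else None
```

Ordering by pathlength is what turns "some ancestor with an image" into "the nearest one", so the whole lookup is a single non-recursive query.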
You can see more information comparing these methods in my presentation Practical Object-Oriented Models in SQL or my upcoming book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Perhaps Managing Hierarchical Data in MySQL helps.

Depth of Nested Set

I have a large mysql table with parent-child relationships stored in the nested set model (Left and right values).
It makes it EASY to find all the children of a given item.
Now, how do I find the DEPTH of a certain item?
Example of the row:
Parent_ID, Taxon_ID, Taxon_Name, lft, rgt
for somerow(taxon_id) I want to know how far it is from the root node.
NOW it may be important here to note that in the way I have the data structured is that each terminal node (a node with no children of its own) lft = rgt. I know many of the examples posted online have rgt = lft +1, but we decided not to do that just for sake of ease.
Summary:
Nested set model, need to find the depth (number of nodes to get to the root) of a given node.
I figured it out.
Essentially you have to query all the nodes that contain the node you are looking for inside. So for example, I was looking at one node that has lft=rgt=7330 and I wanted the depth of it. I just needed to
SELECT COUNT(*)
FROM taxa        -- placeholder name; use your own table ("table" itself is a reserved word)
WHERE lft < 7330
  AND rgt > 7330;
You may want to add 1 to the result before you use it, because it's really telling you the number of generations preceding the node rather than its actual level. But it works and it's fast!
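The counting query above can be tried on a small tree as follows. This is only a sketch: it uses Python's sqlite3 module rather than MySQL, the table name `taxon` and the sample rows are invented, and the sample uses the common rgt = lft + 1 numbering for leaves rather than the lft = rgt variant described in the question.

```python
import sqlite3

# SQLite stands in for MySQL; table name and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    " taxon_id INTEGER PRIMARY KEY, taxon_name TEXT, lft INTEGER, rgt INTEGER)"
)
# Tree: A has children B and C; B has children D and E.
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?, ?)",
    [
        (1, "A", 1, 10),
        (2, "B", 2, 7),
        (3, "D", 3, 4),
        (4, "E", 5, 6),
        (5, "C", 8, 9),
    ],
)


def ancestor_count(lft, rgt):
    """Number of nodes whose interval strictly encloses (lft, rgt)."""
    return conn.execute(
        "SELECT COUNT(*) FROM taxon WHERE lft < ? AND rgt > ?", (lft, rgt)
    ).fetchone()[0]


# The same idea as a self-join, giving the depth of every node at once.
depths = conn.execute(
    """
    SELECT node.taxon_name, COUNT(parent.taxon_id)
    FROM taxon AS node
    LEFT JOIN taxon AS parent
           ON parent.lft < node.lft AND parent.rgt > node.rgt
    GROUP BY node.taxon_id, node.taxon_name, node.lft
    ORDER BY node.lft
    """
).fetchall()
```

Here the root gets 0 and D gets 2, which matches the "number of generations preceding" reading; add 1 if you want the root to be level 1.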
MySQL does not support recursive queries (at least before version 8.0). I believe PostgreSQL offers limited support, but it would be inefficient and messy. There is no reason, however, why you couldn't execute queries in a recursive fashion (i.e., programmatically) to arrive at the desired result.
If "how deep is this node?" is a question you need answered often, you might consider adjusting the schema of your table so that each node stores and maintains its depth. Then you can just read that value instead of calculating it with awkward recursion. (Maintenance of the depth values may become tedious if you're shuffling the table, but assuming you are doing far fewer writes than reads, this is a more efficient approach.)

Hierarchical Data in MySQL is as fast as XML to retrieve?

I've got a list of all countries -> states -> cities (-> subcities/villages etc.) in an XML file, and retrieving, for example, all of a state's cities is really quick with XML (using an XML parser).
I wonder: if I put all this information in MySQL, will retrieving all of a state's cities be as fast as with XML? XML is designed to store hierarchical data, while relational databases like MySQL are not.
The list contains about 500,000 entities, so I wonder if it's as fast as XML using either of:
Adjacency list model
Nested Set model
And which one should I use? Theoretically there could be unlimited levels under a state (I heard that adjacency lists aren't good for unlimited child levels). And which is fastest for this huge dataset?
Thanks!
In this article Quassnoi creates a table with 2,441,405 rows in a hierarchical structure, and tests the performance of highly optimized queries for nested sets and adjacency lists. He runs a variety of different tests, for example fetching ancestors or descendants, and times the results (read the article for more details of exactly what was tested):
                                       Nested Sets   Adjacency Lists
All descendants                        300ms         7000ms
All ancestors                          15ms          600ms
All descendants up to a certain level  5000ms        600ms
His conclusion is that for MySQL nested sets is faster to query, but has a drawback that it is much slower to update. If you have infrequent updates, use nested sets. Otherwise prefer adjacency lists.
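To show where the update cost of nested sets comes from, here is a small sketch, separate from Quassnoi's benchmark. It adds one leaf to a five-node tree and counts the rows that have to be renumbered. It runs on SQLite through Python's sqlite3 module; the table name `tree` and the sample data are invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL; table name and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
# Tree: A has children B and C; B has children D and E.
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?)",
    [("A", 1, 10), ("B", 2, 7), ("D", 3, 4), ("E", 5, 6), ("C", 8, 9)],
)

# Add F as a child of D. Everything at or to the right of D's old rgt
# value has to shift by 2 to open a gap for the new node.
parent_rgt = conn.execute("SELECT rgt FROM tree WHERE name = 'D'").fetchone()[0]
shifted_rgt = conn.execute(
    "UPDATE tree SET rgt = rgt + 2 WHERE rgt >= ?", (parent_rgt,)
).rowcount
shifted_lft = conn.execute(
    "UPDATE tree SET lft = lft + 2 WHERE lft > ?", (parent_rgt,)
).rowcount
conn.execute("INSERT INTO tree VALUES ('F', ?, ?)", (parent_rgt, parent_rgt + 1))

layout = conn.execute("SELECT name, lft, rgt FROM tree ORDER BY lft").fetchall()

# Reads stay cheap: all descendants of B form one range scan on lft.
b_lft, b_rgt = conn.execute("SELECT lft, rgt FROM tree WHERE name = 'B'").fetchone()
descendants = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM tree WHERE lft > ? AND lft < ? ORDER BY lft",
        (b_lft, b_rgt),
    )
]
```

In this tiny tree every existing row is renumbered to insert a single leaf, whereas the same insert in an adjacency list writes one row. That asymmetry is the trade-off the conclusion above describes.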
You might also wish to consider if using another database that supports recursive CTEs is an option for you.
I would imagine that an XML file of this size would take a reasonably long time to parse, but if you can cache the parsed structure in memory rather than reading it from disk each time then queries against it will be very fast.
Note that the main drawback of using MySQL for storing hierarchical data is that it requires some very complex queries. Whilst you can just copy the code from the article I linked to, if you ever need to modify it slightly then you will have to understand how it works. If you prefer to keep things simple then XML definitely has an advantage, as it was designed for this type of data and so you should easily be able to create the queries you need.

Multiple tables in nested sets hierarchy

I have a number of distinct items stored in different MySQL tables, which I'd like to put in a tree hierarchy. Using the adjacency list model, I can add a parent_id field to each table and link the tables using a foreign key relationship.
However, I'd like to use a nested sets/modified preorder tree traversal model. The data will be used in a environment that's heavily biased towards reads, and the kind of queries I expect to run favour this approach.
The problem is that all the information I have on nested sets assumes that you only have one type of item, stored in a single table. The ways round this that I can think of are:
Having multiple foreign key fields in the tree, one for each table/item type.
Storing the name of the item table in the tree structure as well as the item ID.
Both approaches are inelegant to say the least, so is there a better way of doing this?
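For reference, the second option (table name plus item ID stored in the tree) could be sketched roughly as below. This is not a recommendation. It uses Python's sqlite3 module in place of MySQL, and the tables `category`, `article`, and `tree`, their columns, and the helper `subtree_items` are all invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL; every table and column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article  (id INTEGER PRIMARY KEY, title TEXT);
    -- One nested-set table for all item types. item_table/item_id cannot
    -- be declared as a foreign key, which is part of the inelegance.
    CREATE TABLE tree (
        lft INTEGER NOT NULL,
        rgt INTEGER NOT NULL,
        item_table TEXT NOT NULL,
        item_id INTEGER NOT NULL
    );
    INSERT INTO category VALUES (1, 'News');
    INSERT INTO article VALUES (1, 'Hello'), (2, 'World');
    INSERT INTO tree VALUES
        (1, 6, 'category', 1),
        (2, 3, 'article', 1),
        (4, 5, 'article', 2);
    """
)

# Table names cannot be bound as query parameters, so check them
# against a whitelist before interpolating them into SQL.
ITEM_TABLES = {"category", "article"}


def subtree_items(root_lft, root_rgt):
    """Resolve every node in the given lft/rgt range to (table, title)."""
    refs = conn.execute(
        "SELECT item_table, item_id FROM tree"
        " WHERE lft BETWEEN ? AND ? ORDER BY lft",
        (root_lft, root_rgt),
    ).fetchall()
    items = []
    for item_table, item_id in refs:
        if item_table not in ITEM_TABLES:
            raise ValueError(f"unexpected item table: {item_table}")
        row = conn.execute(
            f"SELECT title FROM {item_table} WHERE id = ?", (item_id,)
        ).fetchone()
        items.append((item_table, row[0]))
    return items
```

The tree query itself stays a plain nested-set range scan; the awkward part is the second step, where each reference has to be resolved against a table chosen at run time.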
RDBMSs are not a good match for storing hierarchies to begin with, and your use case makes this even worse. I think slightly more fine-tuned but still ugly variations of your own suggestions are what you are going to get using an RDBMS. IMHO other data models, like graph databases or maybe document databases, would provide better solutions to your problem. The article Should you go Beyond Relational Databases? gives a nice introduction to this kind of stuff.
You have several types of tree, and a single table which contains the tree information (i.e. the left/right values) for all tree types?
If you have several types of tree, why not a different table for each type?