How to get multi-levels with a single query? [duplicate] - mysql

Given the following table
id parentID name image
0 0 default.jpg
1 0 Jason
2 1 Beth b.jpg
3 0 Layla l.jpg
4 2 Hal
5 4 Ben
I am wanting to do the following:
If I search for Ben, I would like to find the image, if there is no image, I would like to find the parent's image, if that does not exist, I would like to go to the grandparent's image... up until we hit the default image.
What is the most efficient way to do this? I know SQL isn't really designed for hierarchical values, but this is what I need to do.
Cheers!

MySQL lacks recursive queries, which are part of standard SQL. Many other brands of database support this feature, including PostgreSQL (see http://www.postgresql.org/docs/8.4/static/queries-with.html).
There are several techniques for handling hierarchical data in MySQL.
Simplest would be to add a column to note the hierarchy that a given photo belongs to. Then you can search for the photos that belong to the same hierarchy, fetch them all back to your application and figure out the ones you need there. This is slightly wasteful in terms of bandwidth, requires you to write more application code, and it's not good if your trees have many nodes.
There are also a few clever techniques to store hierarchical data so you can query them:
Path Enumeration stores the list of ancestors with each node. For instance, photo 5 in your example would store "0-2-4-5". You can search for ancestors by searching for nodes whose path concatenated with "%" is a match for 5's path with a LIKE predicate.
Nested Sets is a complex but clever technique popularized by Joe Celko in his articles and his book "Trees and Hierarchical in SQL for Smarties." There are numerous online blogs and articles about it too. It's easy to query trees, but hard to query immediate children or parents and hard to insert or delete nodes.
Closure Table involves storing every ancestor/descendant relationship in a separate table. It's easy to query trees, easy to insert and delete, and easy to query immediate parents or children if you add a pathlength column.
You can see more information comparing these methods in my presentation Practical Object-Oriented Models in SQL or my upcoming book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

Perhaps Managing Hierarchical Data in MySQL helps.

Related

Best method for storing hierarchy of organisations using eloquent

I need to store organisation ownership hierarchy in a laravel backend. Each node in the hierarchy can be one of a number of types, and each relationship needs to carry the amount of ownership (and potentially more meta data relating to the relationship between nodes). The structure can be arbitrarily deep, and it must be possible to attach a subtree an arbitrary number of times (see C1 below, which appears twice). Below is a sketch of kind of hierarchy I need....
I am using mySQL 8 so I have access to CTE for recursion. I have looked into the adjacency-list package (staudenmeir/laravel-adjacency-list) which uses CTE and looks good, but it uses self referencing tables. I think this means that I cannot store relationship data, and the I don't think I can get the repeated sub tree structure you see above.
I am currently exploring many to many relationships, with a custom pivot table to store the "relationship weighting". But I am unsure if this is a sensible approach and perhaps I'm missing some useful design pattern or this.
I am aware that this is a nebulous question, but while I'm trying to crack this myself using eloquent relationships, I thought I might get a discussion going about design pattens for this type of work.

Recursion the optimal solution in this case?

I'm getting a headache over this. I'm building a system, that can handle a number of projects, groups and file references.
Please take a look at this:
A user should be able to create an infinite number of projects, an infinite numbers of groups and attach an infinite number of file references - much like an ordinary PC file structure works with drive-letters, folders and files.
All of the mentioned elements resides inside a MySQL database. However, I'm not sure if this (see below) is the optimal way of structuring the whole thing:
As you can see, it contains one entity called "Xrefs", containing projects and groups. The rows points inside itself, probably making it ideal to do a recursive call when retrieving the data.
A different approach could be to create 1 entity for projects, 1 entity for groups and 1 entity for file references... as well as 1 helper entity, that ties the three entities together, also containing a "parent" value, that (similar to the first solution) refers to the upper level tuples in order to create a hierachy.
If you were to build a similar project, what would you do?
You hit one of the best known restrictions of MySQL: the ability to use what is called recursive queries (PostgreSQL) or CTE queries (Oracle). There are some possibles workarounds, but considering a project with this kind of requirements you'd probably suffer a lot with many other well known MySQL limitations. Even SQLLite would be more usefull (except for the one concurrent user restriction) on this matter.
DBIx::Class has some components to help you circumvent this MySQL limitations, search for Nested Trees, Ordered Trees, WITH RECURSIVE QUERY… [DBIx::Class::Tree::NestedSet][1]
You will need support for something like: 7.8. WITH Queries (Common Table Expressions), which MySQL do not offer to you.
Your structure is fine - since you are building a tree, not a general graph, there is no need for a separate table that ties entities together. I would put projects into their own table, because they appear to stand on their own, unless you must support hierarchy among projects as well.
However, given that your RDBMS is MySQL, you would have problems building recursive queries. For example, try thinking of a query that would give you all files related to xfer_id of 1 (i.e. the project). None of the files is tied to that ID, so you need to locate your first-level groups, then your second level groups, and then tie files to them. Since your groups can be nested in any number of levels, your query would have to be recursive as well.
Although you can certainly do it, it is currently not simple, and requires writing stored procedures. A common approach for situations like that is to build the tree in memory, with some assistance from RDBMS. The trick is to store the id of the top project in each group, i.e.
xfer_id xfer_fk xfer_top
------- ------- --------
1 - 1
2 1 1
3 1 1
4 3 1
5 3 1
Now a query with the condition WHERE xfer_top=... will give your all the individual "parts", which could be combined in memory without having to bring the entire table in memory.

Can a binary tree or tree be always represented in a Database as 1 table and self-referencing?

I didn't feel this rule before, but it seems that a binary tree or any tree (each node can have many children but children cannot point back to any parent), then this data structure can be represented as 1 table in a database, with each row having an ID for itself and a parentID that points back to the parent node.
That is in fact the classical Employee - Manager diagram: one boss can have many people under him... and each person can have n people under him, etc. This is a tree structure and is represented in database books as a common example as a single table Employee.
The answer to your question is 'yes'.
Simon's warning about your trees becoming a cyclic graph is correct too.
All the stuff that has been said about "You have to ensure by hand that this won't happen, i.e. the DBMS won't do that for you automatically, because you will not break any integrity or reference rules.", is WRONG.
This remark and the coresponding comments holds true, as long as you only consider SQL systems.
There exist systems which CAN do this for you in a pure declarative way, that is without you having to write *any* code whatsoever. That system is SIRA_PRISE (http://shark.armchair.mb.ca/~erwin).
Yes, you can represent hierarchical structures by self-referencing the table. Just be aware of such situations:
Employee Supervisor
1 2
2 1
Yes, that is correct. Here's a good reference
Just be aware that you generally need a loop in order to unroll the tree (e.g. find transitive relationships)

Retrieving data with a hierarchical structure in MySQL

Given the following table
id parentID name image
0 0 default.jpg
1 0 Jason
2 1 Beth b.jpg
3 0 Layla l.jpg
4 2 Hal
5 4 Ben
I am wanting to do the following:
If I search for Ben, I would like to find the image, if there is no image, I would like to find the parent's image, if that does not exist, I would like to go to the grandparent's image... up until we hit the default image.
What is the most efficient way to do this? I know SQL isn't really designed for hierarchical values, but this is what I need to do.
Cheers!
MySQL lacks recursive queries, which are part of standard SQL. Many other brands of database support this feature, including PostgreSQL (see http://www.postgresql.org/docs/8.4/static/queries-with.html).
There are several techniques for handling hierarchical data in MySQL.
Simplest would be to add a column to note the hierarchy that a given photo belongs to. Then you can search for the photos that belong to the same hierarchy, fetch them all back to your application and figure out the ones you need there. This is slightly wasteful in terms of bandwidth, requires you to write more application code, and it's not good if your trees have many nodes.
There are also a few clever techniques to store hierarchical data so you can query them:
Path Enumeration stores the list of ancestors with each node. For instance, photo 5 in your example would store "0-2-4-5". You can search for ancestors by searching for nodes whose path concatenated with "%" is a match for 5's path with a LIKE predicate.
Nested Sets is a complex but clever technique popularized by Joe Celko in his articles and his book "Trees and Hierarchical in SQL for Smarties." There are numerous online blogs and articles about it too. It's easy to query trees, but hard to query immediate children or parents and hard to insert or delete nodes.
Closure Table involves storing every ancestor/descendant relationship in a separate table. It's easy to query trees, easy to insert and delete, and easy to query immediate parents or children if you add a pathlength column.
You can see more information comparing these methods in my presentation Practical Object-Oriented Models in SQL or my upcoming book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Perhaps Managing Hierarchical Data in MySQL helps.

Is it possible to query a tree structure table in MySQL in a single query, to any depth?

I'm thinking the answer is no, but I'd love it it anybody had any insight into how to crawl a tree structure to any depth in SQL (MySQL), but with a single query
More specifically, given a tree structured table (id, data, data, parent_id), and one row in the table, is it possible to get all descendants (child/grandchild/etc), or for that matter all ancestors (parent/grandparent/etc) without knowing how far down or up it will go, using a single query?
Or is using some kind of recursion require, where I keep querying deeper until there are no new results?
Specifically, I'm using Ruby and Rails, but I'm guessing that's not very relevant.
Yes, this is possible, it's a called a Modified Preorder Tree Traversal, as best described here
Joe Celko's Trees and Hierarchies in SQL for Smarties
A working example (in PHP) is provided here
http://www.sitepoint.com/article/hierarchical-data-database/2/
Here are several resources:
http://forums.mysql.com/read.php?10,32818,32818#msg-32818
Managing Hierarchical Data in MySQL
http://lists.mysql.com/mysql/201896
Basically, you'll need to do some sort of cursor in a stored procedure or query or build an adjacency table. I'd avoid recursion outside of the db: depending on how deep your tree is, that could get really slow/sketchy.
Daniel Beardsley's answer is not that bad a solution at all when the main questions you are asking are 'what are all my children' and 'what are all my parents'.
In response to Alex Weinstein, this method actually results in less updates to nodes on a parent movement than in the Celko technique. In Celko's technique, if a level 2 node on the far left moves to under a level 1 node on the far right, then pretty much every node in the tree needs updating, rather than just the node's children.
What I would say however is that Daniel possibly stores the path back to root the wrong way around.
I would store them so that the query would be
SELECT FROM table WHERE ancestors LIKE "1,2,6%"
This means that mysql can make use of an index on the 'ancestors' column, which it would not be able to do with a leading %.
I came across this problem before and had one wacky idea. You could store a field in each record that is concatenated string of it's direct ancestors' ids all the way back to the root.
Imagine you had records like this (indentation implies heirarchy and the numbers are id, ancestors.
1, "1"
2, "2,1"
5, "5,2,1"
6, "6,2,1"
7, "7,6,2,1"
11, "11,6,2,1"
3, "3,1"
8, "8,3,1"
9, "9,3,1"
10, "10,3,1"
Then to select the descendents of id:6, just do this
SELECT FROM table WHERE ancestors LIKE "%6,2,1"
Keeping the ancestors column up to date might be more trouble than it's worth to you, but it's feasible solution in any DB.
Celko's technique (nested sets) is pretty good. I also have used an adjacency table with fields "ancestor" and "descendant" and "distance" (e.g. direct children/parents have a distance of 1, grandchildren/grandparents have a distance of 2, etc).
This needs to be maintained, but is fairly easy to do for inserts: you use a transaction, then put the direct link (parent, child, distance=1) into the table, then INSERT IGNORE a SELECTion of existing parent&children by adding distances (I can pull up the SQL when I have a chance), which wants an index on each of the 3 fields for performance. Where this approach gets ugly is for deletions... you basically have to mark all the items that have been affected and then rebuild them. But an advantage of this is that it can handle arbitrary acyclic graphs, whereas the nested set model can only do straight hierarchies (e.g. each item except the root has one and only one parent).
SQL isn't a Turing Complete language, which means you're not going to be able to perform this sort of looping. You can do some very clever things with SQL and tree structures, but I can't think of a way to describe a row which has a certain id "in its hierarchy" for a hierarchy of arbitrary depth.
Your best bet is something along the lines of what #Dan suggested, which is to just work your way through the tree in some other, more capable language. You can actually generate a query string in a general-purpose language using a loop, where the query is just some convoluted series of joins (or sub-queries) which reflects the depth of the hierarchy you are looking for. That would be more efficient than looping and multiple queries.
This can definitely be done and it isn't that complicated for SQL. I've answered this question and provided a working example using mysql procedural code here:
MySQL: How to find leaves in specific node
Booth: If you are satisfied, you should mark one of the answers as accepted.
I used the "With Emulator" routine described in https://stackoverflow.com/questions/27013093/recursive-query-emulation-in-mysql (provided by https://stackoverflow.com/users/1726419/yossico). So far, I've gotten very good results (performance wise), but I don't have an abundance of data or a large number of descendents to search through/for.
You're almost definitely going to want to employ some recursion for that. And if you're doing that, then it would be trivial (in fact easier) to get the entire tree rather than bits of it to a fixed depth.
In really rough pseudo-code you'll want something along these lines:
getChildren(parent){
children = query(SELECT * FROM table WHERE parent_id = parent.id)
return children
}
printTree(root){
print root
children = getChildren(root)
for child in children {
printTree(child)
}
}
Although in practice you'd rarely want to do something like this. It will be rather inefficient since it's making one request for every row in the table, so it'll only be sensible for either small tables, or trees that aren't nested too deeply. To be honest, in either case you probably want to limit the depth.
However, given the popularity of these kinds of data structure, there may very well be some MySQL stuff to help you with this, specifically to cut down on the numbers of queries you need to make.
Edit: Having thought about it, it makes very little sense to make all these queries. If you're reading the entire table anyway, then you can just slurp the whole thing into RAM - assuming it's small enough!