Hierarchical queries in MySQL - mysql

I'm trying to find all the parents, grandparents, etc. of a particular field with any depth. For example, given the below structure, if I provide 5, the values returned should be 1, 2, 3 and 4.
| a | b |
-----------
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 3 | 6 |
| 4 | 7 |
How would I do this?

SELECT #id :=
(
SELECT senderid
FROM mytable
WHERE receiverid = #id
) AS person
FROM (
SELECT #id := 5
) vars
STRAIGHT_JOIN
mytable
WHERE #id IS NOT NULL

The following answer is not MYSQL-only, but uses PHP. This answer can be useful for all those that end up on this page during their search (as I did) but are not limited to using MYSQL only.
If you have a database with a nested structure of unknown depth, you can print out the contents using a recursive loop:
function goDownALevel($parent){
$children = $parent->getChildren(); //underlying SQL function
if($children != null){
foreach($children as $child){
//Print the child content here
goDownALevel($child);
}
}
}
This function can also be rewritten in any other language like Javascript.

Related

MySQL 5.7 Building Hierarchy from Columns. Dynamically generate Parent ID [duplicate]

Assume you have a flat table that stores an ordered tree hierarchy:
Id Name ParentId Order
1 'Node 1' 0 10
2 'Node 1.1' 1 10
3 'Node 2' 0 20
4 'Node 1.1.1' 2 10
5 'Node 2.1' 3 10
6 'Node 1.2' 1 20
Here's a diagram, where we have [id] Name. Root node 0 is fictional.
[0] ROOT
/ \
[1] Node 1 [3] Node 2
/ \ \
[2] Node 1.1 [6] Node 1.2 [5] Node 2.1
/
[4] Node 1.1.1
What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?
Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.
Pseudo code or plain English is okay, this is purely a conceptional question.
Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?
EDITS AND ADDITIONS
To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.
The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay.
The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.
Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.
I have posted my own solution so you guys can pull it to pieces.
Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.
WITH RECURSIVE MyTree AS (
SELECT * FROM MyTable WHERE ParentId IS NULL
UNION ALL
SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;
I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdown in 2017.
Below is my original answer from 2008:
There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:
Adjacency List (the "parent" column) and
Path Enumeration (the dotted-numbers in your name column).
Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.
I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.
I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
CREATE TABLE ClosureTable (
ancestor_id INT NOT NULL REFERENCES FlatTable(id),
descendant_id INT NOT NULL REFERENCES FlatTable(id),
PRIMARY KEY (ancestor_id, descendant_id)
);
Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:
INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
(1,1), (1,2), (1,4), (1,6),
(2,2), (2,4),
(3,3), (3,5),
(4,4),
(5,5),
(6,6);
Now you can get a tree starting at node 1 like this:
SELECT f.*
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;
The output (in MySQL client) looks like the following:
+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
| 6 |
+----+
In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.
Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).
INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
(1,1,0), (1,2,1), (1,4,2), (1,6,1),
(2,2,0), (2,4,1),
(3,3,0), (3,5,1),
(4,4,0),
(5,5,0),
(6,6,0);
Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.
SELECT f.*
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
AND path_length = 1;
+----+
| id |
+----+
| 2 |
| 6 |
+----+
Re comment from #ashraf: "How about sorting the whole tree [by name]?"
Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.
SELECT f.name
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;
Re comment from #Nate:
SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id)
WHERE a.ancestor_id = 1
GROUP BY a.descendant_id
ORDER BY f.name
+------------+-------------+
| name | breadcrumbs |
+------------+-------------+
| Node 1 | 1 |
| Node 1.1 | 1,2 |
| Node 1.1.1 | 1,2,4 |
| Node 1.2 | 1,6 |
+------------+-------------+
A user suggested an edit today. SO moderators approved the edit, but I am reversing it.
The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".
If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.
If you use nested sets (sometimes referred to as Modified Pre-order Tree Traversal) you can extract the entire tree structure or any subtree within it in tree order with a single query, at the cost of inserts being more expensive, as you need to manage columns which describe an in-order path through thee tree structure.
For django-mptt, I used a structure like this:
id parent_id tree_id level lft rght
-- --------- ------- ----- --- ----
1 null 1 0 1 14
2 1 1 1 2 7
3 2 1 2 3 4
4 2 1 2 5 6
5 1 1 1 8 13
6 5 1 2 9 10
7 5 1 2 11 12
Which describes a tree which looks like this (with id representing each item):
1
+-- 2
| +-- 3
| +-- 4
|
+-- 5
+-- 6
+-- 7
Or, as a nested set diagram which makes it more obvious how the lft and rght values work:
__________________________________________________________________________
| Root 1 |
| ________________________________ ________________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14
| |________________________________| |________________________________| |
|__________________________________________________________________________|
As you can see, to get the entire subtree for a given node, in tree order, you simply have to select all rows which have lft and rght values between its lft and rght values. It's also simple to retrieve the tree of ancestors for a given node.
The level column is a bit of denormalisation for convenience more than anything and the tree_id column allows you to restart the lft and rght numbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lft and rght columns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notes at the time when I was trying to wrap my head around the queries required for each operation.
In terms of actually working with this data to display a tree, I created a tree_item_iterator utility function which, for each node, should give you sufficient information to generate whatever kind of display you want.
More info about MPTT:
Trees in SQL
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
It's a quite old question, but as it's got many views I think it's worth to present an alternative, and in my opinion very elegant, solution.
In order to read a tree structure you can use recursive Common Table Expressions (CTEs). It gives a possibility to fetch whole tree structure at once, have the information about the level of the node, its parent node and order within children of the parent node.
Let me show you how this would work in PostgreSQL 9.1.
Create a structure
CREATE TABLE tree (
id int NOT NULL,
name varchar(32) NOT NULL,
parent_id int NULL,
node_order int NOT NULL,
CONSTRAINT tree_pk PRIMARY KEY (id),
CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id)
REFERENCES tree (id) NOT DEFERRABLE
);
insert into tree values
(0, 'ROOT', NULL, 0),
(1, 'Node 1', 0, 10),
(2, 'Node 1.1', 1, 10),
(3, 'Node 2', 0, 20),
(4, 'Node 1.1.1', 2, 10),
(5, 'Node 2.1', 3, 10),
(6, 'Node 1.2', 1, 20);
Write a query
WITH RECURSIVE
tree_search (id, name, level, parent_id, node_order) AS (
SELECT
id,
name,
0,
parent_id,
1
FROM tree
WHERE parent_id is NULL
UNION ALL
SELECT
t.id,
t.name,
ts.level + 1,
ts.id,
t.node_order
FROM tree t, tree_search ts
WHERE t.parent_id = ts.id
)
SELECT * FROM tree_search
WHERE level > 0
ORDER BY level, parent_id, node_order;
Here are the results:
id | name | level | parent_id | node_order
----+------------+-------+-----------+------------
1 | Node 1 | 1 | 0 | 10
3 | Node 2 | 1 | 0 | 20
2 | Node 1.1 | 2 | 1 | 10
6 | Node 1.2 | 2 | 1 | 20
5 | Node 2.1 | 2 | 3 | 10
4 | Node 1.1.1 | 3 | 2 | 10
(6 rows)
The tree nodes are ordered by a level of depth. In the final output we would present them in the subsequent lines.
For each level, they are ordered by parent_id and node_order within the parent. This tells us how to present them in the output - link node to the parent in this order.
Having such a structure it wouldn't be difficult to make a really nice presentation in HTML.
Recursive CTEs are available in PostgreSQL, IBM DB2, MS SQL Server, Oracle and SQLite.
If you'd like to read more on recursive SQL queries, you can either check the documentation of your favourite DBMS or read my two articles covering this topic:
Do It In SQL: Recursive Tree Traversal
Get to know the power of SQL recursive queries
As of Oracle 9i, you can use CONNECT BY.
SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
CONNECT BY PRIOR "Id" = "ParentId"
START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)
As of SQL Server 2005, you can use a recursive common table expression (CTE).
WITH [NodeList] (
[Id]
, [ParentId]
, [Level]
, [Order]
) AS (
SELECT [Node].[Id]
, [Node].[ParentId]
, 0 AS [Level]
, CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
FROM [Node]
WHERE [Node].[ParentId] = 0
UNION ALL
SELECT [Node].[Id]
, [Node].[ParentId]
, [NodeList].[Level] + 1 AS [Level]
, [NodeList].[Order] + '|'
+ CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
FROM [Node]
INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
FROM [Node]
INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
ORDER BY [NodeList].[Order]
Both will output the following results.
Name
'Node 1'
' Node 1.1'
' Node 1.1.1'
' Node 1.2'
'Node 2'
' Node 2.1'
Bill's answer is pretty gosh-darned good, this answer adds some things to it which makes me wish SO supported threaded answers.
Anyway I wanted to support the tree structure and the Order property. I included a single property in each Node called leftSibling that does the same thing Order is meant to do in the original question (maintain left-to-right order).
mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | | NULL | |
| leftSibling | int(11) | NO | | 0 | |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO | PRI | NULL | auto_increment |
| parent | int(11) | NO | | NULL | |
| child | int(11) | NO | | NULL | |
| pathLen | int(11) | NO | | NULL | |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
More detail and SQL code on my blog.
Thanks Bill your answer was helpful in getting started!
There are really good solutions which exploit the internal btree representation of sql indices. This is based on some great research done back around 1998.
Here is an example table (in mysql).
CREATE TABLE `node` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`tw` int(10) unsigned NOT NULL,
`pa` int(10) unsigned DEFAULT NULL,
`sz` int(10) unsigned DEFAULT NULL,
`nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
PRIMARY KEY (`id`),
KEY `node_tw_index` (`tw`),
KEY `node_pa_index` (`pa`),
KEY `node_nc_index` (`nc`),
CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)
The only fields necessary for the tree representation are:
tw: The Left to Right DFS Pre-order index, where root = 1.
pa: The reference (using tw) to the parent node, root has null.
sz: The size of the node's branch including itself.
nc: is used as syntactic sugar. it is tw+sz and represents the tw of the node's "next child".
Here is an example 24 node population, ordered by tw:
+-----+---------+----+------+------+------+
| id | name | tw | pa | sz | nc |
+-----+---------+----+------+------+------+
| 1 | Root | 1 | NULL | 24 | 25 |
| 2 | A | 2 | 1 | 14 | 16 |
| 3 | AA | 3 | 2 | 1 | 4 |
| 4 | AB | 4 | 2 | 7 | 11 |
| 5 | ABA | 5 | 4 | 1 | 6 |
| 6 | ABB | 6 | 4 | 3 | 9 |
| 7 | ABBA | 7 | 6 | 1 | 8 |
| 8 | ABBB | 8 | 6 | 1 | 9 |
| 9 | ABC | 9 | 4 | 2 | 11 |
| 10 | ABCD | 10 | 9 | 1 | 11 |
| 11 | AC | 11 | 2 | 4 | 15 |
| 12 | ACA | 12 | 11 | 2 | 14 |
| 13 | ACAA | 13 | 12 | 1 | 14 |
| 14 | ACB | 14 | 11 | 1 | 15 |
| 15 | AD | 15 | 2 | 1 | 16 |
| 16 | B | 16 | 1 | 1 | 17 |
| 17 | C | 17 | 1 | 6 | 23 |
| 359 | C0 | 18 | 17 | 5 | 23 |
| 360 | C1 | 19 | 18 | 4 | 23 |
| 361 | C2(res) | 20 | 19 | 3 | 23 |
| 362 | C3 | 21 | 20 | 2 | 23 |
| 363 | C4 | 22 | 21 | 1 | 23 |
| 18 | D | 23 | 1 | 1 | 24 |
| 19 | E | 24 | 1 | 1 | 25 |
+-----+---------+----+------+------+------+
Every tree result can be done non-recursively.
For instance, to get a list of ancestors of node at tw='22'
Ancestors
select anc.* from node me,node anc
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw
order by anc.tw;
+-----+---------+----+------+------+------+
| id | name | tw | pa | sz | nc |
+-----+---------+----+------+------+------+
| 1 | Root | 1 | NULL | 24 | 25 |
| 17 | C | 17 | 1 | 6 | 23 |
| 359 | C0 | 18 | 17 | 5 | 23 |
| 360 | C1 | 19 | 18 | 4 | 23 |
| 361 | C2(res) | 20 | 19 | 3 | 23 |
| 362 | C3 | 21 | 20 | 2 | 23 |
| 363 | C4 | 22 | 21 | 1 | 23 |
+-----+---------+----+------+------+------+
Siblings and children are trivial - just use pa field ordering by tw.
Descendants
For example the set (branch) of nodes that are rooted at tw = 17.
select des.* from node me,node des
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw
order by des.tw;
+-----+---------+----+------+------+------+
| id | name | tw | pa | sz | nc |
+-----+---------+----+------+------+------+
| 17 | C | 17 | 1 | 6 | 23 |
| 359 | C0 | 18 | 17 | 5 | 23 |
| 360 | C1 | 19 | 18 | 4 | 23 |
| 361 | C2(res) | 20 | 19 | 3 | 23 |
| 362 | C3 | 21 | 20 | 2 | 23 |
| 363 | C4 | 22 | 21 | 1 | 23 |
+-----+---------+----+------+------+------+
Additional Notes
This methodology is extremely useful for when there are a far greater number of reads than there are inserts or updates.
Because the insertion, movement, or updating of a node in the tree requires the tree to be adjusted, it is necessary to lock the table before commencing with the action.
The insertion/deletion cost is high because the tw index and sz (branch size) values will need to be updated on all the nodes after the insertion point, and for all ancestors respectively.
Branch moving involves moving the tw value of the branch out of range, so it is also necessary to disable foreign key constraints when moving a branch. There are, essentially four queries required to move a branch:
Move the branch out of range.
Close the gap that it left. (the remaining tree is now normalised).
Open the gap where it will go to.
Move the branch into it's new position.
Adjust Tree Queries
The opening/closing of gaps in the tree is an important sub-function used by create/update/delete methods, so I include it here.
We need two parameters - a flag representing whether or not we are downsizing or upsizing, and the node's tw index. So, for example tw=18 (which has a branch size of 5). Let's assume that we are downsizing (removing tw) - this means that we are using '-' instead of '+' in the updates of the following example.
We first use a (slightly altered) ancestor function to update the sz value.
update node me, node anc set anc.sz = anc.sz - me.sz from
node me, node anc where me.tw=18
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));
Then we need to adjust the tw for those whose tw is higher than the branch to be removed.
update node me, node anc set anc.tw = anc.tw - me.sz from
node me, node anc where me.tw=18 and anc.tw >= me.tw;
Then we need to adjust the parent for those whose pa's tw is higher than the branch to be removed.
update node me, node anc set anc.pa = anc.pa - me.sz from
node me, node anc where me.tw=18 and anc.pa >= me.tw;
Well given the choice, I'd be using objects. I'd create an object for each record where each object has a children collection and store them all in an assoc array (/hashtable) where the Id is the key. And blitz through the collection once, adding the children to the relevant children fields. Simple.
But because you're being no fun by restricting use of some good OOP, I'd probably iterate based on:
function PrintLine(int pID, int level)
foreach record where ParentID == pID
print level*tabs + record-data
PrintLine(record.ID, level + 1)
PrintLine(0, 0)
Edit: this is similar to a couple of other entries, but I think it's slightly cleaner. One thing I'll add: this is extremely SQL-intensive. It's nasty. If you have the choice, go the OOP route.
This was written quickly, and is neither pretty nor efficient (plus it autoboxes alot, converting between int and Integer is annoying!), but it works.
It probably breaks the rules since I'm creating my own objects but hey I'm doing this as a diversion from real work :)
This also assumes that the resultSet/table is completely read into some sort of structure before you start building Nodes, which wouldn't be the best solution if you have hundreds of thousands of rows.
public class Node {
private Node parent = null;
private List<Node> children;
private String name;
private int id = -1;
public Node(Node parent, int id, String name) {
this.parent = parent;
this.children = new ArrayList<Node>();
this.name = name;
this.id = id;
}
public int getId() {
return this.id;
}
public String getName() {
return this.name;
}
public void addChild(Node child) {
children.add(child);
}
public List<Node> getChildren() {
return children;
}
public boolean isRoot() {
return (this.parent == null);
}
#Override
public String toString() {
return "id=" + id + ", name=" + name + ", parent=" + parent;
}
}
public class NodeBuilder {
public static Node build(List<Map<String, String>> input) {
// maps id of a node to it's Node object
Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();
// maps id of a node to the id of it's parent
Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();
// create special 'root' Node with id=0
Node root = new Node(null, 0, "root");
nodeMap.put(root.getId(), root);
// iterate thru the input
for (Map<String, String> map : input) {
// expect each Map to have keys for "id", "name", "parent" ... a
// real implementation would read from a SQL object or resultset
int id = Integer.parseInt(map.get("id"));
String name = map.get("name");
int parent = Integer.parseInt(map.get("parent"));
Node node = new Node(null, id, name);
nodeMap.put(id, node);
childParentMap.put(id, parent);
}
// now that each Node is created, setup the child-parent relationships
for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
int nodeId = entry.getKey();
int parentId = entry.getValue();
Node child = nodeMap.get(nodeId);
Node parent = nodeMap.get(parentId);
parent.addChild(child);
}
return root;
}
}
public class NodePrinter {
static void printRootNode(Node root) {
printNodes(root, 0);
}
static void printNodes(Node node, int indentLevel) {
printNode(node, indentLevel);
// recurse
for (Node child : node.getChildren()) {
printNodes(child, indentLevel + 1);
}
}
static void printNode(Node node, int indentLevel) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < indentLevel; i++) {
sb.append("\t");
}
sb.append(node);
System.out.println(sb.toString());
}
public static void main(String[] args) {
// setup dummy data
List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
resultSet.add(newMap("1", "Node 1", "0"));
resultSet.add(newMap("2", "Node 1.1", "1"));
resultSet.add(newMap("3", "Node 2", "0"));
resultSet.add(newMap("4", "Node 1.1.1", "2"));
resultSet.add(newMap("5", "Node 2.1", "3"));
resultSet.add(newMap("6", "Node 1.2", "1"));
Node root = NodeBuilder.build(resultSet);
printRootNode(root);
}
//convenience method for creating our dummy data
private static Map<String, String> newMap(String id, String name, String parentId) {
Map<String, String> row = new HashMap<String, String>();
row.put("id", id);
row.put("name", name);
row.put("parent", parentId);
return row;
}
}
Assuming that you know that the root elements are zero, here's the pseudocode to output to text:
function PrintLevel (int curr, int level)
//print the indents
for (i=1; i<=level; i++)
print a tab
print curr \n;
for each child in the table with a parent of curr
PrintLevel (child, level+1)
for each elementID where the parentid is zero
PrintLevel(elementID, 0)
You can emulate any other data structure with a hashmap, so that's not a terrible limitation. Scanning from the top to the bottom, you create a hashmap for each row of the database, with an entry for each column. Add each of these hashmaps to a "master" hashmap, keyed on the id. If any node has a "parent" that you haven't seen yet, create an placeholder entry for it in the master hashmap, and fill it in when you see the actual node.
To print it out, do a simple depth-first pass through the data, keeping track of indent level along the way. You can make this easier by keeping a "children" entry for each row, and populating it as you scan the data.
As for whether there's a "better" way to store a tree in a database, that depends on how you're going to use the data. I've seen systems that had a known maximum depth that used a different table for each level in the hierarchy. That makes a lot of sense if the levels in the tree aren't quite equivalent after all (top level categories being different than the leaves).
If nested hash maps or arrays can be created, then I can simply go down the table from the beginning and add each item to the nested array. I must trace each line to the root node in order to know which level in the nested array to insert into. I can employ memoization so that I do not need to look up the same parent over and over again.
Edit: I would read the entire table into an array first, so it will not query the DB repeatedly. Of course this won't be practical if your table is very large.
After the structure is built, I must do a depth first traverse through it and print out the HTML.
There's no better fundamental way to store this information using one table (I could be wrong though ;), and would love to see a better solution ). However, if you create a scheme to employ dynamically created db tables, then you opened up a whole new world at the sacrifice of simplicity, and the risk of SQL hell ;).
To Extend Bill's SQL solution you can basically do the same using a flat array. Further more if your strings all have the same lenght and your maximum number of children are known (say in a binary tree) you can do it using a single string (character array). If you have arbitrary number of children this complicates things a bit... I would have to check my old notes to see what can be done.
Then, sacrificing a bit of memory, especially if your tree is sparse and/or unballanced, you can, with a bit of index math, access all the strings randomly by storing your tree, width first in the array like so (for a binary tree):
String[] nodeArray = [L0root, L1child1, L1child2, L2Child1, L2Child2, L2Child3, L2Child4] ...
yo know your string length, you know it
I'm at work now so cannot spend much time on it but with interest I can fetch a bit of code to do this.
We use to do it to search in binary trees made of DNA codons, a process built the tree, then we flattened it to search text patterns and when found, though index math (revers from above) we get the node back... very fast and efficient, tough our tree rarely had empty nodes, but we could searh gigabytes of data in a jiffy.
Pre-order transversal with on-the-fly path enumeration on adjacency representation
Nested sets from:
Konchog https://stackoverflow.com/a/42781302/895245
Jonny Buchanan https://stackoverflow.com/a/194031/895245
is the only efficient way I've seen of traversing, at the cost of slower updates. That's likely what most people will want for pre-order.
Closure table from https://stackoverflow.com/a/192462/895245 is interesting, but I don't see how to enforce pre-order there: MySQL Closure Table hierarchical database - How to pull information out in the correct order
Mostly for fun, here's a method that recursively calculates the 1.3.2.5. prefixes on the fly and sorts by them at the end, based only on the parent ID/child index representation.
Upsides:
updates only need to update the indexes of each sibling
Downsides:
n^2 memory usage worst case for a super deep tree. This could be quite serious, which is why I say this method is likely mostly for fun only. But maybe there is some ultra high update case where someone would want to use it? Who knows
recursive queries, so reads are going to be less efficient than nested sets
Create and populate table:
CREATE TABLE "ParentIndexTree" (
"id" INTEGER PRIMARY KEY,
"parentId" INTEGER,
"childIndex" INTEGER NOT NULL,
"value" INTEGER NOT NULL,
"name" TEXT NOT NULL,
FOREIGN KEY ("parentId") REFERENCES "ParentIndexTree"(id)
)
;
INSERT INTO "ParentIndexTree" VALUES
(0, NULL, 0, 1, 'one' ),
(1, 0, 0, 2, 'two' ),
(2, 0, 1, 3, 'three'),
(3, 1, 0, 4, 'four' ),
(4, 1, 1, 5, 'five' )
;
Represented tree:
1
/ \
2 3
/ \
4 5
Then for a DBMS with arrays like PostgreSQL](https://www.postgresql.org/docs/14/arrays.html):
WITH RECURSIVE "TreeSearch" (
"id",
"parentId",
"childIndex",
"value",
"name",
"prefix"
) AS (
SELECT
"id",
"parentId",
"childIndex",
"value",
"name",
array[0]
FROM "ParentIndexTree"
WHERE "parentId" IS NULL
UNION ALL
SELECT
"child"."id",
"child"."parentId",
"child"."childIndex",
"child"."value",
"child"."name",
array_append("parent"."prefix", "child"."childIndex")
FROM "ParentIndexTree" AS "child"
JOIN "TreeSearch" AS "parent"
ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;
This creates on the fly prefixes of form:
1 -> 0
2 -> 0, 0
3 -> 0, 1
4 -> 0, 0, 0
5 -> 0, 0, 1
and then PostgreSQL then sorts by the arrays alphabetically as:
1 -> 0
2 -> 0, 0
4 -> 0, 0, 0
5 -> 0, 0, 1
3 -> 0, 1
which is the pre-order result that we want.
For a DBMS without arrays like SQLite, you can hack by encoding the prefix with a string of fixed width integers. Binary would be ideal, but I couldn't find out how, so hex would work. This of course means you will have to select a maximum depth that will fit in the number of bytes selected, e.g. below I choose 6 allowing for a maximum of 16^6 children per node.
WITH RECURSIVE "TreeSearch" (
"id",
"parentId",
"childIndex",
"value",
"name",
"prefix"
) AS (
SELECT
"id",
"parentId",
"childIndex",
"value",
"name",
'000000'
FROM "ParentIndexTree"
WHERE "parentId" IS NULL
UNION ALL
SELECT
"child"."id",
"child"."parentId",
"child"."childIndex",
"child"."value",
"child"."name",
"parent"."prefix" || printf('%06x', "child"."childIndex")
FROM "ParentIndexTree" AS "child"
JOIN "TreeSearch" AS "parent"
ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;
Some nested set notes
Here are a few points which confused me a bit after looking at the other nested set answers.
Jonny Buchanan shows his nested set setup as:
__________________________________________________________________________
| Root 1 |
| ________________________________ ________________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14
| |________________________________| |________________________________| |
|__________________________________________________________________________|
which made me wonder why not just use the simpler looking:
__________________________________________________________________________
| Root 1 |
| ________________________________ _______________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________| 4___________| | 5 6___________| 7___________| | |
| |________________________________| |_______________________________| |
|_________________________________________________________________________|
which does not have an extra number for each endpoint.
But then when I actually tried to implement it, I noticed that it was hard/impossible to implement the update queries like that, unless I had parent information as used by Konchog. The problem is that it was hard/impossible to distinguish between a sibling and a parent in one case while the tree was being moved around, and I needed that to decide if I was going to reduce the right hand side or not while closing a gap.
Left/size vs left/right: you could store it either way in the database, but I think left/right can be more efficient as you can index the DB with a multicolumn index (left, right) which can then be used to speed up ancestor queries, which are of type:
left < curLeft AND right > curLeft
Tested on Ubuntu 22.04, PostgreSQL 14.5, SQLite 3.34.0.
If elements are in tree order, as shown in your example, you can use something like the following Python example:
delimiter = '.'
stack = []
for item in items:
while stack and not item.startswith(stack[-1]+delimiter):
print "</div>"
stack.pop()
print "<div>"
print item
stack.append(item)
What this does is maintain a stack representing the current position in the tree. For each element in the table, it pops stack elements (closing the matching divs) until it finds the parent of the current item. Then it outputs the start of that node and pushes it to the stack.
If you want to output the tree using indenting rather than nested elements, you can simply skip the print statements to print the divs, and print a number of spaces equal to some multiple of the size of the stack before each item. For example, in Python:
print " " * len(stack)
You could also easily use this method to construct a set of nested lists or dictionaries.
Edit: I see from your clarification that the names were not intended to be node paths. That suggests an alternate approach:
idx = {}
idx[0] = []
for node in results:
child_list = []
idx[node.Id] = child_list
idx[node.ParentId].append((node, child_list))
This constructs a tree of arrays of tuples(!). idx[0] represents the root(s) of the tree. Each element in an array is a 2-tuple consisting of the node itself and a list of all its children. Once constructed, you can hold on to idx[0] and discard idx, unless you want to access nodes by their ID.
Think about using nosql tools like neo4j for hierarchial structures.
e.g a networked application like linkedin uses couchbase (another nosql solution)
But use nosql only for data-mart level queries and not to store / maintain transactions

Get unknown quantity of parents from child in same table [duplicate]

Assume you have a flat table that stores an ordered tree hierarchy:
Id Name ParentId Order
1 'Node 1' 0 10
2 'Node 1.1' 1 10
3 'Node 2' 0 20
4 'Node 1.1.1' 2 10
5 'Node 2.1' 3 10
6 'Node 1.2' 1 20
Here's a diagram, where we have [id] Name. Root node 0 is fictional.
[0] ROOT
/ \
[1] Node 1 [3] Node 2
/ \ \
[2] Node 1.1 [6] Node 1.2 [5] Node 2.1
/
[4] Node 1.1.1
What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?
Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.
Pseudo code or plain English is okay, this is purely a conceptional question.
Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?
EDITS AND ADDITIONS
To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.
The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay.
The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.
Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.
I have posted my own solution so you guys can pull it to pieces.
Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.
WITH RECURSIVE MyTree AS (
SELECT * FROM MyTable WHERE ParentId IS NULL
UNION ALL
SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;
I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdown in 2017.
Below is my original answer from 2008:
There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:
Adjacency List (the "parent" column) and
Path Enumeration (the dotted-numbers in your name column).
Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.
I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.
I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
CREATE TABLE ClosureTable (
ancestor_id INT NOT NULL REFERENCES FlatTable(id),
descendant_id INT NOT NULL REFERENCES FlatTable(id),
PRIMARY KEY (ancestor_id, descendant_id)
);
Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:
INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
(1,1), (1,2), (1,4), (1,6),
(2,2), (2,4),
(3,3), (3,5),
(4,4),
(5,5),
(6,6);
Now you can get a tree starting at node 1 like this:
SELECT f.*
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;
The output (in MySQL client) looks like the following:
+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
| 6 |
+----+
In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.
Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).
INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
(1,1,0), (1,2,1), (1,4,2), (1,6,1),
(2,2,0), (2,4,1),
(3,3,0), (3,5,1),
(4,4,0),
(5,5,0),
(6,6,0);
Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.
SELECT f.*
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
AND path_length = 1;
+----+
| id |
+----+
| 2 |
| 6 |
+----+
Re comment from #ashraf: "How about sorting the whole tree [by name]?"
Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.
SELECT f.name
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;
Re comment from #Nate:
SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id)
WHERE a.ancestor_id = 1
GROUP BY a.descendant_id
ORDER BY f.name
+------------+-------------+
| name | breadcrumbs |
+------------+-------------+
| Node 1 | 1 |
| Node 1.1 | 1,2 |
| Node 1.1.1 | 1,2,4 |
| Node 1.2 | 1,6 |
+------------+-------------+
A user suggested an edit today. SO moderators approved the edit, but I am reversing it.
The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".
If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.
If you use nested sets (sometimes referred to as Modified Pre-order Tree Traversal) you can extract the entire tree structure or any subtree within it in tree order with a single query, at the cost of inserts being more expensive, as you need to manage columns which describe an in-order path through thee tree structure.
For django-mptt, I used a structure like this:
id parent_id tree_id level lft rght
-- --------- ------- ----- --- ----
1 null 1 0 1 14
2 1 1 1 2 7
3 2 1 2 3 4
4 2 1 2 5 6
5 1 1 1 8 13
6 5 1 2 9 10
7 5 1 2 11 12
Which describes a tree which looks like this (with id representing each item):
1
+-- 2
| +-- 3
| +-- 4
|
+-- 5
+-- 6
+-- 7
Or, as a nested set diagram which makes it more obvious how the lft and rght values work:
__________________________________________________________________________
| Root 1 |
| ________________________________ ________________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14
| |________________________________| |________________________________| |
|__________________________________________________________________________|
As you can see, to get the entire subtree for a given node, in tree order, you simply have to select all rows which have lft and rght values between its lft and rght values. It's also simple to retrieve the tree of ancestors for a given node.
The level column is a bit of denormalisation for convenience more than anything and the tree_id column allows you to restart the lft and rght numbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lft and rght columns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notes at the time when I was trying to wrap my head around the queries required for each operation.
In terms of actually working with this data to display a tree, I created a tree_item_iterator utility function which, for each node, should give you sufficient information to generate whatever kind of display you want.
More info about MPTT:
Trees in SQL
Storing Hierarchical Data in a Database
Managing Hierarchical Data in MySQL
It's a quite old question, but as it's got many views I think it's worth to present an alternative, and in my opinion very elegant, solution.
In order to read a tree structure you can use recursive Common Table Expressions (CTEs). It gives a possibility to fetch whole tree structure at once, have the information about the level of the node, its parent node and order within children of the parent node.
Let me show you how this would work in PostgreSQL 9.1.
Create a structure
CREATE TABLE tree (
id int NOT NULL,
name varchar(32) NOT NULL,
parent_id int NULL,
node_order int NOT NULL,
CONSTRAINT tree_pk PRIMARY KEY (id),
CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id)
REFERENCES tree (id) NOT DEFERRABLE
);
insert into tree values
(0, 'ROOT', NULL, 0),
(1, 'Node 1', 0, 10),
(2, 'Node 1.1', 1, 10),
(3, 'Node 2', 0, 20),
(4, 'Node 1.1.1', 2, 10),
(5, 'Node 2.1', 3, 10),
(6, 'Node 1.2', 1, 20);
Write a query
WITH RECURSIVE
tree_search (id, name, level, parent_id, node_order) AS (
SELECT
id,
name,
0,
parent_id,
1
FROM tree
WHERE parent_id is NULL
UNION ALL
SELECT
t.id,
t.name,
ts.level + 1,
ts.id,
t.node_order
FROM tree t, tree_search ts
WHERE t.parent_id = ts.id
)
SELECT * FROM tree_search
WHERE level > 0
ORDER BY level, parent_id, node_order;
Here are the results:
id | name | level | parent_id | node_order
----+------------+-------+-----------+------------
1 | Node 1 | 1 | 0 | 10
3 | Node 2 | 1 | 0 | 20
2 | Node 1.1 | 2 | 1 | 10
6 | Node 1.2 | 2 | 1 | 20
5 | Node 2.1 | 2 | 3 | 10
4 | Node 1.1.1 | 3 | 2 | 10
(6 rows)
The tree nodes are ordered by a level of depth. In the final output we would present them in the subsequent lines.
For each level, they are ordered by parent_id and node_order within the parent. This tells us how to present them in the output - link node to the parent in this order.
Having such a structure it wouldn't be difficult to make a really nice presentation in HTML.
Recursive CTEs are available in PostgreSQL, IBM DB2, MS SQL Server, Oracle and SQLite.
If you'd like to read more on recursive SQL queries, you can either check the documentation of your favourite DBMS or read my two articles covering this topic:
Do It In SQL: Recursive Tree Traversal
Get to know the power of SQL recursive queries
As of Oracle 9i, you can use CONNECT BY.
SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
CONNECT BY PRIOR "Id" = "ParentId"
START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)
As of SQL Server 2005, you can use a recursive common table expression (CTE).
WITH [NodeList] (
[Id]
, [ParentId]
, [Level]
, [Order]
) AS (
SELECT [Node].[Id]
, [Node].[ParentId]
, 0 AS [Level]
, CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
FROM [Node]
WHERE [Node].[ParentId] = 0
UNION ALL
SELECT [Node].[Id]
, [Node].[ParentId]
, [NodeList].[Level] + 1 AS [Level]
, [NodeList].[Order] + '|'
+ CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
FROM [Node]
INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
FROM [Node]
INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
ORDER BY [NodeList].[Order]
Both will output the following results.
Name
'Node 1'
' Node 1.1'
' Node 1.1.1'
' Node 1.2'
'Node 2'
' Node 2.1'
Bill's answer is pretty gosh-darned good, this answer adds some things to it which makes me wish SO supported threaded answers.
Anyway I wanted to support the tree structure and the Order property. I included a single property in each Node called leftSibling that does the same thing Order is meant to do in the original question (maintain left-to-right order).
mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | | NULL | |
| leftSibling | int(11) | NO | | 0 | |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO | PRI | NULL | auto_increment |
| parent | int(11) | NO | | NULL | |
| child | int(11) | NO | | NULL | |
| pathLen | int(11) | NO | | NULL | |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
More detail and SQL code on my blog.
Thanks Bill your answer was helpful in getting started!
There are really good solutions which exploit the internal btree representation of sql indices. This is based on some great research done back around 1998.
Here is an example table (in mysql).
CREATE TABLE `node` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`tw` int(10) unsigned NOT NULL,
`pa` int(10) unsigned DEFAULT NULL,
`sz` int(10) unsigned DEFAULT NULL,
`nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
PRIMARY KEY (`id`),
KEY `node_tw_index` (`tw`),
KEY `node_pa_index` (`pa`),
KEY `node_nc_index` (`nc`),
CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)
The only fields necessary for the tree representation are:
tw: The Left to Right DFS Pre-order index, where root = 1.
pa: The reference (using tw) to the parent node, root has null.
sz: The size of the node's branch including itself.
nc: is used as syntactic sugar. it is tw+sz and represents the tw of the node's "next child".
Here is an example 24 node population, ordered by tw:
+-----+---------+----+------+------+------+
| id | name | tw | pa | sz | nc |
+-----+---------+----+------+------+------+
| 1 | Root | 1 | NULL | 24 | 25 |
| 2 | A | 2 | 1 | 14 | 16 |
| 3 | AA | 3 | 2 | 1 | 4 |
| 4 | AB | 4 | 2 | 7 | 11 |
| 5 | ABA | 5 | 4 | 1 | 6 |
| 6 | ABB | 6 | 4 | 3 | 9 |
| 7 | ABBA | 7 | 6 | 1 | 8 |
| 8 | ABBB | 8 | 6 | 1 | 9 |
| 9 | ABC | 9 | 4 | 2 | 11 |
| 10 | ABCD | 10 | 9 | 1 | 11 |
| 11 | AC | 11 | 2 | 4 | 15 |
| 12 | ACA | 12 | 11 | 2 | 14 |
| 13 | ACAA | 13 | 12 | 1 | 14 |
| 14 | ACB | 14 | 11 | 1 | 15 |
| 15 | AD | 15 | 2 | 1 | 16 |
| 16 | B | 16 | 1 | 1 | 17 |
| 17 | C | 17 | 1 | 6 | 23 |
| 359 | C0 | 18 | 17 | 5 | 23 |
| 360 | C1 | 19 | 18 | 4 | 23 |
| 361 | C2(res) | 20 | 19 | 3 | 23 |
| 362 | C3 | 21 | 20 | 2 | 23 |
| 363 | C4 | 22 | 21 | 1 | 23 |
| 18 | D | 23 | 1 | 1 | 24 |
| 19 | E | 24 | 1 | 1 | 25 |
+-----+---------+----+------+------+------+
Every tree result can be done non-recursively.
For instance, to get a list of ancestors of node at tw='22'
Ancestors
select anc.* from node me,node anc
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw
order by anc.tw;
+-----+---------+----+------+------+------+
| id | name | tw | pa | sz | nc |
+-----+---------+----+------+------+------+
| 1 | Root | 1 | NULL | 24 | 25 |
| 17 | C | 17 | 1 | 6 | 23 |
| 359 | C0 | 18 | 17 | 5 | 23 |
| 360 | C1 | 19 | 18 | 4 | 23 |
| 361 | C2(res) | 20 | 19 | 3 | 23 |
| 362 | C3 | 21 | 20 | 2 | 23 |
| 363 | C4 | 22 | 21 | 1 | 23 |
+-----+---------+----+------+------+------+
Siblings and children are trivial - just use pa field ordering by tw.
Descendants
For example the set (branch) of nodes that are rooted at tw = 17.
select des.* from node me,node des
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw
order by des.tw;
+-----+---------+----+------+------+------+
| id | name | tw | pa | sz | nc |
+-----+---------+----+------+------+------+
| 17 | C | 17 | 1 | 6 | 23 |
| 359 | C0 | 18 | 17 | 5 | 23 |
| 360 | C1 | 19 | 18 | 4 | 23 |
| 361 | C2(res) | 20 | 19 | 3 | 23 |
| 362 | C3 | 21 | 20 | 2 | 23 |
| 363 | C4 | 22 | 21 | 1 | 23 |
+-----+---------+----+------+------+------+
Additional Notes
This methodology is extremely useful for when there are a far greater number of reads than there are inserts or updates.
Because the insertion, movement, or updating of a node in the tree requires the tree to be adjusted, it is necessary to lock the table before commencing with the action.
The insertion/deletion cost is high because the tw index and sz (branch size) values will need to be updated on all the nodes after the insertion point, and for all ancestors respectively.
Branch moving involves moving the tw value of the branch out of range, so it is also necessary to disable foreign key constraints when moving a branch. There are, essentially four queries required to move a branch:
Move the branch out of range.
Close the gap that it left. (the remaining tree is now normalised).
Open the gap where it will go to.
Move the branch into it's new position.
Adjust Tree Queries
The opening/closing of gaps in the tree is an important sub-function used by create/update/delete methods, so I include it here.
We need two parameters - a flag representing whether or not we are downsizing or upsizing, and the node's tw index. So, for example tw=18 (which has a branch size of 5). Let's assume that we are downsizing (removing tw) - this means that we are using '-' instead of '+' in the updates of the following example.
We first use a (slightly altered) ancestor function to update the sz value.
update node me, node anc set anc.sz = anc.sz - me.sz from
node me, node anc where me.tw=18
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));
Then we need to adjust the tw for those whose tw is higher than the branch to be removed.
update node me, node anc set anc.tw = anc.tw - me.sz from
node me, node anc where me.tw=18 and anc.tw >= me.tw;
Then we need to adjust the parent for those whose pa's tw is higher than the branch to be removed.
update node me, node anc set anc.pa = anc.pa - me.sz from
node me, node anc where me.tw=18 and anc.pa >= me.tw;
Well given the choice, I'd be using objects. I'd create an object for each record where each object has a children collection and store them all in an assoc array (/hashtable) where the Id is the key. And blitz through the collection once, adding the children to the relevant children fields. Simple.
But because you're being no fun by restricting use of some good OOP, I'd probably iterate based on:
function PrintLine(int pID, int level)
foreach record where ParentID == pID
print level*tabs + record-data
PrintLine(record.ID, level + 1)
PrintLine(0, 0)
Edit: this is similar to a couple of other entries, but I think it's slightly cleaner. One thing I'll add: this is extremely SQL-intensive. It's nasty. If you have the choice, go the OOP route.
This was written quickly, and is neither pretty nor efficient (plus it autoboxes alot, converting between int and Integer is annoying!), but it works.
It probably breaks the rules since I'm creating my own objects but hey I'm doing this as a diversion from real work :)
This also assumes that the resultSet/table is completely read into some sort of structure before you start building Nodes, which wouldn't be the best solution if you have hundreds of thousands of rows.
public class Node {
private Node parent = null;
private List<Node> children;
private String name;
private int id = -1;
public Node(Node parent, int id, String name) {
this.parent = parent;
this.children = new ArrayList<Node>();
this.name = name;
this.id = id;
}
public int getId() {
return this.id;
}
public String getName() {
return this.name;
}
public void addChild(Node child) {
children.add(child);
}
public List<Node> getChildren() {
return children;
}
public boolean isRoot() {
return (this.parent == null);
}
#Override
public String toString() {
return "id=" + id + ", name=" + name + ", parent=" + parent;
}
}
public class NodeBuilder {
public static Node build(List<Map<String, String>> input) {
// maps id of a node to it's Node object
Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();
// maps id of a node to the id of it's parent
Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();
// create special 'root' Node with id=0
Node root = new Node(null, 0, "root");
nodeMap.put(root.getId(), root);
// iterate thru the input
for (Map<String, String> map : input) {
// expect each Map to have keys for "id", "name", "parent" ... a
// real implementation would read from a SQL object or resultset
int id = Integer.parseInt(map.get("id"));
String name = map.get("name");
int parent = Integer.parseInt(map.get("parent"));
Node node = new Node(null, id, name);
nodeMap.put(id, node);
childParentMap.put(id, parent);
}
// now that each Node is created, setup the child-parent relationships
for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
int nodeId = entry.getKey();
int parentId = entry.getValue();
Node child = nodeMap.get(nodeId);
Node parent = nodeMap.get(parentId);
parent.addChild(child);
}
return root;
}
}
public class NodePrinter {
static void printRootNode(Node root) {
printNodes(root, 0);
}
static void printNodes(Node node, int indentLevel) {
printNode(node, indentLevel);
// recurse
for (Node child : node.getChildren()) {
printNodes(child, indentLevel + 1);
}
}
static void printNode(Node node, int indentLevel) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < indentLevel; i++) {
sb.append("\t");
}
sb.append(node);
System.out.println(sb.toString());
}
public static void main(String[] args) {
// setup dummy data
List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
resultSet.add(newMap("1", "Node 1", "0"));
resultSet.add(newMap("2", "Node 1.1", "1"));
resultSet.add(newMap("3", "Node 2", "0"));
resultSet.add(newMap("4", "Node 1.1.1", "2"));
resultSet.add(newMap("5", "Node 2.1", "3"));
resultSet.add(newMap("6", "Node 1.2", "1"));
Node root = NodeBuilder.build(resultSet);
printRootNode(root);
}
//convenience method for creating our dummy data
private static Map<String, String> newMap(String id, String name, String parentId) {
Map<String, String> row = new HashMap<String, String>();
row.put("id", id);
row.put("name", name);
row.put("parent", parentId);
return row;
}
}
Assuming that you know that the root elements are zero, here's the pseudocode to output to text:
function PrintLevel (int curr, int level)
//print the indents
for (i=1; i<=level; i++)
print a tab
print curr \n;
for each child in the table with a parent of curr
PrintLevel (child, level+1)
for each elementID where the parentid is zero
PrintLevel(elementID, 0)
You can emulate any other data structure with a hashmap, so that's not a terrible limitation. Scanning from the top to the bottom, you create a hashmap for each row of the database, with an entry for each column. Add each of these hashmaps to a "master" hashmap, keyed on the id. If any node has a "parent" that you haven't seen yet, create an placeholder entry for it in the master hashmap, and fill it in when you see the actual node.
To print it out, do a simple depth-first pass through the data, keeping track of indent level along the way. You can make this easier by keeping a "children" entry for each row, and populating it as you scan the data.
As for whether there's a "better" way to store a tree in a database, that depends on how you're going to use the data. I've seen systems that had a known maximum depth that used a different table for each level in the hierarchy. That makes a lot of sense if the levels in the tree aren't quite equivalent after all (top level categories being different than the leaves).
If nested hash maps or arrays can be created, then I can simply go down the table from the beginning and add each item to the nested array. I must trace each line to the root node in order to know which level in the nested array to insert into. I can employ memoization so that I do not need to look up the same parent over and over again.
Edit: I would read the entire table into an array first, so it will not query the DB repeatedly. Of course this won't be practical if your table is very large.
After the structure is built, I must do a depth first traverse through it and print out the HTML.
There's no better fundamental way to store this information using one table (I could be wrong though ;), and would love to see a better solution ). However, if you create a scheme to employ dynamically created db tables, then you opened up a whole new world at the sacrifice of simplicity, and the risk of SQL hell ;).
To Extend Bill's SQL solution you can basically do the same using a flat array. Further more if your strings all have the same lenght and your maximum number of children are known (say in a binary tree) you can do it using a single string (character array). If you have arbitrary number of children this complicates things a bit... I would have to check my old notes to see what can be done.
Then, sacrificing a bit of memory, especially if your tree is sparse and/or unballanced, you can, with a bit of index math, access all the strings randomly by storing your tree, width first in the array like so (for a binary tree):
String[] nodeArray = [L0root, L1child1, L1child2, L2Child1, L2Child2, L2Child3, L2Child4] ...
yo know your string length, you know it
I'm at work now so cannot spend much time on it but with interest I can fetch a bit of code to do this.
We use to do it to search in binary trees made of DNA codons, a process built the tree, then we flattened it to search text patterns and when found, though index math (revers from above) we get the node back... very fast and efficient, tough our tree rarely had empty nodes, but we could searh gigabytes of data in a jiffy.
Pre-order transversal with on-the-fly path enumeration on adjacency representation
Nested sets from:
Konchog https://stackoverflow.com/a/42781302/895245
Jonny Buchanan https://stackoverflow.com/a/194031/895245
is the only efficient way I've seen of traversing, at the cost of slower updates. That's likely what most people will want for pre-order.
Closure table from https://stackoverflow.com/a/192462/895245 is interesting, but I don't see how to enforce pre-order there: MySQL Closure Table hierarchical database - How to pull information out in the correct order
Mostly for fun, here's a method that recursively calculates the 1.3.2.5. prefixes on the fly and sorts by them at the end, based only on the parent ID/child index representation.
Upsides:
updates only need to update the indexes of each sibling
Downsides:
n^2 memory usage worst case for a super deep tree. This could be quite serious, which is why I say this method is likely mostly for fun only. But maybe there is some ultra high update case where someone would want to use it? Who knows
recursive queries, so reads are going to be less efficient than nested sets
Create and populate table:
CREATE TABLE "ParentIndexTree" (
"id" INTEGER PRIMARY KEY,
"parentId" INTEGER,
"childIndex" INTEGER NOT NULL,
"value" INTEGER NOT NULL,
"name" TEXT NOT NULL,
FOREIGN KEY ("parentId") REFERENCES "ParentIndexTree"(id)
)
;
INSERT INTO "ParentIndexTree" VALUES
(0, NULL, 0, 1, 'one' ),
(1, 0, 0, 2, 'two' ),
(2, 0, 1, 3, 'three'),
(3, 1, 0, 4, 'four' ),
(4, 1, 1, 5, 'five' )
;
Represented tree:
1
/ \
2 3
/ \
4 5
Then for a DBMS with arrays like PostgreSQL](https://www.postgresql.org/docs/14/arrays.html):
WITH RECURSIVE "TreeSearch" (
"id",
"parentId",
"childIndex",
"value",
"name",
"prefix"
) AS (
SELECT
"id",
"parentId",
"childIndex",
"value",
"name",
array[0]
FROM "ParentIndexTree"
WHERE "parentId" IS NULL
UNION ALL
SELECT
"child"."id",
"child"."parentId",
"child"."childIndex",
"child"."value",
"child"."name",
array_append("parent"."prefix", "child"."childIndex")
FROM "ParentIndexTree" AS "child"
JOIN "TreeSearch" AS "parent"
ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;
This creates on the fly prefixes of form:
1 -> 0
2 -> 0, 0
3 -> 0, 1
4 -> 0, 0, 0
5 -> 0, 0, 1
and then PostgreSQL then sorts by the arrays alphabetically as:
1 -> 0
2 -> 0, 0
4 -> 0, 0, 0
5 -> 0, 0, 1
3 -> 0, 1
which is the pre-order result that we want.
For a DBMS without arrays like SQLite, you can hack by encoding the prefix with a string of fixed width integers. Binary would be ideal, but I couldn't find out how, so hex would work. This of course means you will have to select a maximum depth that will fit in the number of bytes selected, e.g. below I choose 6 allowing for a maximum of 16^6 children per node.
WITH RECURSIVE "TreeSearch" (
"id",
"parentId",
"childIndex",
"value",
"name",
"prefix"
) AS (
SELECT
"id",
"parentId",
"childIndex",
"value",
"name",
'000000'
FROM "ParentIndexTree"
WHERE "parentId" IS NULL
UNION ALL
SELECT
"child"."id",
"child"."parentId",
"child"."childIndex",
"child"."value",
"child"."name",
"parent"."prefix" || printf('%06x', "child"."childIndex")
FROM "ParentIndexTree" AS "child"
JOIN "TreeSearch" AS "parent"
ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;
Some nested set notes
Here are a few points which confused me a bit after looking at the other nested set answers.
Jonny Buchanan shows his nested set setup as:
__________________________________________________________________________
| Root 1 |
| ________________________________ ________________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________4 5___________6 7 8 9___________10 11__________12 13 14
| |________________________________| |________________________________| |
|__________________________________________________________________________|
which made me wonder why not just use the simpler looking:
__________________________________________________________________________
| Root 1 |
| ________________________________ _______________________________ |
| | Child 1.1 | | Child 1.2 | |
| | ___________ ___________ | | ___________ ___________ | |
| | | C 1.1.1 | | C 1.1.2 | | | | C 1.2.1 | | C 1.2.2 | | |
1 2 3___________| 4___________| | 5 6___________| 7___________| | |
| |________________________________| |_______________________________| |
|_________________________________________________________________________|
which does not have an extra number for each endpoint.
But then when I actually tried to implement it, I noticed that it was hard/impossible to implement the update queries like that, unless I had parent information as used by Konchog. The problem is that it was hard/impossible to distinguish between a sibling and a parent in one case while the tree was being moved around, and I needed that to decide if I was going to reduce the right hand side or not while closing a gap.
Left/size vs left/right: you could store it either way in the database, but I think left/right can be more efficient as you can index the DB with a multicolumn index (left, right) which can then be used to speed up ancestor queries, which are of type:
left < curLeft AND right > curLeft
Tested on Ubuntu 22.04, PostgreSQL 14.5, SQLite 3.34.0.
If elements are in tree order, as shown in your example, you can use something like the following Python example:
delimiter = '.'
stack = []
for item in items:
while stack and not item.startswith(stack[-1]+delimiter):
print "</div>"
stack.pop()
print "<div>"
print item
stack.append(item)
What this does is maintain a stack representing the current position in the tree. For each element in the table, it pops stack elements (closing the matching divs) until it finds the parent of the current item. Then it outputs the start of that node and pushes it to the stack.
If you want to output the tree using indenting rather than nested elements, you can simply skip the print statements to print the divs, and print a number of spaces equal to some multiple of the size of the stack before each item. For example, in Python:
print " " * len(stack)
You could also easily use this method to construct a set of nested lists or dictionaries.
Edit: I see from your clarification that the names were not intended to be node paths. That suggests an alternate approach:
idx = {}
idx[0] = []
for node in results:
child_list = []
idx[node.Id] = child_list
idx[node.ParentId].append((node, child_list))
This constructs a tree of arrays of tuples(!). idx[0] represents the root(s) of the tree. Each element in an array is a 2-tuple consisting of the node itself and a list of all its children. Once constructed, you can hold on to idx[0] and discard idx, unless you want to access nodes by their ID.
Think about using nosql tools like neo4j for hierarchial structures.
e.g a networked application like linkedin uses couchbase (another nosql solution)
But use nosql only for data-mart level queries and not to store / maintain transactions

MySQL many-many JSON aggregation merging duplicate keys

I'm having trouble returning a JSON representation of a many-many join. My plan was to encode the columns returned using the following JSON format
{
"dog": [
"duke"
],
"location": [
"home",
"scotland"
]
}
This format would handle duplicate keys by aggregating the results in a JSON array, howver all of my attempts at aggregating this structure so far have just removed duplicates, so the arrays only ever have a single element.
Tables
Here is a simplified table structure I've made for the purposes of explaining this query.
media
| media_id | sha256 | filepath |
| 1 | 33327AD02AD09523C66668C7674748701104CE7A9976BC3ED8BA836C74443DBC | /photos/cat.jpeg |
| 2 | 323b5e69e72ba980cd4accbdbb59c5061f28acc7c0963fee893c9a40db929070 | /photos/dog.jpeg |
| 3 | B986620404660DCA7B3DEC4EFB2DE80C0548AB0DE243B6D59DA445DE2841E474 | /photos/dog2.jpeg |
| 4 | 1be439dd87cd87087a425c760d6d8edc484f126b5447beb2203d21e09e2a8f11 | /photos/balloon.jpeg |
media_metdata_labels_has_media (for many-many joins)
| media_metadata_labels_label_id | media_media_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 5 | 2 |
| 1 | 3 |
| 6 | 3 |
| 7 | 3 |
| 8 | 4 |
| 9 | 4 |
media_metadata_labels
| label_id | label_key | label_value |
| 2 | cat | lily |
| 4 | dog | duke |
| 6 | dog | rex |
| 1 | pet size | small |
| 3 | location | home |
| 7 | location | park |
| 8 | location | scotland |
| 9 | location | sky |
| 5 | location | studio |
My current attempt
My latest attempt at querying this data uses JSON_MERGE_PRESERVE with two arguments, the first is just an empty JSON object and the second is an invalid JSON document. It's invalid because there are duplicate keys, but I was hoping that JSON_MERGE_PRESERVE would merge them. It turns out JSON_MERGE_PRESERVE will only merge duplicates if they're not in the same JSON argument.
For example, this won't merge two keys
SET #key_one = '{}';
SET #key_two = '{"location": ["home"], "location": ["scotland"]}';
SELECT JSON_MERGE_PRESERVE(#key_one, #key_two);
-- returns {"location": ["scotland"]}
but this will
SET #key_one = '{"location": ["home"] }';
SET #key_two = '{"location": ["scotland"]}';
SELECT JSON_MERGE_PRESERVE(#key_one, #key_two);
-- returns {"location": ["home", "scotland"]}
So anyway, here's my current attempt
SELECT
m.media_id,
m.filepath,
JSON_MERGE_PRESERVE(
'{}',
CAST(
CONCAT(
'{',
GROUP_CONCAT(CONCAT('"', l.label_key, '":["', l.label_value, '"]')),
'}'
)
AS JSON)
)
as labels
FROM media AS m
LEFT JOIN media_metadata_labels_has_media AS lm ON lm.media_media_id = m.media_id
LEFT JOIN media_metadata_labels AS l ON l.label_id = lm.media_metadata_labels_label_id
GROUP BY m.media_id, m.filepath
-- HAVING JSON_CONTAINS(labels, '"location"', CONCAT('$.', '"home"')); -- this would let me filter on labels one they're in the correct JSON format
After trying different combinations of JSON_MERGE, JSON_OBJECTAGG, JSON_ARRAYAGG, CONCAT and GROUP_CONCAT this still leaves me scratching my head.
Disclaimer: Since posting this question I've started using mariadb instead of oracle MySQL. The function below should work for MySQL too, but in case it doesn't then any changes required will likely be small syntax fixes.
I solved this by creating a custom aggregation function
DELIMITER //
CREATE AGGREGATE FUNCTION JSON_LABELAGG (
json_key TEXT,
json_value TEXT
) RETURNS JSON
BEGIN
DECLARE complete_json JSON DEFAULT '{}';
DECLARE current_jsonpath TEXT;
DECLARE current_jsonpath_value_type TEXT;
DECLARE current_jsonpath_value JSON;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN complete_json;
main_loop: LOOP
FETCH GROUP NEXT ROW;
SET current_jsonpath = CONCAT('$.', json_key); -- the jsonpath to our json_key
SET current_jsonpath_value_type = JSON_TYPE(JSON_EXTRACT(complete_json, current_jsonpath)); -- the json object type at the current path
SET current_jsonpath_value = JSON_QUERY(complete_json, current_jsonpath); -- the json value at the current path
-- if this is the first label value with this key then place it in a new array
IF (ISNULL(current_jsonpath_value_type)) THEN
SET complete_json = JSON_INSERT(complete_json, current_jsonpath, JSON_ARRAY(json_value));
ITERATE main_loop;
END IF;
-- confirm that an array is at this jsonpath, otherwise that's an exception
CASE current_jsonpath_value_type
WHEN 'ARRAY' THEN
-- check if our json_value is already within the array and don't push a duplicate if it is
IF (ISNULL(JSON_SEARCH(JSON_EXTRACT(complete_json, current_jsonpath), "one", json_value))) THEN
SET complete_json = JSON_ARRAY_APPEND(complete_json, current_jsonpath, json_value);
END IF;
ITERATE main_loop;
ELSE
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = 'Expected JSON label object to be an array';
END CASE;
END LOOP;
RETURN complete_json;
END //
DELIMITER ;
and editing my query to use it
SELECT
m.media_id,
m.filepath,
JSON_LABELAGG(l.label_key, l.label_value) as labels
FROM media AS m
LEFT JOIN media_metadata_labels_has_media AS lm ON lm.media_media_id = m.media_id
LEFT JOIN media_metadata_labels AS l ON l.label_id = lm.media_metadata_labels_label_id
GROUP BY m.media_id, m.filepath

Missing an option to use SQL's HAVING clause in F3

Is there a way to use MySQL's HAVING clause with any of Fat Free Framework's SQL Mapper object's methods? Let's assume I have the following DB table:
+----+-------+--------+
| id | score | weight |
+----+-------+--------+
| 2 | 1 | 1 |
| 2 | 2 | 3 |
| 2 | 3 | 1 |
| 2 | 2 | 2 |
| 3 | 1 | 4 |
| 3 | 3 | 1 |
| 3 | 4 | 3 |
+----+-------+--------+
Now I would like to run a following query:
SELECT id, SUM(score*weight)/SUM(weight) AS weighted_score GROUP BY id HAVING weighted_score>2
Truth to be told I would actually like to count the number of these records, but a count method doesn't support $options.
I can run the query without a HAVING clause and then loop through them to check weighted_score against the value, but with a growing number of records will make it more and more resource consuming. Is there any built-in solution to solve this problem?
EDIT 1:
The way I know how to do it if there is no support for the HAVING clause (based on manual):
$databaseObject = new DB\SQL(...);
$dataMapper = new \DB\SQL\Mapper($databaseObject, "tableName");
$dataMapper->weightedScore = "SUM(weight*score)/SUM(weight)";
$usersInfo = $dataMapper->find([],["group"=>"id"]);
$place = 1;
foreach ( $usersInfo as $userInfo ) {
if ( $usersScores->weightedScore > 2) $place++;
}
If I were able to use HAVING clause then the foreach loop would not be needed and the number of items loaded by a query would be reduced:
$databaseObject = new DB\SQL(...);
$dataMapper = new \DB\SQL\Mapper($databaseObject, "tableName");
$dataMapper->weightedScore = "SUM(weight*score)/SUM(weight)";
$usersInfo = $dataMapper->find([],["group"=>"id", "having"=>"weighted_score<2"]); // rough idea
$place = count($usersInfo);
And if count method supported $options it would be even simpler and it would save memory used by the app as no records would be loaded:
$databaseObject = new DB\SQL(...);
$dataMapper = new \DB\SQL\Mapper($databaseObject, "tableName");
$dataMapper->weightedScore = "SUM(weight*score)/SUM(weight)";
$place = $dataMapper->count([],["group"=>"id", "having"=>"weighted_score<2"]); // rough idea
Use Sub Query.
select count (0) from (SELECT id, SUM(score*weight)/SUM(weight) AS weighted_score GROUP BY id) where weighted_score>2;
Hope it will help.
As far as I know, you can put the HAVING clause into the group option:
$usersInfo = $dataMapper->find([],["group"=>"id HAVING weighted_score<2"]);
Another way could be to create a VIEW in mysql and filter the records on a virtual fields in that view.

Search contacts upto multiple levels [duplicate]

I have a database with a tree of names that can go down a total of 9 levels deep and I need to be able to search down a signal branch of the tree from any point on the branch.
Database:
+----------------------+
| id | name | parent |
+----------------------+
| 1 | tom | 0 |
| 2 | bob | 0 |
| 3 | fred | 1 |
| 4 | tim | 2 |
| 5 | leo | 4 |
| 6 | sam | 4 |
| 7 | joe | 6 |
| 8 | jay | 3 |
| 9 | jim | 5 |
+----------------------+
Tree:
tom
fred
jay
bob
tim
sam
joe
leo
jim
For example:
If I search "j" from the user "bob" I should get only "joe" and "jim". If I search "j" form "leo" I should only get "jim".
I can't think of any easy way do to this so any help is appreciated.
You should really consider using the Modified Preorder Tree Traversal which makes such queries much easier. Here's your table expressed with MPTT. I have left the parent field, as it makes some queries easier.
+----------------------+-----+------+
| id | name | parent | lft | rght |
+----------------------+-----+------+
| 1 | tom | 0 | 1 | 6 |
| 2 | bob | 0 | 7 | 18 |
| 3 | fred | 1 | 2 | 5 |
| 4 | tim | 2 | 8 | 17 |
| 5 | leo | 4 | 12 | 15 |
| 6 | sam | 4 | 9 | 16 |
| 7 | joe | 6 | 10 | 11 |
| 8 | jay | 3 | 3 | 4 |
| 9 | jim | 5 | 13 | 14 |
+----------------------+-----+------+
To search j from user bob you'd use the lft and rght values for bob:
SELECT * FROM table WHERE name LIKE 'j%' AND lft > 7 AND rght < 18
Implementing the logic to update lft and rght for adding, removing and reordering nodes can be a challenge (hint: use an existing library if you can) but querying will be a breeze.
There isn't a nice/easy way of doing this; databases don't support tree-style data structures well.
You will need to work on a level-by-level basis to prune results from child-to-parent, or create a view that gives all 9 generations from a given node, and match using an OR on the descendants.
Have you thought about using a recursive loop? i use a loop for a cms i built on top of codeigniter that allows me to start anywhere in the site tree and will then subsequently filter trhough all the children> grand children > great grand children etc. Plus it keeps the sql down to short rapid queries opposed to lots of complicated joins. It may need some modifying in your case but i think it could work.
/**
* build_site_tree
*
* #return void
* #author Mike Waites
**/
public function build_site_tree($parent_id)
{
return $this->find_children($parent_id);
}
/** end build_site_tree **/
// -----------------------------------------------------------------------
/**
* find_children
* Recursive loop to find parent=>child relationships
*
* #return array $children
* #author Mike Waites
**/
public function find_children($parent_id)
{
$this->benchmark->mark('find_children_start');
if(!class_exists('Account_model'))
$this->load->model('Account_model');
$children = $this->Account_model->get_children($parent_id);
/** Recursively Loop over the results to build the site tree **/
foreach($children as $key => $child)
{
$childs = $this->find_children($child['id']);
if (count($childs) > 0)
$children[$key]['children'] = $childs;
}
return $children;
$this->benchmark->mark('find_children_end');
}
/** end find_children **/
As you can see this is a pretty simplfied version and bear in mind this has been built into codeigniter so you will need to modyfy it to suite but basically we have a loop that calls itself adding to an array each time as it goes. This will allow you to get the whole tree, or even start from a point in the tree as long as you have the parent_id avaialble first!
Hope this helps
The new "recursive with" construct will do the job, but I don't know id MySQL supports it (yet).
with recursive bobs(id) as (
select id from t where name = 'bob'
union all
select t.id from t, bobs where t.parent_id = bobs.id
)
select t.name from t, bobs where t.id = bobs.id
and name like 'j%'
There is no single SQL query that will return the data in tree format - you need processing to traverse it in the right order.
One way is to query MySQL to return MPTT:
SELECT * FROM table ORDER BY parent asc;
root of the tree will be the first item of the table, its children will be next, etc., the tree being listed "breadth first" (in layers of increasing depth)
Then use PHP to process the data, turning it into an object that holds the data structure.
Alternatively, you could implement MySQL search functions that given a node, recursively search and return a table of all its descendants, or a table of all its ancestors. As these procedures tend to be slow (being recursive, returning too much data that is then filtered by other criteria), you want to only do this if you know you're not querying for that kind of data again and again, or if you know that the data set remains small (9 levels deep and how wide?)
You can do this with a stored procedure as follows:
Example calls
mysql> call names_hier(1, 'a');
+----+----------+--------+-------------+-------+
| id | emp_name | parent | parent_name | depth |
+----+----------+--------+-------------+-------+
| 2 | ali | 1 | f00 | 1 |
| 8 | anna | 6 | keira | 4 |
+----+----------+--------+-------------+-------+
2 rows in set (0.00 sec)
mysql> call names_hier(3, 'k');
+----+----------+--------+-------------+-------+
| id | emp_name | parent | parent_name | depth |
+----+----------+--------+-------------+-------+
| 6 | keira | 5 | eva | 2 |
+----+----------+--------+-------------+-------+
1 row in set (0.00 sec)
$sqlCmd = sprintf("call names_hier(%d,'%s')", $id, $name); // dont forget to escape $name
$result = $db->query($sqlCmd);
Full script
drop table if exists names;
create table names
(
id smallint unsigned not null auto_increment primary key,
name varchar(255) not null,
parent smallint unsigned null,
key (parent)
)
engine = innodb;
insert into names (name, parent) values
('f00',null),
('ali',1),
('megan',1),
('jessica',3),
('eva',3),
('keira',5),
('mandy',6),
('anna',6);
drop procedure if exists names_hier;
delimiter #
create procedure names_hier
(
in p_id smallint unsigned,
in p_name varchar(255)
)
begin
declare v_done tinyint unsigned default(0);
declare v_dpth smallint unsigned default(0);
set p_name = trim(replace(p_name,'%',''));
create temporary table hier(
parent smallint unsigned,
id smallint unsigned,
depth smallint unsigned
)engine = memory;
insert into hier select parent, id, v_dpth from names where id = p_id;
/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */
create temporary table tmp engine=memory select * from hier;
while not v_done do
if exists( select 1 from names n inner join tmp on n.parent = tmp.id and tmp.depth = v_dpth) then
insert into hier select n.parent, n.id, v_dpth + 1
from names n inner join tmp on n.parent = tmp.id and tmp.depth = v_dpth;
set v_dpth = v_dpth + 1;
truncate table tmp;
insert into tmp select * from hier where depth = v_dpth;
else
set v_done = 1;
end if;
end while;
select
n.id,
n.name as emp_name,
p.id as parent,
p.name as parent_name,
hier.depth
from
hier
inner join names n on hier.id = n.id
left outer join names p on hier.parent = p.id
where
n.name like concat(p_name, '%');
drop temporary table if exists hier;
drop temporary table if exists tmp;
end #
delimiter ;
-- call this sproc from your php
call names_hier(1, 'a');
call names_hier(3, 'k');