One very efficient way of storing hierarchies in SQL is to use some kind of 'closure table'.
It's a bit more difficult to edit and takes up more space, but the hierarchy can usually be traversed with a single JOIN, or with a single query if you are only interested in IDs.
This table contains one record for every possible ancestor/descendant relationship, as well as one record per item in the real table for which anc=des=id holds.
For this tree:
1
  2
    3
    5
  4
    6
  7
Our SQL table would contain:
+-----+-----+------+-------+
| anc | des | diff | depth |
+-----+-----+------+-------+
| 1 | 1 | 0 | 0 |
| 1 | 2 | 1 | 0 |
| 2 | 3 | 1 | 1 |
| 1 | 3 | 2 | 0 |
| 1 | 4 | 1 | 0 |
| 2 | 5 | 1 | 1 |
| 1 | 5 | 2 | 0 |
| 4 | 6 | 1 | 1 |
| 1 | 6 | 2 | 0 |
| 1 | 7 | 1 | 0 |
| 2 | 2 | 0 | 1 |
| 3 | 3 | 0 | 2 |
| 4 | 4 | 0 | 1 |
| 5 | 5 | 0 | 2 |
| 6 | 6 | 0 | 2 |
| 7 | 7 | 0 | 1 |
+-----+-----+------+-------+
Then the task: "Get all ancestors of node 5", in other words, the 'path to node 5' can be done with the following query:
SELECT `anc` FROM `closure` WHERE `des` = 5
And the task "Get all descendants of node 1" can be done with this query:
SELECT `des` FROM `closure` WHERE `anc` = 1
While "Get all direct descendants of node 1" is done like so:
SELECT `des` FROM `closure` WHERE `diff` = 1 AND `anc` = 1
Finally, "Get all root nodes" is done like this:
SELECT `anc` FROM `closure` WHERE `depth` = 0 AND `anc` = `des`
These four tasks together form the most utilized ways of selecting things from the tree.
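The four queries can be tried out with a small script. This is only a sketch: it uses Python's built-in sqlite3 module instead of MySQL (so the backticks are dropped), the CHILDREN mapping is inferred from the table above, and the helper names closure_rows and ids are made up for the illustration.

```python
import sqlite3

# Parent -> children for the example tree (structure inferred from the closure table).
CHILDREN = {1: [2, 4, 7], 2: [3, 5], 4: [6]}

def closure_rows(node, ancestors=()):
    """Yield (anc, des, diff, depth) rows; depth is the depth of the ancestor."""
    chain = ancestors + (node,)
    for depth, anc in enumerate(chain):
        yield (anc, node, len(chain) - 1 - depth, depth)
    for child in CHILDREN.get(node, []):
        yield from closure_rows(child, chain)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE closure (anc INTEGER, des INTEGER, diff INTEGER, depth INTEGER)")
conn.executemany("INSERT INTO closure VALUES (?, ?, ?, ?)", closure_rows(1))

def ids(sql, *params):
    """Run a single-column query and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql, params))

ancestors_of_5 = ids("SELECT anc FROM closure WHERE des = ?", 5)
descendants_of_1 = ids("SELECT des FROM closure WHERE anc = ?", 1)
children_of_1 = ids("SELECT des FROM closure WHERE diff = 1 AND anc = ?", 1)
roots = ids("SELECT anc FROM closure WHERE depth = 0 AND anc = des")

print(ancestors_of_5, descendants_of_1, children_of_1, roots)
```

Note that, as in the table, every node counts as its own ancestor and descendant, so the first two results include the node itself.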
However, in reality, when categorizing things, people can't decide where to put them. Inevitably, something has to end up in multiple places in the tree. This throws a spanner in the works: two of these naïve queries (No 1 and No 2) no longer work.
Note that, to avoid problems with stack overflow and endless recursion, the graph in the example remains a kind of 'tree': there are no cycles in it.
The first problem is that "Get all descendants" now returns duplicate results. This can be fixed with a GROUP BY clause.
The second, harder problem, which I have not been able to solve, concerns the first query: there are now multiple possible paths to a leaf node. Let's split the question into two possible satisfactory results.
Is there a way, using a single or fixed number of JOINs in a single or fixed number of queries, for an arbitrarily deep tree, to get either of the following?
The canonical path
This is the path with the fewest nodes to a specific leaf node, preferring the leftmost such path in the tree representation. Note that the tree is not necessarily 'sorted' as in the example above, since nodes are arbitrarily inserted and removed.
All paths
This is every possible path to a specific leaf node.
An illustrative example of why the naïve method fails:
Consider this tree:
1
2 3
5 4
6 6
Asking the question "What are the ancestors of 6?" should have two logical answers:
1-2-5-6 and 1-3-4-6.
Yet, using the naïve query and sorting, we can only really get:
1-2-4-6 or 1-3-5-6.
Which are both not actually valid paths.
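The failure can be reproduced with a short script. It is a sketch under stated assumptions: it runs on Python's built-in sqlite3 rather than MySQL, it assumes the closure table stores one row per distinct path (which is what makes the duplicates appear), and the CHILDREN mapping and path_rows helper are invented for the illustration.

```python
import sqlite3

# The second example: node 6 hangs under both 5 and 4, so this is a DAG, not a tree.
CHILDREN = {1: [2, 3], 2: [5], 3: [4], 5: [6], 4: [6]}

def path_rows(node, ancestors=()):
    """Yield one (anc, des, diff) row per ancestor on every distinct path to node."""
    chain = ancestors + (node,)
    for depth, anc in enumerate(chain):
        yield (anc, node, len(chain) - 1 - depth)
    for child in CHILDREN.get(node, []):
        yield from path_rows(child, chain)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE closure (anc INTEGER, des INTEGER, diff INTEGER)")
conn.executemany("INSERT INTO closure VALUES (?, ?, ?)", path_rows(1))

# "Get all descendants of node 1" now lists node 6 twice...
duplicated = [r[0] for r in conn.execute(
    "SELECT des FROM closure WHERE anc = 1 ORDER BY des")]
# ...and GROUP BY removes the duplicate again.
grouped = [r[0] for r in conn.execute(
    "SELECT des FROM closure WHERE anc = 1 GROUP BY des ORDER BY des")]

# "Get all ancestors of node 6", sorted by distance from node 6.
naive = conn.execute(
    "SELECT DISTINCT anc, diff FROM closure WHERE des = 6 ORDER BY diff DESC, anc"
).fetchall()

print(duplicated, grouped, naive)
```

The last result contains 2 and 3 at distance 2, and 4 and 5 at distance 1, but nothing in those rows says that 5 belongs with 2 and 4 with 3, which is why sorting alone cannot rebuild a valid path.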
In all of the tutorials about closure tables I've read, it's plainly stated that closure tables are capable of handling hierarchies where the same item appears in multiple places, but it's never actually explained how to do this properly; it's just 'left to the reader'. When I try, however, I run into nontrivial problems.
Related
I'm working on a database with a category tree that's hierarchical. I'd like to be able to write a query that returns all of the parents. For example, assume this structure/content. A parent of 0 means that it's a root element with no parents.
ID | Name | Parent
1 | Tools | 0
2 | Drill | 1
3 | Impact | 2
4 | Cordless | 2
5 | Series X | 4
How could I write a query that would get all of the parents of Series X (ID 5)? I don't care whether it includes ID 5 itself, since I would already have that one. I'd like it to return the results below.
ID | Name | Parent
1 | Tools | 0
2 | Drill | 1
4 | Cordless | 2
5 | Series X | 4
Bonus if there's a way to find how many generations they are at the same time. Something like:
ID | Name | Parent | Generation
1 | Tools | 0 | 0
2 | Drill | 1 | 1
4 | Cordless | 2 | 2
5 | Series X | 4 | 3
I'm really stuck on this right now. I am thinking it might need to be a custom sql function?
MySQL 8.0 and later support recursive CTE queries:
WITH RECURSIVE cte AS (
SELECT * FROM MyTable WHERE id = 5
UNION ALL
SELECT MyTable.* FROM cte JOIN MyTable ON MyTable.id = cte.parent
)
SELECT * FROM cte ORDER BY id;
Getting the "Generation" when your CTE starts at the leaf of the hierarchy is tricky.
If you are using a version of MySQL older than 8.0, you may like my answer to "What is the most efficient/elegant way to parse a flat table into a tree?" or my presentation "Recursive Query Throwdown".
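One possible way around the "Generation" problem is to count the steps walked upward from the leaf and then flip the count once the root has been reached. The sketch below does this on Python's built-in sqlite3 (the WITH RECURSIVE syntax used here is also accepted by MySQL 8.0); the steps column and the generation alias are names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, "Tools", 0),
    (2, "Drill", 1),
    (3, "Impact", 2),
    (4, "Cordless", 2),
    (5, "Series X", 4),
])

SQL = """
WITH RECURSIVE cte (id, name, parent, steps) AS (
    -- anchor: the leaf we start from, zero steps walked so far
    SELECT id, name, parent, 0 FROM MyTable WHERE id = ?
    UNION ALL
    -- recursive step: move to the parent and count one more step
    SELECT t.id, t.name, t.parent, cte.steps + 1
    FROM cte JOIN MyTable AS t ON t.id = cte.parent
)
-- the root is the row with the most steps, so it becomes generation 0
SELECT id, name, parent, (SELECT MAX(steps) FROM cte) - steps AS generation
FROM cte
ORDER BY generation
"""

rows = conn.execute(SQL, (5,)).fetchall()
for row in rows:
    print(row)
```

The recursion stops by itself at the root because no row has id 0, the parent value that marks a root element.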
I am writing an order fulfillment system using MySQL Server in which pre-packaged boxes have been filled with items, and I need to determine which combination of boxes, if any, to ship to satisfy an order. I am unsure what SQL tools are available to me to query efficiently for the boxes required.
A table of order items looks like the following, if order 1 were for a fork and a spoon:
mysql> select * from order_items;
+----+-------+----------+
| id | name | order_id |
+----+-------+----------+
| 1 | fork | 1 |
| 2 | spoon | 1 |
+----+-------+----------+
2 rows in set (0.00 sec)
while the boxes are arranged as
mysql> select * from boxes;
+----+------+
| id | name |
+----+------+
| 1 | box1 |
| 2 | box2 |
| 3 | box3 |
+----+------+
3 rows in set (0.00 sec)
and
mysql> select * from items;
+----+-------+--------+
| id | name | box_id |
+----+-------+--------+
| 1 | spoon | 1 |
| 2 | knife | 1 |
| 3 | fork | 1 |
| 4 | knife | 2 |
| 5 | fork | 2 |
| 6 | spoon | 3 |
| 7 | knife | 3 |
+----+-------+--------+
7 rows in set (0.00 sec)
As you can see, there is a problem. Some boxes might contain only a fork and others only a spoon, or one box might contain both. In this case, however, I have boxes with all three utensils or a mixture of them. In general, a single box is not expected to cover all the requirements for the order, but it is acceptable to send a box with extra items if need be. Here, a single box would work, or the combination of the other two available boxes. In either case one or two extra knives would be sent. Ideally, we would like to send the minimal number of extra utensils, but we do not care about the number of boxes.
What is the appropriate query to efficiently determine which combination of boxes will work? I do not know of a single query, but I think a series of queries could try to find a match for all N items, then N-1, then N-2, until a match is found, and then repeat for the remaining items. This seems fairly inefficient, though.
Edit:
The problem is to find a subset B_S of all boxes such that every item m_i in order M is contained in some box in B_S.
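To make the problem statement concrete, here is a brute-force sketch in Python rather than SQL. It simply tries every subset of boxes, so it only suits a handful of boxes; the function name best_box_combination and the dictionary layout are assumptions made for the illustration, and the data is copied from the tables above.

```python
from collections import Counter
from itertools import combinations

# Contents of each box, taken from the items table.
BOXES = {
    "box1": ["spoon", "knife", "fork"],
    "box2": ["knife", "fork"],
    "box3": ["spoon", "knife"],
}
ORDER = ["fork", "spoon"]  # order 1

def best_box_combination(boxes, order):
    """Return (extra_items, box_names) for the covering subset with the fewest
    extra items, or None if no subset of boxes covers the order."""
    wanted = Counter(order)
    best = None
    names = sorted(boxes)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            contents = Counter()
            for name in combo:
                contents.update(boxes[name])
            if wanted - contents:  # something in the order is still missing
                continue
            extras = sum((contents - wanted).values())
            if best is None or extras < best[0]:
                best = (extras, combo)
    return best

result = best_box_combination(BOXES, ORDER)
print(result)
```

For this data, box1 alone covers the order with one extra knife, while box2 plus box3 would ship two extra knives. The search is exponential in the number of boxes, so it is a way to check answers rather than a replacement for an efficient query.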
I have a table that stores nested sets. It stores several nested sets, differentiated by a collectionid (yes, I'm mixing terms here; it really should be nestedsetid). It looks somewhat like this:
id | orgid | leftedge | rightedge | level | collectionid
1 | 123 | 1 | 6 | 1 | 1
2 | 111 | 2 | 3 | 2 | 1
3 | 23 | 4 | 5 | 2 | 1
4 | 67 | 1 | 2 | 1 | 2
5 | 123 | 3 | 4 | 1 | 2
6 | 600 | 1 | 6 | 1 | 3
7 | 11 | 2 | 5 | 2 | 3
8 | 111 | 3 | 4 | 3 | 3
Originally I wanted to take advantage of R-Tree indexes, but the code I have seen for this, LineString(Point(-1, leftedge), Point(1, rightedge)), won't quite work, since it doesn't take the collectionid into account, and thus id:1 and id:6 would end up being the same.
Is there a way I can use the R-Tree index with my current setup? Surely you can have different nested sets in the same table? My main aim is to be able to use the MBRWithin and MBRContains functions. I am using MySQL 5.1.
For single-dimensional data (these are 1D intervals, right?) there are better index structures than R-trees. R-trees are designed for dynamic data in 2 to 10 dimensions (at higher dimensions, performance isn't too good, as the split strategies and distance functions no longer work very well).
Actually, for your use case, classic SQL should work very well, and the database can make efficient use of its indexes. Having a good index structure is one thing, but you also want the database to exploit the indexes it has as well as possible.
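As such, I'd just index leftEdge and rightEdge and use the <, <=, >, >= operators. They are fast! For the collectionid column, a bitmap index should be good where the database offers one; MySQL does not, so an ordinary index on that column would have to do there.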
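A minimal sketch of the plain-comparison approach, run on Python's built-in sqlite3 rather than MySQL 5.1. The table name nested, the index name, and the choice of one composite index (instead of separate indexes per column) are assumptions made for the illustration; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nested (
        id INTEGER PRIMARY KEY, orgid INTEGER, leftedge INTEGER,
        rightedge INTEGER, level INTEGER, collectionid INTEGER
    )
""")
conn.executemany("INSERT INTO nested VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 123, 1, 6, 1, 1),
    (2, 111, 2, 3, 2, 1),
    (3, 23, 4, 5, 2, 1),
    (4, 67, 1, 2, 1, 2),
    (5, 123, 3, 4, 1, 2),
    (6, 600, 1, 6, 1, 3),
    (7, 11, 2, 5, 2, 3),
    (8, 111, 3, 4, 3, 3),
])
# Ordinary index; putting collectionid first keeps each nested set together.
conn.execute("CREATE INDEX idx_nested_edges ON nested (collectionid, leftedge, rightedge)")

DESCENDANTS_SQL = """
SELECT child.id
FROM nested AS parent
JOIN nested AS child
  ON child.collectionid = parent.collectionid   -- stay inside the same nested set
 AND child.leftedge  > parent.leftedge          -- interval strictly inside the parent
 AND child.rightedge < parent.rightedge
WHERE parent.id = ?
ORDER BY child.leftedge
"""

def descendants(node_id):
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, (node_id,))]

print(descendants(1), descendants(6))
```

Rows id:1 and id:6 share the edges 1 and 6, yet each query returns only the members of its own collection, because collectionid is part of the join condition.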
I've checked out a few of the stackoverflow questions and there are similar questions, but didn't quite put my fingers on this one.
If you have a table like this:
uid cat_uid itm_uid
1 1 4
2 1 5
3 2 6
4 2 7
5 3 8
6 3 9
where the uid column is auto_incremented, cat_uid references a category of relevance to filter on, and the itm_uid values are the ones we're seeking.
I would like to get a result set that contains the following sample results:
array (
0 => array (1 => array(4,5)),
1 => array (2 => array(6,7)),
2 => array (3 => array(8,9))
)
An example issue is - select 2 records from each category (however many categories there may be) and make sure they are the last 2 entries by uid in those categories.
I'm not sure how to structure the question to allow an answer, and any hints on a method for the solution would be welcome!
EDIT:
This wasn't a very clear question, so let me extend the scenario to something more tangible.
I have a set of records being entered into categories, and I would like to select, with as few queries as possible, the latest 2 records entered per category, so that when I list the contents of those categories I will have at least 2 records per category (assuming there are 2 or more already in the database). A similar query was already in place that selected the last 100 records and filtered them into categories. With a small number of categories, some updated faster than others, the top 100 may not contain members of every category. To resolve that, I was looking for a way to select 2 records from each category (or N records, assuming N is the same per category), where those 2 records are the last ones entered. A date field is available to sort on, but the itm_uid itself could be used to indicate insertion order.
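SELECT cat_uid, itm_uid,
-- number the rows within each category, restarting at 0 when cat_uid changes
IF( @cat = cat_uid, @cat_row := @cat_row + 1, @cat_row := 0 ) AS cat_row,
-- remember the category of the current row for the next comparison
@cat := cat_uid
FROM my_table
JOIN (SELECT @cat_row := 0, @cat := 0) AS init
-- caveat: MySQL does not guarantee the evaluation order of user-variable assignments, so check the numbering on your version
HAVING cat_row < 2
ORDER BY cat_uid, uid DESC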
You will have two extra columns in the results, just ignore them.
This is the logic:
We sort the table by cat_uid and then by uid descending. Then we start from the top and give each row a "row number" (cat_row), resetting this row number to zero whenever cat_uid changes:
---------------------------------------
| uid | cat_uid | itm_uid | cat_row |
| 45 | 4 | 34 | 0 |
| 33 | 4 | 54 | 1 |
| 31 | 4 | 12 | 2 |
| 12 | 4 | 51 | 3 |
| 56 | 6 | 11 | 0 |
| 20 | 6 | 64 | 1 |
| 16 | 6 | 76 | 2 |
| ... | ... | ... | ... |
---------------------------------------
Now, if we keep only the rows that have cat_row < 2, we get the results we want:
---------------------------------------
| uid | cat_uid | itm_uid | cat_row |
| 45 | 4 | 34 | 0 |
| 33 | 4 | 54 | 1 |
| 56 | 6 | 11 | 0 |
| 20 | 6 | 64 | 1 |
| ... | ... | ... | ... |
---------------------------------------
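The same result can be checked without user variables. The sketch below swaps in a different technique, a correlated subquery that counts how many newer rows exist in the same category, and runs it on Python's built-in sqlite3 rather than MySQL. The rows are the sample rows from the tables above; the alias newer and the grouping step at the end are additions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (uid INTEGER PRIMARY KEY, cat_uid INTEGER, itm_uid INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (45, 4, 34), (33, 4, 54), (31, 4, 12), (12, 4, 51),
    (56, 6, 11), (20, 6, 64), (16, 6, 76),
])

LATEST_SQL = """
SELECT t.cat_uid, t.itm_uid
FROM my_table AS t
WHERE (
    -- how many rows in the same category were inserted after this one?
    SELECT COUNT(*) FROM my_table AS newer
    WHERE newer.cat_uid = t.cat_uid AND newer.uid > t.uid
) < ?
ORDER BY t.cat_uid, t.uid DESC
"""

latest = conn.execute(LATEST_SQL, (2,)).fetchall()

# Group into {cat_uid: [itm_uid, ...]}, similar to the array asked for in the question.
by_category = {}
for cat_uid, itm_uid in latest:
    by_category.setdefault(cat_uid, []).append(itm_uid)

print(latest, by_category)
```

A row is kept when fewer than 2 rows of its category have a higher uid, which is the same condition as cat_row < 2 above. The subquery runs once per row, so an index on (cat_uid, uid) would probably matter on a large table.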
This is called an adjacency list model, or parent-child tree model. It's one of the simpler tree models, in which each row holds only one pointer. You would solve your query with recursion or with a self join. Sadly, MySQL versions before 8.0 don't support recursive queries, though it might be possible with prepared statements. I would suggest a self join. With a self join you can combine the rows from the left side and the right side of the same table under a join condition.
select t1.cat_uid, t2.cat_uid, t1.itm_uid, t2.itm_uid From my_table As t1 Inner Join my_table As t2 On t1.cat_uid = t2.cat_uid
Let's say we have a table that looks like this:
connection_requirements
+-----------------------------------+
| item_id | connector_id | quantity |
+---------+--------------+----------+
| 1 | 4 | 1 |
| 1 | 5 | 1 |
| 1 | 2 | 2 |
+---------+--------------+----------+
This table is a list of the connectors that an electronic device requires to operate, and how many of each type of connector it requires. (Think of connections on a motherboard requiring certain types of connectors from a power supply.)
Now we also have this table...
connections_compatability
+-------------------------+
| connector_id | works_as |
+--------------+----------+
| 6 | 4 |
| 6 | 5 |
+--------------+----------+
Where the first column is the connector that can also act as the connector id of the second column. (For instance a power supply has connectors such as "6+2 Pin" which can work as "8 Pin" or "6 Pin")
Now finally we have how many of each connectors are available in this table
connector_quantities
+-------------------------+
| connector_id | quantity |
+--------------+----------+
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 4 |
| 7 | 0 |
| 8 | 5 |
+--------------+----------+
Based off these tables, as you can infer, we do have enough connectors for item number 1 to properly operate. Even though we do not have enough of connector #5, we have 4 connector #6s, which can work as connector #4 and #5.
The connection_requirements table is joined onto the items table. How can we filter out items that require more connections than we have available? We already have code in place to filter out items that require connectors that are unavailable.
The problem has many more layers of complexity to it, so we tried to simplify the problem.
Much appreciation for all the help!
One approach is to determine the "real" inventory of items including their substitutions. E.g., the real inventory of part #4 is actually 5: one part #4 plus four parts #6. So using that, the following finds the requirements that cannot be met:
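Select ...
From connection_requirements As CR
-- keep a requirement only if no inventory row can satisfy it
Where Not Exists (
Select 1
From connector_quantities As Q1
-- stock of each connector that can stand in for Q1's connector
Left Join (
Select C2.connector_id, C2.works_as, Q2.quantity
From connections_compatability As C2
Join connector_quantities As Q2
On Q2.connector_id = C2.connector_id
) As Subs1
On Subs1.works_as = Q1.connector_id
Where Q1.connector_id = CR.connector_id
-- own stock plus substitute stock must cover the requirement (>=, so an exact match counts as enough)
And ( Coalesce(Subs1.quantity, 0) + Q1.quantity ) >= CR.quantity
)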
There is of course a catch with this approach. Suppose you have an item that requires 4 #4 connectors and 2 #6 connectors. Technically, you do have 4 #4 connectors (1 #4 and 3 #6 substitutions), but in combination with the requirement of 2 #6 connectors, you do not have enough parts. To solve this problem you would likely have to use a loop or multiple queries that determine the on-hand inventory left after you use up all your primary parts.
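The loop mentioned above might look like the following Python sketch. It is an assumption about how such a loop could work, not code from the question: primary stock is consumed first, and whatever is still missing is then drawn from the remaining stock of substitutes. The function name can_build and the dictionary layout are invented for the illustration; the quantities come from the tables in the question.

```python
# connector_id -> quantity on hand (connector_quantities)
STOCK = {1: 1, 2: 3, 3: 2, 4: 1, 5: 0, 6: 4, 7: 0, 8: 5}
# substitute connector_id -> connectors it can work as (compatibility table)
WORKS_AS = {6: [4, 5]}

def can_build(requirements, stock, works_as):
    """requirements maps connector_id -> quantity needed for one item."""
    remaining = dict(stock)
    shortfall = {}
    # Pass 1: use the primary connectors themselves.
    for connector, needed in requirements.items():
        used = min(needed, remaining.get(connector, 0))
        remaining[connector] = remaining.get(connector, 0) - used
        if needed > used:
            shortfall[connector] = needed - used
    # Pass 2: cover what is missing with whatever substitutes are left.
    for connector, missing in shortfall.items():
        for substitute, targets in works_as.items():
            if missing > 0 and connector in targets:
                used = min(missing, remaining.get(substitute, 0))
                remaining[substitute] = remaining.get(substitute, 0) - used
                missing -= used
        if missing > 0:
            return False
    return True

item_1 = {4: 1, 5: 1, 2: 2}       # rows of connection_requirements for item 1
catch_example = {4: 4, 6: 2}      # the problematic item described above

print(can_build(item_1, STOCK, WORKS_AS), can_build(catch_example, STOCK, WORKS_AS))
```

Item 1 passes because one #6 covers the missing #5, while the second item fails: after 2 of the 4 #6 connectors are used directly, only 2 remain to cover a shortfall of 3 #4 connectors. With several interchangeable substitutes a greedy pass like this is not guaranteed to find a valid assignment even when one exists, so treat it as a starting point.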