I'm working on a database with a hierarchical category tree. I'd like to write a query that returns all of the parents of a given category. For example, assume this structure and content, where a parent of 0 means the element is a root with no parents:
ID | Name | Parent
1 | Tools | 0
2 | Drill | 1
3 | Impact | 2
4 | Cordless | 2
5 | Series X | 4
How could I write a query that gets all of the parents of Series X (ID 5)? I don't care whether the result includes ID 5 itself, since I already have that one. I'd like it to return the results below:
ID | Name | Parent
1 | Tools | 0
2 | Drill | 1
4 | Cordless | 2
5 | Series X | 4
Bonus if there's a way to also get each row's generation (its depth from the root) at the same time. Something like:
ID | Name | Parent | Generation
1 | Tools | 0 | 0
2 | Drill | 1 | 1
4 | Cordless | 2 | 2
5 | Series X | 4 | 3
I'm really stuck on this right now. I'm thinking it might need a custom SQL function?
MySQL 8.0 supports recursive CTE queries:
WITH RECURSIVE cte AS (
SELECT * FROM MyTable WHERE id = 5
UNION ALL
SELECT MyTable.* FROM cte JOIN MyTable ON MyTable.id = cte.parent
)
SELECT * FROM cte ORDER BY id;
Getting the "Generation" is tricky when your CTE starts at the leaf of the hierarchy, because the recursion counts levels upward from the leaf rather than downward from the root.
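A minimal sketch of one workaround: count levels upward from the starting row in the recursive part, then subtract each row's count from the maximum so the root comes out as generation 0. It runs the CTE from the answer against SQLite through Python's built-in sqlite3 module as a stand-in for MySQL 8.0; the `up` column and the scalar subquery are additions for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "Tools", 0), (2, "Drill", 1), (3, "Impact", 2),
     (4, "Cordless", 2), (5, "Series X", 4)],
)

ANCESTORS_WITH_GENERATION = """
WITH RECURSIVE cte AS (
    -- Anchor: the starting row, zero levels above itself.
    SELECT id, name, parent, 0 AS up FROM MyTable WHERE id = ?
    UNION ALL
    -- Recursive step: fetch the parent row and count one more level up.
    SELECT t.id, t.name, t.parent, cte.up + 1
    FROM cte JOIN MyTable AS t ON t.id = cte.parent
)
-- The root has the largest up value, so MAX(up) - up is 0 for the root.
SELECT id, name, parent, (SELECT MAX(up) FROM cte) - up AS generation
FROM cte
ORDER BY generation
"""

rows = conn.execute(ANCESTORS_WITH_GENERATION, (5,)).fetchall()
for row in rows:
    print(row)
```

For ID 5 this returns Tools, Drill, Cordless and Series X with generations 0 to 3, matching the table in the question. The recursion stops on its own because no row has id 0.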
If you are using a version of MySQL older than 8.0, you may like my answer to "What is the most efficient/elegant way to parse a flat table into a tree?" or my presentation "Recursive Query Throwdown".
One very efficient way of storing hierarchies in SQL is by using some kind of 'closure table'.
It's a bit more difficult to edit and uses more space, but it can usually be traversed with a single JOIN, or with a single query if you are only interested in IDs.
This table contains one row for every ancestor/descendant pair, plus one row per item in the real table where anc = des = id. Here diff is the number of levels between anc and des, and depth is the depth of anc (0 for a root).
For this tree:
1
+-- 2
|   +-- 3
|   +-- 5
+-- 4
|   +-- 6
+-- 7
Our SQL table would contain:
+-----+-----+------+-------+
| anc | des | diff | depth |
+-----+-----+------+-------+
| 1 | 1 | 0 | 0 |
| 1 | 2 | 1 | 0 |
| 2 | 3 | 1 | 1 |
| 1 | 3 | 2 | 0 |
| 1 | 4 | 1 | 0 |
| 2 | 5 | 1 | 1 |
| 1 | 5 | 2 | 0 |
| 4 | 6 | 1 | 1 |
| 1 | 6 | 2 | 0 |
| 1 | 7 | 1 | 0 |
| 2 | 2 | 0 | 1 |
| 3 | 3 | 0 | 2 |
| 4 | 4 | 0 | 1 |
| 5 | 5 | 0 | 2 |
| 6 | 6 | 0 | 2 |
| 7 | 7 | 0 | 1 |
+-----+-----+------+-------+
The task "Get all ancestors of node 5", in other words the path to node 5, can then be done with the following query:
SELECT `anc` FROM `closure` WHERE `des` = 5
And the task "Get all descendants of node 1" can be done with this query:
SELECT `des` FROM `closure` WHERE `anc` = 1
And "Get all direct descendants of node 1" is done like so:
SELECT `des` FROM `closure` WHERE `diff` = 1 AND `anc` = 1
Finally, "Get all root nodes" is done like this:
SELECT `anc` FROM `closure` WHERE `depth` = 0 AND `anc` = `des`
These four tasks are the most common ways of selecting things from the tree.
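The closure table above can be generated from a plain list of (parent, child) edges and then queried with the four queries. A minimal sketch in Python with the built-in sqlite3 module standing in for MySQL; the build_closure helper is hypothetical, it assumes every node has at most one parent (the simple tree case), and ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

# The example tree as (parent, child) edges.
EDGES = [(1, 2), (2, 3), (1, 4), (2, 5), (4, 6), (1, 7)]


def build_closure(edges):
    """Return (anc, des, diff, depth) rows for a tree where each node has at most one parent."""
    parent = {child: par for par, child in edges}
    nodes = set(parent) | {par for par, _ in edges}
    rows = []
    for node in nodes:
        # chain = [node, its parent, its grandparent, ..., root]
        chain = [node]
        while chain[-1] in parent:
            chain.append(parent[chain[-1]])
        for diff, anc in enumerate(chain):
            depth = len(chain) - 1 - diff  # depth of anc, with the root at 0
            rows.append((anc, node, diff, depth))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE closure (anc INTEGER, des INTEGER, diff INTEGER, depth INTEGER, "
    "PRIMARY KEY (anc, des))"
)
conn.executemany("INSERT INTO closure VALUES (?, ?, ?, ?)", build_closure(EDGES))


def ids(sql, *params):
    return [row[0] for row in conn.execute(sql, params)]


ancestors_of_5 = ids("SELECT anc FROM closure WHERE des = ? ORDER BY depth", 5)
descendants_of_1 = ids("SELECT des FROM closure WHERE anc = ? ORDER BY des", 1)
children_of_1 = ids("SELECT des FROM closure WHERE diff = 1 AND anc = ? ORDER BY des", 1)
roots = ids("SELECT anc FROM closure WHERE depth = 0 AND anc = des ORDER BY anc")
print(ancestors_of_5, descendants_of_1, children_of_1, roots)
```

Ordering the ancestors by depth turns the unordered set into the root-to-node path, which works here only because each node has a single parent.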
In reality, however, when categorizing things, people can't decide where to put them, and inevitably some item has to end up in multiple places in the tree. This throws a spanner in the works: two of these naïve queries (the first and the second) no longer work.
Note that, to avoid problems with infinite recursion and stack overflow, the graph in this example is still a kind of 'tree': there are no cycles in it (strictly, it is a directed acyclic graph).
The first problem is that "Get all descendants" now has duplicate results. This can be fixed with a GROUP BY clause.
The second, harder problem, which I haven't solved, concerns the first query: there are now multiple possible paths to a leaf node. Let's split the question into two possible satisfactory results:
Is there a way, using a single or fixed number of JOINs in a single or fixed number of queries, for an arbitrarily deep tree, to get either:
The Canonical path
This is the path to a specific leaf node with the fewest nodes, taking the leftmost one in the tree representation. Note that the tree is not necessarily 'sorted' like the example data structure, as nodes are arbitrarily inserted and removed.
All paths
All possible paths to a specific leaf node.
An illustrative example of why the naïve method fails:
Consider this tree:
1
+-- 2
|   +-- 5
|       +-- 6
+-- 3
    +-- 4
        +-- 6
Asking "What are the ancestors of 6?" should give two logical answers:
1-2-5-6 and 1-3-4-6.
Yet, using the naïve query and sorting, all we can really get is:
1-2-4-6 or 1-3-5-6,
neither of which is a valid path.
All of the tutorials about closure tables I've read plainly state that closure tables can handle hierarchies where the same item appears in multiple places, but they never actually explain how to do this properly; it's just 'left to the reader'. When I try, however, I run into nontrivial problems.
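For comparison only, since it uses recursion rather than a fixed number of JOINs: a recursive CTE over a plain (parent, child) edge table does return every valid path, because each parent of the current node starts its own branch. A sketch in SQLite via Python's sqlite3; the edges table is a hypothetical stand-in for the closure table's diff = 1 rows, and the recursion assumes the graph has no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (parent INTEGER, child INTEGER)")
# The second example: node 6 is reachable through both 5 and 4.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (2, 5), (5, 6), (1, 3), (3, 4), (4, 6)],
)

ALL_PATHS = """
WITH RECURSIVE walk(node, path) AS (
    SELECT :leaf, CAST(:leaf AS CHAR(255))
    UNION ALL
    -- Each parent of the current node starts its own branch of the path.
    SELECT e.parent, e.parent || '-' || walk.path
    FROM walk JOIN edges AS e ON e.child = walk.node
)
-- Keep only branches that reached a root, meaning a node that is not a child of any node.
SELECT path FROM walk
WHERE node NOT IN (SELECT child FROM edges)
ORDER BY path
"""

paths = [row[0] for row in conn.execute(ALL_PATHS, {"leaf": 6})]
print(paths)
```

For node 6 this yields 1-2-5-6 and 1-3-4-6, the two valid answers listed above. On MySQL 8.0 the `||` concatenation would need to be CONCAT().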
In my table I have two columns, "sku" and "fitment". The sku represents a part and the fitment represents all the vehicles that part fits. The problem is that a fitment cell can contain up to 20 vehicles, separated by ^^. For example:
**sku -- fitment**
part1 -- Vehicle 1 information ^^ Vehicle 2 information ^^ Vehicle 3 information ^^ etc.
I am looking to split the cells in the fitment column, so it would look like this:
**sku -- fitment**
part1 -- Vehicle 1 information
part1 -- Vehicle 2 information
part1 -- Vehicle 3 information
Is this possible? And if so, could a MySQL database handle hundreds of thousands of items "splitting" like this? I imagine it would turn my database of around 250k rows into about 20 million rows. Any help is appreciated!
A little more background: this is going to be used for a drill-down search function, so that I can match parts to vehicles (year, make, model, etc.). If you have a better solution, I'm all ears.
Thanks
Possible duplicate of this: Split value from one field to two
Unfortunately, MySQL does not have a built-in split string function. As the linked question shows, there are user-defined split functions.
A more verbose way to fetch the data is the following:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(fitment, '^^', 1), '^^', -1) AS fitmentvehicle1,
SUBSTRING_INDEX(SUBSTRING_INDEX(fitment, '^^', 2), '^^', -1) AS fitmentvehicle2,
-- ... one column per position, up to n ...
SUBSTRING_INDEX(SUBSTRING_INDEX(fitment, '^^', n), '^^', -1) AS fitmentvehiclen
FROM table_name;
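A Python model of SUBSTRING_INDEX shows how the nested call picks out one vehicle, and what happens when a row has fewer than n vehicles: SUBSTRING_INDEX returns the whole string once the count exceeds the number of delimiters, so every later column repeats the last vehicle. The helper names are hypothetical, and strip() plays the role of MySQL TRIM(), which would be needed because the sample data has spaces around ^^.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Model of MySQL SUBSTRING_INDEX(str, delim, count), assuming a non-empty delim."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # everything left of the count-th delimiter
    return delim.join(parts[count:])      # everything right of the |count|-th delimiter from the end


def nth_vehicle(fitment: str, n: int) -> str:
    """The nested call from the query, with strip() standing in for TRIM()."""
    return substring_index(substring_index(fitment, "^^", n), "^^", -1).strip()


fitment = "Vehicle 1 information ^^ Vehicle 2 information ^^ Vehicle 3 information"
print([nth_vehicle(fitment, n) for n in range(1, 6)])
```

Positions 4 and 5 both come back as "Vehicle 3 information", so a fixed set of n columns needs either a guard on the number of delimiters or de-duplication afterwards.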
Since you need the data in a normalized form (i.e. not separated by ^^), it is always better to store it that way in the first place. As for the growth in database size, you might want to look into archiving older data and deleting it from the table.
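A minimal sketch of storing the data normalized: a one-time split of each ^^ list into rows of a separate part_fitment table (a hypothetical name), done with a recursive CTE that peels one vehicle off the front of the remaining string per step. It runs on SQLite via Python's sqlite3. On MySQL 8.0, the `||` concatenation would become CONCAT(), while INSTR, SUBSTR, TRIM and CAST(... AS CHAR(255)) exist there as well; the CAST in the anchor matters on MySQL, which infers a recursive CTE's column types from the non-recursive part only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (sku TEXT PRIMARY KEY, fitment TEXT)")
# Hypothetical normalized table: one row per (sku, vehicle) pair.
conn.execute(
    "CREATE TABLE part_fitment (sku TEXT NOT NULL, vehicle TEXT NOT NULL, "
    "PRIMARY KEY (sku, vehicle))"
)
conn.executemany("INSERT INTO parts VALUES (?, ?)", [
    ("part1", "Vehicle 1 information ^^ Vehicle 2 information ^^ Vehicle 3 information"),
    ("part2", "Vehicle 2 information"),
])

# Each step takes the text before the next ^^ and keeps the text after it.
SPLIT_FITMENT = """
WITH RECURSIVE split(sku, vehicle, rest) AS (
    SELECT sku, CAST('' AS CHAR(255)), fitment || '^^' FROM parts
    UNION ALL
    SELECT sku,
           trim(substr(rest, 1, instr(rest, '^^') - 1)),
           substr(rest, instr(rest, '^^') + 2)
    FROM split
    WHERE rest <> ''
)
SELECT sku, vehicle FROM split WHERE vehicle <> ''
"""
conn.executemany(
    "INSERT OR IGNORE INTO part_fitment VALUES (?, ?)",
    conn.execute(SPLIT_FITMENT).fetchall(),
)
# An index on vehicle supports the drill-down lookup from a vehicle to its parts.
conn.execute("CREATE INDEX idx_part_fitment_vehicle ON part_fitment (vehicle)")


def parts_for(vehicle):
    sql = "SELECT sku FROM part_fitment WHERE vehicle = ? ORDER BY sku"
    return [row[0] for row in conn.execute(sql, (vehicle,))]


print(parts_for("Vehicle 2 information"))
```

Each (sku, vehicle) pair becomes one narrow row, which is the shape a drill-down search by year, make and model can filter and index directly.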
You should also partition your table using a partitioning strategy that suits your requirements. It is easier to archive and truncate a whole partition of the table than to delete row by row.
E.g.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (user_id INT NOT NULL PRIMARY KEY,stuff VARCHAR(50) NOT NULL);
INSERT INTO my_table VALUES (101,'1,2,3'),(102,'3,4'),(103,'4,5,6');
SELECT *
FROM my_table;
+---------+-------+
| user_id | stuff |
+---------+-------+
| 101 | 1,2,3 |
| 102 | 3,4 |
| 103 | 4,5,6 |
+---------+-------+
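CREATE TABLE ints (i INT NOT NULL PRIMARY KEY);
INSERT INTO ints VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
SELECT * FROM ints;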
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
SELECT DISTINCT user_id
, SUBSTRING_INDEX(SUBSTRING_INDEX(stuff,',',i2.i*10+i1.i+1),',',-1) x
FROM my_table
, ints i1
, ints i2
ORDER
BY user_id,x;
+---------+---+
| user_id | x |
+---------+---+
| 101 | 1 |
| 101 | 2 |
| 101 | 3 |
| 102 | 3 |
| 102 | 4 |
| 103 | 4 |
| 103 | 5 |
| 103 | 6 |
+---------+---+
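To see why DISTINCT is needed, the query can be run outside MySQL. This sketch uses Python's sqlite3; SQLite has no SUBSTRING_INDEX, so a Python model of it is registered with Connection.create_function as an assumed stand-in for the MySQL built-in. The cross join of ints with itself yields positions 1 to 100; positions past the last comma return the last item again, and DISTINCT removes those repeats. This also caps each row at 100 items.

```python
import sqlite3


def substring_index(s, delim, count):
    """Python stand-in for MySQL SUBSTRING_INDEX (non-empty delim; NULL in, NULL out)."""
    if s is None:
        return None
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.execute("CREATE TABLE my_table (user_id INT NOT NULL PRIMARY KEY, stuff VARCHAR(50) NOT NULL)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(101, "1,2,3"), (102, "3,4"), (103, "4,5,6")])
conn.execute("CREATE TABLE ints (i INT NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO ints VALUES (?)", [(i,) for i in range(10)])

# Same query as above: i2.i*10 + i1.i + 1 runs from 1 to 100.
SPLIT = """
SELECT DISTINCT user_id,
       SUBSTRING_INDEX(SUBSTRING_INDEX(stuff, ',', i2.i*10 + i1.i + 1), ',', -1) AS x
FROM my_table, ints i1, ints i2
ORDER BY user_id, x
"""
rows = conn.execute(SPLIT).fetchall()
print(rows)
```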
In Rails, I am writing a MySQL query to search hierarchical data, and I have to pass a specific id from the table.
Table: Hierarchy
id | parent | name
1 | | Electronics
2 | 1 | iPhone
3 | 1 | Moto G ( Android )
4 | | Clothes
5 | 4 | Kidz Wear
Table: Comments
id | hierarchy_id | value
1 | 1 | Best electronic products values in this store.
2 | 1,2 | iPhone is always best.
3 | 4 | Cotton Clothes - Cool
4 | 1,3 | MotoG with Android M - Paise wasool
5 | 4,5 | New Collection Good One ...
When I search using hierarchy 1, only one record is shown. I can't work out how to fetch the remaining records: if I select a parent hierarchy, its CHILD data should be included too.
And if I select a CHILD hierarchy, the system should return only the values related to that CHILD, excluding the PARENT and SIBLINGS.
What I'm getting right now:
$ select * from comments where hierarchy_id = 1;
id | hierarchy_id | value
1 | 1 | Best electronic products values in this store.
Expected output for hierarchy_id = 1:
$ select * from comments where **************************
id | hierarchy_id | value
1 | 1 | Best electronic products values in this store.
2 | 1,2 | iPhone is always best.
4 | 1,3 | MotoG with Android M - Paise wasool
Expected output for hierarchy_id = 3:
$ select * from comments where **************************
id | hierarchy_id | value
4 | 1,3 | MotoG with Android M - Paise wasool
Please suggest something.
Since you have hierarchy_id in the comments table, why don't you use it in your query:
SELECT * FROM comments WHERE hierarchy_id=1
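Because hierarchy_id holds a comma-separated list, what the question needs is a membership test: does the list contain the selected id? MySQL provides FIND_IN_SET(str, strlist) for this, e.g. `WHERE FIND_IN_SET('1', hierarchy_id) > 0`. The sketch below uses SQLite via Python's sqlite3, which has no FIND_IN_SET, so it emulates the test by wrapping both the list and the id in commas and matching with LIKE; comments_for is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, hierarchy_id TEXT, value TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    (1, "1", "Best electronic products values in this store."),
    (2, "1,2", "iPhone is always best."),
    (3, "4", "Cotton Clothes - Cool"),
    (4, "1,3", "MotoG with Android M - Paise wasool"),
    (5, "4,5", "New Collection Good One ..."),
])


def comments_for(hierarchy_id):
    # ',1,2,' LIKE '%,1,%' matches the whole item 1, but ',10,' or ',21,' would not.
    sql = ("SELECT id FROM comments "
           "WHERE ',' || hierarchy_id || ',' LIKE '%,' || ? || ',%' "
           "ORDER BY id")
    return [row[0] for row in conn.execute(sql, (str(hierarchy_id),))]


print(comments_for(1), comments_for(3))
```

Selecting the parent 1 returns its own comment plus the children's (ids 1, 2 and 4), while selecting the child 3 returns only id 4. A separate comment-to-hierarchy table with one row per pair would avoid string matching altogether.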
I've looked through a few Stack Overflow questions, and there are similar ones, but none quite put my finger on this one.
If you have a table like this:
uid cat_uid itm_uid
1 1 4
2 1 5
3 2 6
4 2 7
5 3 8
6 3 9
where the uid column is auto-incremented, cat_uid references a category of relevance to filter on, and the itm_uid values are the ones we're seeking.
I would like to get a result set that contains the following sample results:
array (
0 => array (1 => array(4,5)),
1 => array (2 => array(6,7)),
2 => array (3 => array(8,9))
)
For example: select 2 records from each category (however many categories there may be), making sure they are the last 2 entries by uid in those categories.
I'm not sure how to structure the question to allow an answer, and any hints on a method for the solution would be welcome!
EDIT:
This wasn't a very clear question, so let me extend the scenario to something more tangible.
I have a set of records being entered into categories, and I would like to select, with as few queries as possible, the latest 2 records entered per category, so that when I list out the contents of those categories I will have at least 2 records per category (assuming there are 2 or more already in the database). A similar query was in place that selected the last 100 records and filtered them into categories, but with a small number of categories, some of which are updated faster than others, the top 100 may not include members of every category. To resolve that, I was looking for a way to select 2 records from each category (or N records, assuming N is the same for every category) and for those 2 records to be the last entered. A date field is available to sort on, but itm_uid itself could also be used to indicate insertion order.
SELECT cat_uid, itm_uid,
IF( @cat = cat_uid, @cat_row := @cat_row + 1, @cat_row := 0 ) AS cat_row,
@cat := cat_uid
FROM my_table
JOIN (SELECT @cat_row := 0, @cat := 0) AS init
HAVING cat_row < 2
ORDER BY cat_uid, uid DESC
You will have two extra columns in the results; just ignore them.
This is the logic:
We sort the table by cat_uid, uid descending, then start from the top and give each row a "row number" (cat_row), resetting this row number to zero whenever cat_uid changes:
---------------------------------------
| uid | cat_uid | itm_uid | cat_row |
| 45 | 4 | 34 | 0 |
| 33 | 4 | 54 | 1 |
| 31 | 4 | 12 | 2 |
| 12 | 4 | 51 | 3 |
| 56 | 6 | 11 | 0 |
| 20 | 6 | 64 | 1 |
| 16 | 6 | 76 | 2 |
| ... | ... | ... | ... |
---------------------------------------
Now if we keep only the rows that have cat_row < 2, we get the results we want:
---------------------------------------
| uid | cat_uid | itm_uid | cat_row |
| 45 | 4 | 34 | 0 |
| 33 | 4 | 54 | 1 |
| 56 | 6 | 11 | 0 |
| 20 | 6 | 64 | 1 |
| ... | ... | ... | ... |
---------------------------------------
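MySQL does not guarantee the evaluation order of expressions that assign user variables, and MySQL 8.0 deprecates setting user variables inside expressions, so on 8.0 the same per-category row number can come from ROW_NUMBER(). A minimal sketch, run against SQLite (3.25 or newer, for window functions) via Python's sqlite3, using the sample rows from the tables above and the my_table name from the query; it also groups the result into a category-to-items mapping like the array in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (uid INTEGER PRIMARY KEY, cat_uid INTEGER, itm_uid INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (45, 4, 34), (33, 4, 54), (31, 4, 12), (12, 4, 51),
    (56, 6, 11), (20, 6, 64), (16, 6, 76),
])

# cat_row restarts at 0 for each category, newest uid first.
TOP_N = """
SELECT uid, cat_uid, itm_uid
FROM (
    SELECT uid, cat_uid, itm_uid,
           ROW_NUMBER() OVER (PARTITION BY cat_uid ORDER BY uid DESC) - 1 AS cat_row
    FROM my_table
) AS numbered
WHERE cat_row < ?
ORDER BY cat_uid, uid DESC
"""
rows = conn.execute(TOP_N, (2,)).fetchall()

# Group into {cat_uid: [itm_uid, ...]}, similar to the array shape in the question.
grouped = {}
for _, cat, itm in rows:
    grouped.setdefault(cat, []).append(itm)
print(grouped)
```

Passing a different value for the parameter gives the N-records-per-category variant without changing the query.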
This is called the adjacency list model, or parent-child tree model. It's one of the simpler tree models, where each row has only one pointer. You would solve your query with recursion or with a self join. Unfortunately, MySQL versions before 8.0 don't support recursive queries; it may be possible with prepared statements. I suggest a self join, which lets you combine rows from the left and right side under a specific condition.
SELECT t1.cat_uid, t2.cat_uid, t1.itm_uid, t2.itm_uid FROM my_table AS t1 INNER JOIN my_table AS t2 ON t1.cat_uid = t2.cat_uid
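For MySQL versions without window functions, a self join can also answer the "latest 2 per category" question: join each row to the newer rows in its category and keep the rows that have fewer than 2 newer rows. A minimal sketch in SQLite via Python's sqlite3, with the sample rows from the earlier answer; my_table is an assumed table name. The join grows quadratically with category size, so it suits small categories best.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (uid INTEGER PRIMARY KEY, cat_uid INTEGER, itm_uid INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (45, 4, 34), (33, 4, 54), (31, 4, 12), (12, 4, 51),
    (56, 6, 11), (20, 6, 64), (16, 6, 76),
])

# b ranges over newer rows (larger uid) in the same category as a.
SELF_JOIN_TOP_N = """
SELECT a.uid, a.cat_uid, a.itm_uid
FROM my_table AS a
LEFT JOIN my_table AS b
       ON b.cat_uid = a.cat_uid AND b.uid > a.uid
GROUP BY a.uid, a.cat_uid, a.itm_uid
HAVING COUNT(b.uid) < ?
ORDER BY a.cat_uid, a.uid DESC
"""
rows = conn.execute(SELF_JOIN_TOP_N, (2,)).fetchall()
print(rows)
```

The LEFT JOIN keeps the newest row of each category (which has no newer rows, so COUNT(b.uid) is 0).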
Let's say we have a table that looks like this:
connection_requirements
+-----------------------------------+
| item_id | connector_id | quantity |
+---------+--------------+----------+
| 1 | 4 | 1 |
| 1 | 5 | 1 |
| 1 | 2 | 2 |
+---------+--------------+----------+
This table lists the connectors that an electronic device requires to operate, and how many of each type it requires. (Think of connections on a motherboard requiring certain types of connectors from a power supply.)
Now we also have this table...
connections_compatibility
+-------------------------+
| connector_id | works_as |
+--------------+----------+
| 6 | 4 |
| 6 | 5 |
+--------------+----------+
Here the first column is a connector that can also act as the connector ID in the second column. (For instance, a power supply has connectors such as "6+2 Pin", which can work as "8 Pin" or "6 Pin".)
Finally, this table holds how many of each connector are available:
connector_quantities
+-------------------------+
| connector_id | quantity |
+--------------+----------+
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 4 |
| 7 | 0 |
| 8 | 5 |
+--------------+----------+
Based on these tables, we do have enough connectors for item number 1 to operate properly. Even though we do not have enough of connector #5, we have four connector #6s, which can work as connector #4 or #5.
The connection_requirements table is joined onto the items table. How can we filter out items that require more connections than we have available? We already have code in place to filter out items that require connectors that are unavailable.
The real problem has many more layers of complexity, so we have tried to simplify it here.
Much appreciation for all the help!
One approach is to determine the "real" inventory of each connector, including substitutions. For example, the real inventory of part #4 is actually 5: one #4 plus four #6s. Using that:
Select ...
From connection_requirements As CR
Where Not Exists (
Select 1
From connector_quantities As Q1
Left Join (
Select C2.connector_id, C2.works_as, Q2.quantity
From connections_compatibility As C2
Join connector_quantities As Q2
On Q2.connector_id = C2.connector_id
) As Subs1
On Subs1.works_as = Q1.connector_id
Where Q1.connector_id = CR.connector_id
And ( Coalesce(Subs1.quantity, 0) + Q1.quantity ) >= CR.quantity
)
There is, of course, a catch with this approach. Suppose you have an item that needs 4 #4 connectors and 2 #6 connectors. Technically, you do have 4 #4 connectors (1 #4 and 3 #6 substitutes), but combined with the requirement of 2 #6 connectors, you do not have enough parts. To solve this you would likely need a loop or multiple queries that determine the remaining on-hand inventory after all primary parts are used up.
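One way to avoid the loop is to treat the check as a maximum-flow problem (a swapped-in technique, not what the answer above uses): each requirement draws from its own connector type or from any substitute, each connector type supplies at most its stock, and the item can be built exactly when the maximum flow equals the total number of connectors required. A minimal sketch in Python using Edmonds-Karp, with the stock and compatibility rows from the question hard-coded; the function name can_build is hypothetical, and in practice the inputs would be loaded from the three tables.

```python
from collections import deque


def can_build(requirements, stock, works_as):
    """True if stock, counting substitutes, covers all requirements at the same time.

    requirements: {connector_id: quantity needed by the item}
    stock:        {connector_id: quantity on hand}
    works_as:     iterable of (connector_id, can_act_as_connector_id) pairs
    """
    # Residual capacities: src -> ("need", r) -> ("have", s) -> sink
    cap = {"src": {}, "sink": {}}

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    substitutes = {}
    for s, r in works_as:
        substitutes.setdefault(r, []).append(s)

    for r, qty in requirements.items():
        add_edge("src", ("need", r), qty)
        # A requirement can draw on its own connector type or on any substitute.
        for s in [r] + substitutes.get(r, []):
            add_edge(("need", r), ("have", s), qty)
    for s, qty in stock.items():
        add_edge(("have", s), "sink", qty)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {"src": None}
        queue = deque(["src"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            break
        # Bottleneck capacity along the path, then push that much flow.
        push, v = float("inf"), "sink"
        while parent[v] is not None:
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        v = "sink"
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= push
            cap[v][u] += push
            v = u
        flow += push
    return flow == sum(requirements.values())


STOCK = {1: 1, 2: 3, 3: 2, 4: 1, 5: 0, 6: 4, 7: 0, 8: 5}
WORKS_AS = [(6, 4), (6, 5)]
print(can_build({4: 1, 5: 1, 2: 2}, STOCK, WORKS_AS))
print(can_build({4: 4, 6: 2}, STOCK, WORKS_AS))
```

Item 1 from the question passes, while the 4 × #4 plus 2 × #6 case fails, because both requirements compete for the same four #6 connectors.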