Getting limited amount of records from hierarchical data - mysql

Let's say I have 3 tables (significant columns only)
Category (catId key, parentCatId)
Category_Hierarchy (catId key, parentTrail, catLevel)
Product (prodId key, catId, createdOn)
The separate Category_Hierarchy table exists because I populate it from triggers on the Category table: a MySQL trigger can't update columns of the same table it fires on when I also want to use the auto_increment value. For the sake of this problem this is irrelevant. The two tables are 1:1 anyway.
Category table could be:
+-------+-------------+
| catId | parentCatId |
+-------+-------------+
| 1 | NULL |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 3 |
| 6 | 4 |
| ... | ... |
+-------+-------------+
Category_Hierarchy
+-------+-------------+----------+
| catId | parentTrail | catLevel |
+-------+-------------+----------+
| 1 | 1/ | 0 |
| 2 | 1/2/ | 1 |
| 3 | 1/2/3/ | 2 |
| 4 | 1/2/3/4/ | 3 |
| 5 | 1/2/3/5/ | 3 |
| 6 | 1/2/3/4/6/ | 4 |
| ... | ... | ... |
+-------+-------------+----------+
Product
+--------+-------+---------------------+
| prodId | catId | createdOn |
+--------+-------+---------------------+
| 1 | 4 | 2010-02-03 12:09:24 |
| 2 | 4 | 2010-02-03 12:09:29 |
| 3 | 3 | 2010-02-03 12:09:36 |
| 4 | 1 | 2010-02-03 12:09:39 |
| 5 | 3 | 2010-02-03 12:09:50 |
| ... | ... | ... |
+--------+-------+---------------------+
Category_Hierarchy makes it simple to get category subordinate trees like this:
select c.*
from Category c
join Category_Hierarchy h
on (h.catId = c.catId)
where h.parentTrail like '1/2/3/%'
Which would return complete subordinate tree of category 3 (that is below 2, that is below 1 which is root category) including subordinate tree root node. Excluding root node is just one more where condition.
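A minimal sketch of the same prefix query, run against an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example is self-contained). The helper name subtree and its include_root flag are hypothetical; the extra condition on the trail is the "one more where condition" that excludes the subtree root.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows mirror the sample Category_Hierarchy table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Category_Hierarchy "
    "(catId INTEGER PRIMARY KEY, parentTrail TEXT, catLevel INTEGER)"
)
conn.executemany(
    "INSERT INTO Category_Hierarchy VALUES (?, ?, ?)",
    [(1, "1/", 0), (2, "1/2/", 1), (3, "1/2/3/", 2),
     (4, "1/2/3/4/", 3), (5, "1/2/3/5/", 3), (6, "1/2/3/4/6/", 4)],
)


def subtree(conn, trail, include_root=True):
    """Return the catIds whose parentTrail starts with the given trail."""
    sql = "SELECT catId FROM Category_Hierarchy WHERE parentTrail LIKE ? || '%'"
    params = [trail]
    if not include_root:
        # The subtree root is the only row whose trail equals the prefix itself.
        sql += " AND parentTrail <> ?"
        params.append(trail)
    sql += " ORDER BY catId"
    return [row[0] for row in conn.execute(sql, params)]


with_root = subtree(conn, "1/2/3/")
without_root = subtree(conn, "1/2/3/", include_root=False)
print(with_root, without_root)
```

The trailing slash in the trail matters: it keeps a prefix such as 1/2/3/ from matching a sibling trail like 1/2/30/.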
The problem
I would like to write a stored procedure:
create procedure GetLatestProductsFromSubCategories(in catId int)
begin
/* return 10 latest products from each */
/* catId subcategory subordinate tree */
end;
This means if a certain category had 3 direct sub categories (with whatever number of nodes underneath) I would get 30 results (10 from each subordinate tree). If it had 5 sub categories I'd get 50 results.
What would be the best/fastest/most efficient way to do this? If possible I'd like to avoid cursors and prepared statements, unless they'd be faster than any other solution, because this would be one of the most frequent calls to the DB.
Edit
Since a picture tells 1000 words I'll try to better explain what I want using an image. The image below shows the category tree. Each of these nodes can have an arbitrary number of products related to it. Products are not included in the picture.
So if I'd execute this call:
call GetLatestProductsFromSubCategories(1);
I'd like to effectively get 30 products:
10 latest products from the whole orange subtree
10 latest products from the whole blue subtree and
10 latest products from the whole green subtree
I don't want to get 10 latest products from each node under catId=1 node which would mean 320 products.

Final Solution
This solution has O(n) performance:
CREATE PROCEDURE foo(IN in_catId INT)
BEGIN
  DECLARE done BOOLEAN DEFAULT FALSE;
  DECLARE first_iteration BOOLEAN DEFAULT TRUE;
  DECLARE current VARCHAR(255);
  DECLARE categories CURSOR FOR
    SELECT parentTrail
    FROM category
    JOIN category_hierarchy USING (catId)
    WHERE parentCatId = in_catId;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;
  SET @query := '';
  OPEN categories;
  category_loop: LOOP
    FETCH categories INTO current;
    IF `done` THEN LEAVE category_loop; END IF;
    IF first_iteration = TRUE THEN
      SET first_iteration = FALSE;
    ELSE
      SET @query = CONCAT(@query, " UNION ALL ");
    END IF;
    SET @query = CONCAT(@query, "(SELECT product.* FROM product JOIN category_hierarchy USING (catId) WHERE parentTrail LIKE CONCAT('",current,"','%') ORDER BY createdOn DESC LIMIT 10)");
  END LOOP category_loop;
  CLOSE categories;
  IF @query <> '' THEN
    PREPARE stmt FROM @query;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END IF;
END
Edit
Due to the latest clarification, this solution was simply edited to simplify the categories cursor query.
Note: Make the VARCHAR on line 5 (the declaration of the current variable) the appropriate size based on your parentTrail column.
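On MySQL 8.0 or later, window functions offer another route that needs neither a cursor nor a prepared statement. The sketch below is not the method of the answer above, and it has not been measured against it. It runs on an in-memory SQLite database as a stand-in (SQLite 3.25 or later is assumed for ROW_NUMBER), the sample tree and products are hypothetical, and in MySQL the || concatenation would be written CONCAT(ch.parentTrail, '%').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (catId INTEGER PRIMARY KEY, parentCatId INTEGER);
CREATE TABLE Category_Hierarchy (catId INTEGER PRIMARY KEY, parentTrail TEXT, catLevel INTEGER);
CREATE TABLE Product (prodId INTEGER PRIMARY KEY, catId INTEGER, createdOn TEXT);
-- Hypothetical tree: 1 is the root, 2 and 3 are its children, 4 sits under 2.
INSERT INTO Category VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
INSERT INTO Category_Hierarchy VALUES
  (1, '1/', 0), (2, '1/2/', 1), (3, '1/3/', 1), (4, '1/2/4/', 2);
INSERT INTO Product VALUES
  (1, 2, '2010-02-03 12:09:24'), (2, 4, '2010-02-03 12:09:29'),
  (3, 4, '2010-02-03 12:09:36'), (4, 3, '2010-02-03 12:09:39'),
  (5, 1, '2010-02-03 12:09:50'), (6, 3, '2010-02-03 12:09:55');
""")

LATEST_PER_SUBTREE = """
WITH ranked AS (
  SELECT child.catId AS subtreeRoot,
         p.prodId,
         ROW_NUMBER() OVER (
           PARTITION BY child.catId            -- one ranking per direct subcategory
           ORDER BY p.createdOn DESC, p.prodId DESC
         ) AS rn
  FROM Category child
  JOIN Category_Hierarchy ch ON ch.catId = child.catId
  JOIN Category_Hierarchy h  ON h.parentTrail LIKE ch.parentTrail || '%'
  JOIN Product p             ON p.catId = h.catId
  WHERE child.parentCatId = ?
)
SELECT subtreeRoot, prodId FROM ranked WHERE rn <= ? ORDER BY subtreeRoot, rn
"""


def latest_per_subtree(conn, cat_id, per_subtree):
    """Latest products from each direct-subcategory subtree of cat_id."""
    return conn.execute(LATEST_PER_SUBTREE, (cat_id, per_subtree)).fetchall()


latest = latest_per_subtree(conn, 1, 2)
print(latest)
```

Products attached directly to the requested category (product 5 here) are left out, because only the subtrees of its direct subcategories are ranked.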

Related

MySQL recursive selection from one table

I have this table in MySQL:
| id | mainid | name |
+----+--------+---------------------+
| 1 | 0 | main 1 |
| 2 | 1 | sub 1 |
| 3 | 1 | sub 2 |
| 4 | 1 | sub 3 |
| 5 | 4 | subsub 1 |
| 6 | 4 | subsub 2 |
| 7 | 0 | main 2 |
| 8 | 7 | sub 4 |
| 9 | 7 | sub 5 |
The mainid field references the id field.
Is there a best practice in MySQL for selecting all rows recursively? I want to select all subitems under a main item.
I tried selecting all subitems on the first level; for example, sub 1, sub 2 and sub 3 are under main 1. This is simple:
SELECT id, mainid, name FROM mytable WHERE mainid = '1';
But is there a one-line command to select the same rows AND the subsub 1 and subsub 2 rows too? (And, of course, the rows of any deeper levels I create later.)
You'll need temp tables and two stored procedures, an outer one and an inner one.
The outer stored procedure receives the "parent" id and creates a result temp table:
RESULT (id, mainid, name)
and a check temp table:
CHECK (id, passed)
(this table is necessary to avoid infinite loops).
The outer procedure then calls the inner stored procedure with the parent id. The inner procedure looks something like this:
PROC (currentId (int))
It does basically what your query did, but saves the rows in an inner temp table. Then, for each element of that temp table that is not yet in CHECK, it marks the element as passed in CHECK (just insert the row) and calls the same proc for it, so every one of the "children" of currentId is processed in turn.
Then insert all data from the inner temp table into RESULT and you'll have your entire list of descendants.
You have two options:
check children and then insert into RESULT
or
insert into RESULT and then check children
The data will be ordered differently, but the result should be the same.
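The idea above can be sketched outside the database. This is an illustration in Python rather than the stored procedures themselves, and it uses a queue instead of recursion; the function name descendants is hypothetical. The checked set plays the role of the CHECK table and the result list plays the role of RESULT.

```python
from collections import deque

# (id, mainid, name) rows from the question.
rows = [
    (1, 0, "main 1"), (2, 1, "sub 1"), (3, 1, "sub 2"), (4, 1, "sub 3"),
    (5, 4, "subsub 1"), (6, 4, "subsub 2"),
    (7, 0, "main 2"), (8, 7, "sub 4"), (9, 7, "sub 5"),
]


def descendants(rows, root_id):
    """Collect the ids of all descendants of root_id, level by level."""
    children = {}
    for id_, mainid, _name in rows:
        children.setdefault(mainid, []).append(id_)

    result = []            # stands in for the RESULT temp table
    checked = {root_id}    # stands in for the CHECK temp table (guards against cycles)
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child in checked:
                continue
            checked.add(child)
            result.append(child)   # "insert into RESULT and then check children"
            queue.append(child)
    return result


under_main_1 = descendants(rows, 1)
print(under_main_1)
```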

Update connected components in a MySQL table

Suppose I have a MySQL table that defines a collection of things, each of which is associated with either 1 or 2 owners. For example:
CREATE TABLE thing (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, name CHAR(10)
, first_owner INT UNSIGNED NOT NULL
, second_owner INT UNSIGNED DEFAULT NULL
);
+----+------------+-------------+--------------+
| id | name | first_owner | second_owner |
+----+------------+-------------+--------------+
| 1 | skateboard | Joe | NULL |
| 2 | flashlight | Joe | NULL |
| 3 | drill | Joe | Erica |
| 4 | computer | Erica | NULL |
| 5 | textbook | Diane | NULL |
| 6 | cell phone | Amy | Diane |
| 7 | piano | Paul | Amy |
+----+------------+-------------+--------------+
Each distinct owner is a node of a graph, and two owners in the same row constitute an edge between their nodes. A graph drawn from the above example rows looks like this:
In this example, there are two components: Joe and Erica are one; Diane, Paul and Amy are the other.
I want to identify these components in my table, so I add another column:
ALTER TABLE thing ADD COLUMN `group` INT UNSIGNED;
How could I write an UPDATE statement that would populate this new column by uniquely identifying the connected component to which the row belongs? Here's an example of an acceptable result for the above example rows:
+----+------------+-------------+--------------+-------+
| id | name | first_owner | second_owner | group |
+----+------------+-------------+--------------+-------+
| 1 | skateboard | Joe | NULL | 1 |
| 2 | flashlight | Joe | NULL | 1 |
| 3 | drill | Joe | Erica | 1 |
| 4 | computer | Erica | NULL | 1 |
| 5 | textbook | Diane | NULL | 2 |
| 6 | cell phone | Amy | Diane | 2 |
| 7 | piano | Paul | Amy | 2 |
+----+------------+-------------+--------------+-------+
I could do this with a stored procedure, but my actual scenario involves more tables and millions of rows, so I'm hoping there's a clever way to do this without looping through cursors for a week.
This is a simplified example for the purpose of illustrating the problem. Each component is supposed to represent a "household" and most will have only 1 or 2 nodes, but those with more nodes are especially important. There isn't necessarily any strict upper limit to the size of a household.
You can consider this method of creating hierarchical queries in mysql
CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
  DECLARE _id INT;
  DECLARE _parent INT;
  DECLARE _next INT;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;
  SET _parent = @id;
  SET _id = -1;
  IF @id IS NULL THEN
    RETURN NULL;
  END IF;
  LOOP
    SELECT MIN(id)
    INTO @id
    FROM t_hierarchy
    WHERE parent = _parent
    AND id > _id;
    IF @id IS NOT NULL OR _parent = @start_with THEN
      SET @level = @level + 1;
      RETURN @id;
    END IF;
    SET @level := @level - 1;
    SELECT id, parent
    INTO _id, _parent
    FROM t_hierarchy
    WHERE id = _parent;
  END LOOP;
END
Also, a very good article on this topic: Adjacency list vs. nested sets: MySQL
A very good answer to a related question
"What is the most efficient/elegant way to parse a flat table into a
tree?"
There are several ways to store tree-structured data in a relational
database. What you show in your example uses two methods:
Adjacency List (the "parent" column) and
Path Enumeration (the dotted-numbers in your name column).
Another solution is called Nested Sets, and it can be stored in the
same table too. Read "Trees and Hierarchies in SQL for Smarties" by
Joe Celko for a lot more information on these designs.
I usually prefer a design called Closure Table (aka "Adjacency
Relation") for storing tree-structured data. It requires another
table, but then querying trees is pretty easy.
Please have a look at original question for reference.
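For the connected-components part of the question specifically, one option is to compute the groups outside SQL with a union-find (disjoint-set) structure and write them back in bulk. This is a swapped-in technique, not something proposed in the answer above; the function name assign_groups is hypothetical, and the rows are the seven sample things from the question.

```python
# (id, name, first_owner, second_owner) rows from the question.
things = [
    (1, "skateboard", "Joe", None),
    (2, "flashlight", "Joe", None),
    (3, "drill", "Joe", "Erica"),
    (4, "computer", "Erica", None),
    (5, "textbook", "Diane", None),
    (6, "cell phone", "Amy", "Diane"),
    (7, "piano", "Paul", "Amy"),
]


def assign_groups(things):
    """Map each thing id to a group number shared by its connected owners."""
    parent = {}

    def find(owner):
        parent.setdefault(owner, owner)
        while parent[owner] != owner:
            parent[owner] = parent[parent[owner]]  # path halving
            owner = parent[owner]
        return owner

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # First pass: every row with two owners is an edge between them.
    for _id, _name, first, second in things:
        find(first)
        if second is not None:
            union(first, second)

    # Second pass: number the components in order of first appearance.
    group_of_root = {}
    groups = {}
    for thing_id, _name, first, _second in things:
        root = find(first)
        groups[thing_id] = group_of_root.setdefault(root, len(group_of_root) + 1)
    return groups


groups = assign_groups(things)
print(groups)
```

Each row is visited twice and the find calls are nearly constant time, so the cost grows roughly linearly with the number of rows. The resulting mapping could then be loaded into a temporary table and applied with a single joined UPDATE.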

Search contacts upto multiple levels [duplicate]

I have a database with a tree of names that can go down a total of 9 levels deep, and I need to be able to search down a single branch of the tree from any point on the branch.
Database:
+----+------+--------+
| id | name | parent |
+----+------+--------+
| 1 | tom | 0 |
| 2 | bob | 0 |
| 3 | fred | 1 |
| 4 | tim | 2 |
| 5 | leo | 4 |
| 6 | sam | 4 |
| 7 | joe | 6 |
| 8 | jay | 3 |
| 9 | jim | 5 |
+----+------+--------+
Tree:
tom
  fred
    jay
bob
  tim
    sam
      joe
    leo
      jim
For example:
If I search "j" from the user "bob" I should get only "joe" and "jim". If I search "j" from "leo" I should only get "jim".
I can't think of any easy way to do this, so any help is appreciated.
You should really consider using the Modified Preorder Tree Traversal which makes such queries much easier. Here's your table expressed with MPTT. I have left the parent field, as it makes some queries easier.
+----+------+--------+-----+------+
| id | name | parent | lft | rght |
+----+------+--------+-----+------+
| 1 | tom | 0 | 1 | 6 |
| 2 | bob | 0 | 7 | 18 |
| 3 | fred | 1 | 2 | 5 |
| 4 | tim | 2 | 8 | 17 |
| 5 | leo | 4 | 13 | 16 |
| 6 | sam | 4 | 9 | 12 |
| 7 | joe | 6 | 10 | 11 |
| 8 | jay | 3 | 3 | 4 |
| 9 | jim | 5 | 14 | 15 |
+----+------+--------+-----+------+
To search j from user bob you'd use the lft and rght values for bob:
SELECT * FROM table WHERE name LIKE 'j%' AND lft > 7 AND rght < 18
Implementing the logic to update lft and rght for adding, removing and reordering nodes can be a challenge (hint: use an existing library if you can) but querying will be a breeze.
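As a rough illustration of where the lft and rght values come from, the sketch below numbers the tree with a depth-first walk and then applies the same "strictly inside the bounds" test as the query. The function names number_tree and search are hypothetical, siblings are visited in row order, and the exact numbers depend on that sibling order, so they can differ from a table numbered in another order while giving the same search results.

```python
# (id, name, parent) rows from the question; parent 0 marks a root.
people = [
    (1, "tom", 0), (2, "bob", 0), (3, "fred", 1), (4, "tim", 2), (5, "leo", 4),
    (6, "sam", 4), (7, "joe", 6), (8, "jay", 3), (9, "jim", 5),
]


def number_tree(people):
    """Assign (lft, rght) to every node with a depth-first walk."""
    children = {}
    for id_, _name, parent in people:
        children.setdefault(parent, []).append(id_)

    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter                  # numbered on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)  # numbered again on the way back up

    for root in children.get(0, []):
        visit(root)
    return bounds


def search(people, bounds, start_name, prefix):
    """Names below start_name that begin with prefix."""
    names = {id_: name for id_, name, _parent in people}
    start_id = next(id_ for id_, name in names.items() if name == start_name)
    lft, rght = bounds[start_id]
    return sorted(
        names[id_]
        for id_, (node_lft, node_rght) in bounds.items()
        if node_lft > lft and node_rght < rght and names[id_].startswith(prefix)
    )


bounds = number_tree(people)
print(search(people, bounds, "bob", "j"), search(people, bounds, "leo", "j"))
```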
There isn't a nice/easy way of doing this; databases don't support tree-style data structures well.
You will need to work on a level-by-level basis to prune results from child-to-parent, or create a view that gives all 9 generations from a given node, and match using an OR on the descendants.
Have you thought about using a recursive loop? I use one for a CMS I built on top of CodeIgniter that allows me to start anywhere in the site tree and then filter through all the children > grandchildren > great-grandchildren, etc. Plus it keeps the SQL down to short, rapid queries as opposed to lots of complicated joins. It may need some modifying in your case, but I think it could work.
/**
 * build_site_tree
 *
 * @return array
 * @author Mike Waites
 **/
public function build_site_tree($parent_id)
{
    return $this->find_children($parent_id);
}
/** end build_site_tree **/
// -----------------------------------------------------------------------
/**
 * find_children
 * Recursive loop to find parent=>child relationships
 *
 * @return array $children
 * @author Mike Waites
 **/
public function find_children($parent_id)
{
    $this->benchmark->mark('find_children_start');
    if(!class_exists('Account_model'))
        $this->load->model('Account_model');
    $children = $this->Account_model->get_children($parent_id);
    /** Recursively loop over the results to build the site tree **/
    foreach($children as $key => $child)
    {
        $childs = $this->find_children($child['id']);
        if (count($childs) > 0)
            $children[$key]['children'] = $childs;
    }
    // Mark the end before returning; after the return it would never run.
    $this->benchmark->mark('find_children_end');
    return $children;
}
/** end find_children **/
As you can see, this is a pretty simplified version, and bear in mind it was built into CodeIgniter, so you will need to modify it to suit, but basically we have a function that calls itself, adding to an array each time as it goes. This will allow you to get the whole tree, or even start from a point in the tree, as long as you have the parent_id available first!
Hope this helps
The new "recursive with" construct will do the job, but I don't know if MySQL supports it (yet).
with recursive bobs(id) as (
  select id from t where name = 'bob'
  union all
  select t.id from t, bobs where t.parent = bobs.id
)
select t.name from t, bobs where t.id = bobs.id
and name like 'j%'
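MySQL has supported WITH RECURSIVE since version 8.0. The sketch below runs an equivalent query on an in-memory SQLite database as a stand-in, using the question's rows; the table name t, the CTE name branch and the helper search_branch are illustrative choices, and explicit JOIN syntax replaces the comma joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "tom", 0), (2, "bob", 0), (3, "fred", 1), (4, "tim", 2), (5, "leo", 4),
     (6, "sam", 4), (7, "joe", 6), (8, "jay", 3), (9, "jim", 5)],
)

BRANCH_SEARCH = """
WITH RECURSIVE branch(id) AS (
    SELECT id FROM t WHERE name = ?                        -- anchor: the start node
    UNION ALL
    SELECT t.id FROM t JOIN branch ON t.parent = branch.id  -- step: its children
)
SELECT t.name
FROM t JOIN branch ON t.id = branch.id
WHERE t.name LIKE ?
ORDER BY t.name
"""


def search_branch(conn, start_name, prefix):
    """Names at or below start_name that begin with prefix."""
    rows = conn.execute(BRANCH_SEARCH, (start_name, prefix + "%"))
    return [row[0] for row in rows]


from_bob = search_branch(conn, "bob", "j")
from_leo = search_branch(conn, "leo", "j")
print(from_bob, from_leo)
```

As in the query above, the start node itself is part of the branch, so it would be returned too if its own name matched the prefix.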
There is no single SQL query that will return the data in tree format - you need processing to traverse it in the right order.
One way is to query MySQL to return MPTT:
SELECT * FROM table ORDER BY parent asc;
The root of the tree will be the first item of the table, its children will be next, etc., the tree being listed "breadth first" (in layers of increasing depth).
Then use PHP to process the data, turning it into an object that holds the data structure.
Alternatively, you could implement MySQL search functions that given a node, recursively search and return a table of all its descendants, or a table of all its ancestors. As these procedures tend to be slow (being recursive, returning too much data that is then filtered by other criteria), you want to only do this if you know you're not querying for that kind of data again and again, or if you know that the data set remains small (9 levels deep and how wide?)
You can do this with a stored procedure as follows:
Example calls
mysql> call names_hier(1, 'a');
+----+----------+--------+-------------+-------+
| id | emp_name | parent | parent_name | depth |
+----+----------+--------+-------------+-------+
| 2 | ali | 1 | f00 | 1 |
| 8 | anna | 6 | keira | 4 |
+----+----------+--------+-------------+-------+
2 rows in set (0.00 sec)
mysql> call names_hier(3, 'k');
+----+----------+--------+-------------+-------+
| id | emp_name | parent | parent_name | depth |
+----+----------+--------+-------------+-------+
| 6 | keira | 5 | eva | 2 |
+----+----------+--------+-------------+-------+
1 row in set (0.00 sec)
$sqlCmd = sprintf("call names_hier(%d,'%s')", $id, $name); // don't forget to escape $name
$result = $db->query($sqlCmd);
Full script
drop table if exists names;
create table names
(
id smallint unsigned not null auto_increment primary key,
name varchar(255) not null,
parent smallint unsigned null,
key (parent)
)
engine = innodb;
insert into names (name, parent) values
('f00',null),
('ali',1),
('megan',1),
('jessica',3),
('eva',3),
('keira',5),
('mandy',6),
('anna',6);
drop procedure if exists names_hier;
delimiter #
create procedure names_hier
(
in p_id smallint unsigned,
in p_name varchar(255)
)
begin
declare v_done tinyint unsigned default(0);
declare v_dpth smallint unsigned default(0);
set p_name = trim(replace(p_name,'%',''));
create temporary table hier(
parent smallint unsigned,
id smallint unsigned,
depth smallint unsigned
)engine = memory;
insert into hier select parent, id, v_dpth from names where id = p_id;
/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */
create temporary table tmp engine=memory select * from hier;
while not v_done do
if exists( select 1 from names n inner join tmp on n.parent = tmp.id and tmp.depth = v_dpth) then
insert into hier select n.parent, n.id, v_dpth + 1
from names n inner join tmp on n.parent = tmp.id and tmp.depth = v_dpth;
set v_dpth = v_dpth + 1;
truncate table tmp;
insert into tmp select * from hier where depth = v_dpth;
else
set v_done = 1;
end if;
end while;
select
n.id,
n.name as emp_name,
p.id as parent,
p.name as parent_name,
hier.depth
from
hier
inner join names n on hier.id = n.id
left outer join names p on hier.parent = p.id
where
n.name like concat(p_name, '%');
drop temporary table if exists hier;
drop temporary table if exists tmp;
end #
delimiter ;
-- call this sproc from your php
call names_hier(1, 'a');
call names_hier(3, 'k');

MySql: ORDER BY parent and child

I have a table like:
+------+---------+
| id | parent |
+------+---------+
| 2043 | NULL |
| 2044 | 2043 |
| 2045 | 2043 |
| 2049 | 2043 |
| 2047 | NULL |
| 2048 | 2047 |
| 2049 | 2047 |
+------+---------+
which shows a simple, 2-level parent-child relationship. How can I ORDER BY in a SELECT statement to get the order shown in the list above, that is: 1st parent, children of 1st parent, 2nd parent, children of 2nd parent, and so on (once I have that, I can add the ORDER BYs for the children... I hope)? Is it possible without adding a sort field?
Including sorting children by id:
ORDER BY COALESCE(parent, id), parent IS NOT NULL, id
SQL Fiddle example
Explanation:
COALESCE(parent, id): First sort by (effectively grouping together) the parent's id.
parent IS NOT NULL: Put the parent row on top of the group
id: Finally sort all the children (same parent, and parent is not null)
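The three sort keys can be tried out on a small table. The sketch below uses an in-memory SQLite database as a stand-in for MySQL; the table name items and the ids are hypothetical, chosen so that the child ids do not already fall into the desired order on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, parent INTEGER)")
# Hypothetical rows: 10 and 20 are parents; the children have scattered ids.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(11, 20), (10, None), (5, 20), (21, 10), (20, None), (12, 10)],
)

ORDERED = """
SELECT id
FROM items
ORDER BY COALESCE(parent, id),   -- group each child with its parent's id
         parent IS NOT NULL,     -- 0 for the parent row, 1 for children
         id                      -- children by id within the group
"""

ordered_ids = [row[0] for row in conn.execute(ORDERED)]
print(ordered_ids)
```

Sorting by id alone would give 5, 10, 11, 12, 20, 21, which separates the children from their parents.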
If your table uses 0 instead of null to indicate an entry with no parent:
id | parent
-------------
1233 | 0
1234 | 1233
1235 | 0
1236 | 1233
1237 | 1235
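Turn the 0 into NULL with NULLIF before applying COALESCE, and check that the value does not equal 0. (GREATEST(parent, id) would return the child's own id whenever it is larger than its parent's id, which breaks the grouping.)
ORDER BY COALESCE(NULLIF(parent, 0), id), parent != 0, id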
The solution above didn't work for me; my table used 0 instead of NULL.
I found this other solution: in your query, create a column that concatenates the parent id and the child id, and sort the result by it.
SELECT CONCAT(IF(parent = 0,'',CONCAT('/',parent)),'/',id) AS gen_order
FROM table
ORDER BY gen_order
This question still shows up as one of the first search results, so I would like to share my solution and hope it will help more people out. It also works when you have a table with many levels of parent-child relations, although it is quite slow. The top level has NULL as parent.
+---------+---------+
| id | parent |
+---------+---------+
| 1 | NULL |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
+---------+---------+
In my approach I will use a procedure that will recursively call itself and keep prepending the path with the parent of the requested id until it reaches the NULL parent.
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `PATH`(IN `input` INT, OUT `output` VARCHAR(128))
BEGIN
  DECLARE _id INT;
  DECLARE _parent INT;
  DECLARE _path VARCHAR(128);
  SET `max_sp_recursion_depth` = 50;
  SELECT `id`, `parent`
  INTO _id, _parent
  FROM `database`.`table`
  WHERE `table`.`id` = `input`;
  IF _parent IS NULL THEN
    SET _path = _id;
  ELSE
    CALL `PATH`(_parent, _path);
    SELECT CONCAT(_path, '-', _id) INTO _path;
  END IF;
  SELECT _path INTO `output`;
END $$
DELIMITER ;
To use the results in an ORDER BY clause you will need a FUNCTION too that wraps the results of the PROCEDURE.
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `GETPATH`(`input` INT) RETURNS VARCHAR(128)
BEGIN
  CALL `PATH`(`input`, @path);
  RETURN @path;
END $$
DELIMITER ;
Now we can use the recursive path to sort the order of the table. On a table with 10000 rows it takes just over a second on my workstation.
SELECT `id`, `parent`, GETPATH(`id`) `path` FROM `database`.`table` ORDER BY `GETPATH`(`id`);
Example output:
+---------+---------+---------------+
| id | parent | path |
+---------+---------+---------------+
| 1 | NULL | 1 |
| 10 | 1 | 1-10 |
| 300 | 10 | 1-10-300 |
| 301 | 300 | 1-10-300-301 |
| 302 | 300 | 1-10-300-302 |
+---------+---------+---------------+
5 rows in set (1,39 sec)

MySQL function with a calculation based on data in a db column

I have a table that is a lookup for scoring points based on Place (P) and Number of Racers (N),
with scoring formats indicated by points_id. Two cases are shown in the table.
Sometimes the points are determined directly by the values of P and N, as with points_id = 3;
other times they are most easily determined by a simple calculation shown in the pts_calc column.
|points_id| P | N |points|pts_calc|
| 1 | 0 | 0 | NULL | pin |
| 1 |DNS| 0 | NULL | nin+1 |
| 3 | 1 | 0 |102.00| NULL |
| 3 | 2 | 0 | 98.00| NULL |
| 3 | 3 | 0 | 96.00| NULL |
| 3 | 4 | 0 | 93.00| NULL |
| 3 | 5 | 0 | 91.00| NULL |
| 3 | 6 | 0 | 89.00| NULL |
| 3 |DNF| 0 | 85.00| NULL |
I was hoping to create a function that returned the points from the three input variables.
points_id, P, N.
Below is what I tried.
CREATE FUNCTION POINTS(pid INT,pin VARCHAR(3),nin INT)
RETURNS DEC(6,2)
DETERMINISTIC
BEGIN
  DECLARE pts DECIMAL(6,2);
  DECLARE pcalc VARCHAR(20);
  SELECT points,pts_calc INTO pts,pcalc FROM scoring_points WHERE points_id=pid AND (P=pin OR P='0') AND (N=nin or N=0);
  IF(pts IS NULL) THEN
    SET @s= CONCAT('SET pts = ',pcalc);
    PREPARE stmt FROM @s;
    EXECUTE stmt;
  END IF;
  RETURN pts;
END
But I got this error:
1336 - Dynamic SQL is not allowed in stored function or trigger
Further research shows that the PREPARE statement is not allowed in functions, only in procedures.
I was hoping to do something like;
SELECT SUM(Points(pid,place,numb)) FROM t1 GROUP BY racer.id
But on to plan B (TBD) unless someone has a great idea.
I think you might fare better having three numeric columns instead of your pts_calc column:
cPIN - coefficient of pin term
cNIN - coefficient of nin term
cnst - constant term
Your function could then perform:
SELECT IFNULL(points, cPIN*pin + cNIN*nin + cnst) INTO pts
FROM scoring_points
WHERE ...
Depending on your needs, you might even be able to get rid of the points column by just using cnst and leaving the other two equal to 0.
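The coefficient idea can be checked with a few lines outside the database. This is a sketch of the arithmetic only: the dictionary rows, the helper names place_value and points_for, and the coefficient values (translating pts_calc "pin" to 1, 0, 0 and "nin+1" to 0, 1, 1) are assumptions made for illustration, as is treating a non-numeric place such as DNS as 0 on the understanding that its cPIN is 0.

```python
# Hypothetical rows: points is a fixed value or None; cPIN, cNIN and cnst
# replace the pts_calc text column.
ROW_PIN = {"points": None, "cPIN": 1, "cNIN": 0, "cnst": 0}    # was "pin"
ROW_DNS = {"points": None, "cPIN": 0, "cNIN": 1, "cnst": 1}    # was "nin+1"
ROW_FIXED = {"points": 102.0, "cPIN": 0, "cNIN": 0, "cnst": 0}


def place_value(pin):
    """Numeric value of a place; non-numeric places such as 'DNS' count as 0."""
    return int(pin) if pin.isdigit() else 0


def points_for(row, pin, nin):
    """Mirror of IFNULL(points, cPIN*pin + cNIN*nin + cnst)."""
    if row["points"] is not None:
        return row["points"]
    return row["cPIN"] * place_value(pin) + row["cNIN"] * nin + row["cnst"]


fourth_place = points_for(ROW_PIN, "4", 12)       # the place itself
did_not_start = points_for(ROW_DNS, "DNS", 12)    # number of racers plus one
first_place = points_for(ROW_FIXED, "1", 12)      # fixed lookup value
print(fourth_place, did_not_start, first_place)
```

Because the calculation is now plain arithmetic on columns, it needs no dynamic SQL and can live inside a stored function.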