Update connected components in a MySQL table - mysql

Suppose I have a MySQL table that defines a collection of things, each of which is associated with either 1 or 2 owners. For example:
CREATE TABLE thing (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, name CHAR(10)
, first_owner INT UNSIGNED NOT NULL
, second_owner INT UNSIGNED DEFAULT NULL
);
+----+------------+-------------+--------------+
| id | name | first_owner | second_owner |
+----+------------+-------------+--------------+
| 1 | skateboard | Joe | NULL |
| 2 | flashlight | Joe | NULL |
| 3 | drill | Joe | Erica |
| 4 | computer | Erica | NULL |
| 5 | textbook | Diane | NULL |
| 6 | cell phone | Amy | Diane |
| 7 | piano | Paul | Amy |
+----+------------+-------------+--------------+
Each distinct owner is a node of a graph, and two owners in the same row constitute an edge between their nodes. A graph drawn from the above example rows looks like this:
In this example, there are two components: Joe and Erica are one; Diane, Paul and Amy are the other.
I want to identify these components in my table, so I add another column:
ALTER TABLE thing ADD COLUMN `group` INT UNSIGNED;
How could I write an UPDATE statement that would populate this new column by uniquely identifying the connected component to which the row belongs? Here's an example of an acceptable result for the above example rows:
+----+------------+-------------+--------------+-------+
| id | name | first_owner | second_owner | group |
+----+------------+-------------+--------------+-------+
| 1 | skateboard | Joe | NULL | 1 |
| 2 | flashlight | Joe | NULL | 1 |
| 3 | drill | Joe | Erica | 1 |
| 4 | computer | Erica | NULL | 1 |
| 5 | textbook | Diane | NULL | 2 |
| 6 | cell phone | Amy | Diane | 2 |
| 7 | piano | Paul | Amy | 2 |
+----+------------+-------------+--------------+-------+
I could do this with a stored procedure, but my actual scenario involves more tables and millions of rows, so I'm hoping there's a clever way to do this without looping through cursors for a week.
This is a simplified example for the purpose of illustrating the problem. Each component is supposed to represent a "household" and most will have only 1 or 2 nodes, but those with more nodes are especially important. There isn't necessarily any strict upper limit to the size of a household.

You can consider this method of creating hierarchical queries in mysql
CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
DECLARE _id INT;
DECLARE _parent INT;
DECLARE _next INT;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET #id = NULL;
SET _parent = #id;
SET _id = -1;
IF #id IS NULL THEN
RETURN NULL;
END IF;
LOOP
SELECT MIN(id)
INTO #id
FROM t_hierarchy
WHERE parent = _parent
AND id > _id;
IF #id IS NOT NULL OR _parent = #start_with THEN
SET #level = #level + 1;
RETURN #id;
END IF;
SET #level := #level - 1;
SELECT id, parent
INTO _id, _parent
FROM t_hierarchy
WHERE id = _parent;
END LOOP;
END
Also, a very good article on this topic Adjacency list vs. nested sets: MySQL

A very good answer to a related question
"What is the most efficient/elegant way to parse a flat table into a
tree?"
There are several ways to store tree-structured data in a relational
database. What you show in your example uses two methods:
Adjacency List (the "parent" column) and
Path Enumeration (the dotted-numbers in your name column).
Another solution is called Nested Sets, and it can be stored in the
same table too. Read "Trees and Hierarchies in SQL for Smarties" by
Joe Celko for a lot more information on these designs.
I usually prefer a design called Closure Table (aka "Adjacency
Relation") for storing tree-structured data. It requires another
table, but then querying trees is pretty easy.
Please have a look at original question for reference.

Related

Can you use "knight moves" to query a single database and join it to itself?

I am working on a project that involves code in both Prolog and SQL to solve the same problem. A problem I've run across is I can't use a single database to form a hierarchy. In this list of prolog facts you can see that the "assembly" parts are related to each other.
basicpart(spoke).
basicpart(rearframe).
basicpart(handles).
basicpart(gears).
basicpart(bolt).
basicpart(nut).
basicpart(fork).
assembly(bike,[wheel,wheel,frame]).
assembly(wheel,[spoke,rim,hub]).
assembly(frame,[rearframe,frontframe]).
assembly(frontframe,[fork,handles]).
assembly(hub,[gears,axle]).
assembly(axle,[bolt,nut]).
If I put all of these "assembly" definitions into one SQL database, can I use knight moves (joining a table to itself on 2 different columns in it) to build this hierarchy in SQL in only 2 tables?
If I understand that question correctly. You cannot construct your bike with just one query (I'm not familiar with the term "knight moves"). In fact, you can -- but it must be a recursive SQL query. Because you will be computing the transitive closure of the part-subpart relationship.
Unfortunately I don't immediately know how to write these. SQL syntax is frankly abysmal and recursive SQL looks even abysmaller, so below is example code using a loop instead.
You actually need only one table to represent the data as the basicpart/1 relation does not bring anything to the table, except label certain "things" as basic. But these are also the things that do not appear in assembly/2 in the first position.
Notes:
Not using ENUMS which are not really "types" in MySQL/MariaDB but just a constraint on a field of a specific table. (Like, WTF!)
The multiset representation of the Prolog code ("a bike has two wheels") is flattened into multiple rows separately identified by a numeric surrogate id. This is due to the "First Normal Form" dogma of RDBMS practice. There is à priori nothing wrong with having multisets as values, if the query language and the RDBMS engine can support it. For example, you can have XML values in PostgreSQL complete with queries over its content, as I remember1.
DELIMITER //
DROP PROCEDURE IF EXISTS prepare;
CREATE PROCEDURE prepare()
BEGIN
DROP TABLE IF EXISTS assembly;
CREATE TABLE assembly
(id INT AUTO_INCREMENT KEY, -- surrogate key because a bike may have several wheels
part VARCHAR(10) NOT NULL,
subpart VARCHAR(10) NOT NULL);
INSERT INTO assembly(part,subpart) VALUES
("bike","wheel"),
("bike","wheel"),
("bike","frame"),
("wheel","spoke"),
("wheel","rim"),
("wheel","hub"),
("frame","rearframe"),
("frame","frontframe"),
("frontframe","fork"),
("frontframe","handles"),
("hub","gears"),
("hub","axle"),
("axle","bolt"),
("axle","nut");
END;
DROP PROCEDURE IF EXISTS compute_transitive_closure;
CREATE PROCEDURE compute_transitive_closure()
BEGIN
DROP TABLE IF EXISTS pieces;
CREATE TABLE pieces
(id INT AUTO_INCREMENT KEY,
part VARCHAR(10) NOT NULL,
subpart VARCHAR(10) NOT NULL,
path VARCHAR(500) NOT NULl DEFAULT "",
depth INT NOT NULL DEFAULT 0);
INSERT INTO pieces(part,subpart,path,depth) VALUES
("ROOT","bike","/bike",0);
SET #depth=0;
l: LOOP
INSERT INTO pieces(part,subpart,path,depth)
SELECT
p.subpart,
a.subpart,
CONCAT(p.path,'/',a.subpart),
#depth+1
FROM
pieces p,
assembly a
WHERE
p.depth = #depth AND p.subpart = a.part;
IF ROW_COUNT() <= 0 THEN
LEAVE l;
ELSE
SELECT * FROM pieces;
END IF;
SET #depth=#depth+1;
END LOOP;
END; //
DELIMITER ;
Put the above into a file SQL.txt, and then, in a database testme:
MariaDB [testme]> source SQL.txt;
MariaDB [testme]> CALL prepare;
MariaDB [testme]> CALL compute_transitive_closure;
Then after 4 passages through the loop, you get:
+----+------------+------------+--------------------------------+-------+
| id | part | subpart | path | depth |
+----+------------+------------+--------------------------------+-------+
| 1 | ROOT | bike | /bike | 0 |
| 2 | bike | wheel | /bike/wheel | 1 |
| 3 | bike | wheel | /bike/wheel | 1 |
| 4 | bike | frame | /bike/frame | 1 |
| 5 | wheel | spoke | /bike/wheel/spoke | 2 |
| 6 | wheel | spoke | /bike/wheel/spoke | 2 |
| 7 | wheel | rim | /bike/wheel/rim | 2 |
| 8 | wheel | rim | /bike/wheel/rim | 2 |
| 9 | wheel | hub | /bike/wheel/hub | 2 |
| 10 | wheel | hub | /bike/wheel/hub | 2 |
| 11 | frame | rearframe | /bike/frame/rearframe | 2 |
| 12 | frame | frontframe | /bike/frame/frontframe | 2 |
| 20 | frontframe | fork | /bike/frame/frontframe/fork | 3 |
| 21 | frontframe | handles | /bike/frame/frontframe/handles | 3 |
| 22 | hub | gears | /bike/wheel/hub/gears | 3 |
| 23 | hub | gears | /bike/wheel/hub/gears | 3 |
| 24 | hub | axle | /bike/wheel/hub/axle | 3 |
| 25 | hub | axle | /bike/wheel/hub/axle | 3 |
| 27 | axle | bolt | /bike/wheel/hub/axle/bolt | 4 |
| 28 | axle | nut | /bike/wheel/hub/axle/nut | 4 |
| 29 | axle | bolt | /bike/wheel/hub/axle/bolt | 4 |
| 30 | axle | nut | /bike/wheel/hub/axle/nut | 4 |
+----+------------+------------+--------------------------------+-------+
1: This made me dig out "Database in Depth: Relational Theory for Practitioners", O'Reilly 2005, by Chris Date, an excellent introduction to the relational model. On page 30, Date considers "sets as values" (but does not consider "multisets"):
Second (and regardless of what you might think of my first argument),
the fact is that a set like {P2,P4,P5} is no more and no less
decomposable by the DBMS than a character string is. Like character
strings, sets do have some inner structure; as with characters
strings, however, it's convenient to ignore that structure for certain
purposes. In other words, if a character string is compatible with the
requirements of 1NF - that is, if character strings are atomic - then
sets must be, too. The real point I'm getting at here is that the
notion of atomicity has no absolute meaning; it just depends on what
we want to do with the data. Sometimes we want to deal with an entire
set of part numbers as a single thing, and sometimes we want to deal
with individual part numbers within that set - but then we are
descending to a lower level of detail (a lower level of abstraction).

MySQL second auto increment field based on foreign key

I've come across this problem numerous times but haven't found a "MySQL way" to solve the issue as such - I have a database that contains users and reports. Each report has an id which I display as a report number to my users.
The main complaint is that users are confused as to why reports have gone missing from their system. This is not actually the case. It is actually that they are recognizing a gap between their IDs and assume that these are missing reports, when in actual fact, it is simply becasue another user has filled in this auto-incrementing gap.
I need to know if there is a way to do this in MySQL:
Is it possible that I can have a second auto-increment field called report_number which is based on a user_id field which has a different set of auto-increments per user?
e.g.
|------|---------|---------------|
| id | user_id | report_number |
|------|---------|---------------|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 1 | 4 |
| 6 | 1 | 5 |
| 7 | 2 | 2 |
| 8 | 3 | 1 |
| 9 | 3 | 2 |
|------|---------|---------------|
I am using InnoDB for this as it is quite heavily weighted with foreign-keys. It appears to complain when I add a second auto increment field, but I wasn't sure if there was a different way to do this?
MyISAM supports the second column with auto increment, but InnoDB doesn't.
For InnoDB you might create a trigger BEFORE INSERT to get the max value of the reportid and add one to the value.
DELIMITER $$
CREATE TRIGGER report_trigger
BEFORE INSERT ON reports
FOR EACH ROW BEGIN
SET NEW.`report_id` = (SELECT MAX(report_id) + 1 FROM reports WHERE user_id = NEW.user_id);
END $$
DELIMITER ;
If you can use MyISAM instead, in the documentation of MySQL page there is an example:
http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
CREATE TABLE animals (
grp ENUM('fish','mammal','bird') NOT NULL,
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (grp,id)
) ENGINE=MyISAM;
INSERT INTO animals (grp,name) VALUES
('mammal','dog'),('mammal','cat'),
('bird','penguin'),('fish','lax'),('mammal','whale'),
('bird','ostrich');
SELECT * FROM animals ORDER BY grp,id;
Which returns:
+--------+----+---------+
| grp | id | name |
+--------+----+---------+
| fish | 1 | lax |
| mammal | 1 | dog |
| mammal | 2 | cat |
| mammal | 3 | whale |
| bird | 1 | penguin |
| bird | 2 | ostrich |
+--------+----+---------+
Right one with IFNULL:
DELIMITER $$
CREATE TRIGGER salons_trigger
BEFORE INSERT ON salon
FOR EACH ROW BEGIN
SET NEW.salon_id = IFNULL((SELECT MAX(salon_id) + 1 FROM salon WHERE owner = NEW.owner), 1);
END $$
DELIMITER ;
I think mysql doesnt support two auto_increment columns. you can create report number using information schema.
select NULL from information_schema.columns
MySQl does not support two auto incremented fields, if you need then create another table, set the other field which you want to be as auto incremented and you must set up a relationship with these two tables.

Search contacts upto multiple levels [duplicate]

I have a database with a tree of names that can go down a total of 9 levels deep and I need to be able to search down a signal branch of the tree from any point on the branch.
Database:
+----------------------+
| id | name | parent |
+----------------------+
| 1 | tom | 0 |
| 2 | bob | 0 |
| 3 | fred | 1 |
| 4 | tim | 2 |
| 5 | leo | 4 |
| 6 | sam | 4 |
| 7 | joe | 6 |
| 8 | jay | 3 |
| 9 | jim | 5 |
+----------------------+
Tree:
tom
fred
jay
bob
tim
sam
joe
leo
jim
For example:
If I search "j" from the user "bob" I should get only "joe" and "jim". If I search "j" form "leo" I should only get "jim".
I can't think of any easy way do to this so any help is appreciated.
You should really consider using the Modified Preorder Tree Traversal which makes such queries much easier. Here's your table expressed with MPTT. I have left the parent field, as it makes some queries easier.
+----------------------+-----+------+
| id | name | parent | lft | rght |
+----------------------+-----+------+
| 1 | tom | 0 | 1 | 6 |
| 2 | bob | 0 | 7 | 18 |
| 3 | fred | 1 | 2 | 5 |
| 4 | tim | 2 | 8 | 17 |
| 5 | leo | 4 | 12 | 15 |
| 6 | sam | 4 | 9 | 16 |
| 7 | joe | 6 | 10 | 11 |
| 8 | jay | 3 | 3 | 4 |
| 9 | jim | 5 | 13 | 14 |
+----------------------+-----+------+
To search j from user bob you'd use the lft and rght values for bob:
SELECT * FROM table WHERE name LIKE 'j%' AND lft > 7 AND rght < 18
Implementing the logic to update lft and rght for adding, removing and reordering nodes can be a challenge (hint: use an existing library if you can) but querying will be a breeze.
There isn't a nice/easy way of doing this; databases don't support tree-style data structures well.
You will need to work on a level-by-level basis to prune results from child-to-parent, or create a view that gives all 9 generations from a given node, and match using an OR on the descendants.
Have you thought about using a recursive loop? i use a loop for a cms i built on top of codeigniter that allows me to start anywhere in the site tree and will then subsequently filter trhough all the children> grand children > great grand children etc. Plus it keeps the sql down to short rapid queries opposed to lots of complicated joins. It may need some modifying in your case but i think it could work.
/**
* build_site_tree
*
* #return void
* #author Mike Waites
**/
public function build_site_tree($parent_id)
{
return $this->find_children($parent_id);
}
/** end build_site_tree **/
// -----------------------------------------------------------------------
/**
* find_children
* Recursive loop to find parent=>child relationships
*
* #return array $children
* #author Mike Waites
**/
public function find_children($parent_id)
{
$this->benchmark->mark('find_children_start');
if(!class_exists('Account_model'))
$this->load->model('Account_model');
$children = $this->Account_model->get_children($parent_id);
/** Recursively Loop over the results to build the site tree **/
foreach($children as $key => $child)
{
$childs = $this->find_children($child['id']);
if (count($childs) > 0)
$children[$key]['children'] = $childs;
}
return $children;
$this->benchmark->mark('find_children_end');
}
/** end find_children **/
As you can see this is a pretty simplfied version and bear in mind this has been built into codeigniter so you will need to modyfy it to suite but basically we have a loop that calls itself adding to an array each time as it goes. This will allow you to get the whole tree, or even start from a point in the tree as long as you have the parent_id avaialble first!
Hope this helps
The new "recursive with" construct will do the job, but I don't know id MySQL supports it (yet).
with recursive bobs(id) as (
select id from t where name = 'bob'
union all
select t.id from t, bobs where t.parent_id = bobs.id
)
select t.name from t, bobs where t.id = bobs.id
and name like 'j%'
There is no single SQL query that will return the data in tree format - you need processing to traverse it in the right order.
One way is to query MySQL to return MPTT:
SELECT * FROM table ORDER BY parent asc;
root of the tree will be the first item of the table, its children will be next, etc., the tree being listed "breadth first" (in layers of increasing depth)
Then use PHP to process the data, turning it into an object that holds the data structure.
Alternatively, you could implement MySQL search functions that given a node, recursively search and return a table of all its descendants, or a table of all its ancestors. As these procedures tend to be slow (being recursive, returning too much data that is then filtered by other criteria), you want to only do this if you know you're not querying for that kind of data again and again, or if you know that the data set remains small (9 levels deep and how wide?)
You can do this with a stored procedure as follows:
Example calls
mysql> call names_hier(1, 'a');
+----+----------+--------+-------------+-------+
| id | emp_name | parent | parent_name | depth |
+----+----------+--------+-------------+-------+
| 2 | ali | 1 | f00 | 1 |
| 8 | anna | 6 | keira | 4 |
+----+----------+--------+-------------+-------+
2 rows in set (0.00 sec)
mysql> call names_hier(3, 'k');
+----+----------+--------+-------------+-------+
| id | emp_name | parent | parent_name | depth |
+----+----------+--------+-------------+-------+
| 6 | keira | 5 | eva | 2 |
+----+----------+--------+-------------+-------+
1 row in set (0.00 sec)
$sqlCmd = sprintf("call names_hier(%d,'%s')", $id, $name); // dont forget to escape $name
$result = $db->query($sqlCmd);
Full script
drop table if exists names;
create table names
(
id smallint unsigned not null auto_increment primary key,
name varchar(255) not null,
parent smallint unsigned null,
key (parent)
)
engine = innodb;
insert into names (name, parent) values
('f00',null),
('ali',1),
('megan',1),
('jessica',3),
('eva',3),
('keira',5),
('mandy',6),
('anna',6);
drop procedure if exists names_hier;
delimiter #
create procedure names_hier
(
in p_id smallint unsigned,
in p_name varchar(255)
)
begin
declare v_done tinyint unsigned default(0);
declare v_dpth smallint unsigned default(0);
set p_name = trim(replace(p_name,'%',''));
create temporary table hier(
parent smallint unsigned,
id smallint unsigned,
depth smallint unsigned
)engine = memory;
insert into hier select parent, id, v_dpth from names where id = p_id;
/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */
create temporary table tmp engine=memory select * from hier;
while not v_done do
if exists( select 1 from names n inner join tmp on n.parent = tmp.id and tmp.depth = v_dpth) then
insert into hier select n.parent, n.id, v_dpth + 1
from names n inner join tmp on n.parent = tmp.id and tmp.depth = v_dpth;
set v_dpth = v_dpth + 1;
truncate table tmp;
insert into tmp select * from hier where depth = v_dpth;
else
set v_done = 1;
end if;
end while;
select
n.id,
n.name as emp_name,
p.id as parent,
p.name as parent_name,
hier.depth
from
hier
inner join names n on hier.id = n.id
left outer join names p on hier.parent = p.id
where
n.name like concat(p_name, '%');
drop temporary table if exists hier;
drop temporary table if exists tmp;
end #
delimiter ;
-- call this sproc from your php
call names_hier(1, 'a');
call names_hier(3, 'k');

MySql: ORDER BY parent and child

I have a table like:
+------+---------+-
| id | parent |
+------+---------+
| 2043 | NULL |
| 2044 | 2043 |
| 2045 | 2043 |
| 2049 | 2043 |
| 2047 | NULL |
| 2048 | 2047 |
| 2049 | 2047 |
+------+---------+
which shows a simple, 2-level "parent-child"-corelation. How can I ORDER BY an SELECT-statement to get the order like in the list above, which means: 1st parent, childs of 1st parent, 2nd parent, childs of 2nd parent and so on (if I have that, I can add the ORDER BYs for the children... I hope). Is it possible withoug adding a sort-field?
Including sorting children by id:
ORDER BY COALESCE(parent, id), parent IS NOT NULL, id
SQL Fiddle example
Explanation:
COALESCE(parent, id): First sort by (effectively grouping together) the parent's id.
parent IS NOT NULL: Put the parent row on top of the group
id: Finally sort all the children (same parent, and parent is not null)
If your table uses 0 instead of null to indicate an entry with no parent:
id | parent
-------------
1233 | 0
1234 | 1233
1235 | 0
1236 | 1233
1237 | 1235
Use greatest instead of coalesce and check the value does not equal 0:
ORDER BY GREATEST(parent, id), parent != 0, id
The solution above didn't work for me, my table used 0 instead of NULL.
I found this other solution: you create a column with the concatened parent id and child id in your query and you can sort the result by it .
SELECT CONCAT(IF(parent = 0,'',CONCAT('/',parent)),'/',id) AS gen_order
FROM table
ORDER BY gen_order
This question still shows as one of the first search results. So I would like to share a my solution and hope it will help more people out. This will also work when you have a table with many levels of parent and child relations. Although it is quite a slow solution. The top level has NULL as parent.
+---------+---------+
| id | parent |
+---------+---------+
| 1 | NULL |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
+---------+---------+
In my approach I will use a procedure that will recursively call itself and keep prepending the path with the parent of the requested id until it reaches the NULL parent.
DELIMITER $$
CREATE DEFINER=`root`#`localhost` PROCEDURE `PATH`(IN `input` INT, OUT `output` VARCHAR(128))
BEGIN
DECLARE _id INT;
DECLARE _parent INT;
DECLARE _path VARCHAR(128);
SET `max_sp_recursion_depth` = 50;
SELECT `id`, `parent`
INTO _id, _parent
FROM `database`.`table`
WHERE `table`.`id` = `input`;
IF _parent IS NULL THEN
SET _path = _id;
ELSE
CALL `PATH`(_parent, _path);
SELECT CONCAT(_path, '-', _id) INTO _path;
END IF;
SELECT _path INTO `output`;
END $$
DELIMITER ;
To use the results in an ORDER BY clause you will need a FUNCTION too that wraps the results of the PROCEDURE.
DELIMITER $$
CREATE DEFINER=`root`#`localhost` FUNCTION `GETPATH`(`input` INT) RETURNS VARCHAR(128)
BEGIN
CALL `PATH`(`input`, #path);
RETURN #path;
END $$
DELIMITER ;
Now we can use the recursive path to sort the order of the table. On a table with 10000 rows it takes just over a second on my workstation.
SELECT `id`, `parent`, GETPATH(`id`) `path` FROM `database`.`table` ORDER BY `GETPATH`(`id`);
Example output:
+---------+---------+---------------+
| id | parent | path |
+---------+---------+---------------+
| 1 | NULL | 1 |
| 10 | 1 | 1-10 |
| 300 | 10 | 1-10-300 |
| 301 | 300 | 1-10-300-301 |
| 302 | 300 | 1-10-300-302 |
+---------+---------+---------------+
5 rows in set (1,39 sec)

Getting limited amount of records from hierarchical data

Let's say I have 3 tables (significant columns only)
Category (catId key, parentCatId)
Category_Hierarchy (catId key, parentTrail, catLevel)
Product (prodId key, catId, createdOn)
There's a reason for having a separate Category_Hierarchy table, because I'm using triggers on Category table that populate it, because MySql triggers work as they do and I can't populate columns on the same table inside triggers if I would like to use auto_increment values. For the sake of this problem this is irrelevant. These two tables are 1:1 anyway.
Category table could be:
+-------+-------------+
| catId | parentCatId |
+-------+-------------+
| 1 | NULL |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 3 |
| 6 | 4 |
| ... | ... |
+-------+-------------+
Category_Hierarchy
+-------+-------------+----------+
| catId | parentTrail | catLevel |
+-------+-------------+----------+
| 1 | 1/ | 0 |
| 2 | 1/2/ | 1 |
| 3 | 1/2/3/ | 2 |
| 4 | 1/2/3/4/ | 3 |
| 5 | 1/2/3/5/ | 3 |
| 6 | 1/2/3/4/6/ | 4 |
| ... | ... | ... |
+-------+-------------+----------+
Product
+--------+-------+---------------------+
| prodId | catId | createdOn |
+--------+-------+---------------------+
| 1 | 4 | 2010-02-03 12:09:24 |
| 2 | 4 | 2010-02-03 12:09:29 |
| 3 | 3 | 2010-02-03 12:09:36 |
| 4 | 1 | 2010-02-03 12:09:39 |
| 5 | 3 | 2010-02-03 12:09:50 |
| ... | ... | ... |
+--------+-------+---------------------+
Category_Hierarchy makes it simple to get category subordinate trees like this:
select c.*
from Category c
join Category_Hierarchy h
on (h.catId = c.catId)
where h.parentTrail like '1/2/3/%'
Which would return complete subordinate tree of category 3 (that is below 2, that is below 1 which is root category) including subordinate tree root node. Excluding root node is just one more where condition.
The problem
I would like to write a stored procedure:
create procedure GetLatestProductsFromSubCategories(in catId int)
begin
/* return 10 latest products from each */
/* catId subcategory subordinate tree */
end;
This means if a certain category had 3 direct sub categories (with whatever number of nodes underneath) I would get 30 results (10 from each subordinate tree). If it had 5 sub categories I'd get 50 results.
What would be the best/fastest/most efficient way to do this? If possible I'd like to avoid cursors unless they'd work faster compared to any other solution as well as prepared statements, because this would be one of the most frequent calls to DB.
Edit
Since a picture tells 1000 words I'll try to better explain what I want using an image. Below image shows category tree. Each of these nodes can have an arbitrary number of products related to them. Products are not included in the picture.
So if I'd execute this call:
call GetLatestProductsFromSubCategories(1);
I'd like to effectively get 30 products:
10 latest products from the whole orange subtree
10 latest products from the whole blue subtree and
10 latest products from the whole green subtree
I don't want to get 10 latest products from each node under catId=1 node which would mean 320 products.
Final Solution
This solution has O(n) performance:
CREATE PROCEDURE foo(IN in_catId INT)
BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE first_iteration BOOLEAN DEFAULT TRUE;
DECLARE current VARCHAR(255);
DECLARE categories CURSOR FOR
SELECT parentTrail
FROM category
JOIN category_hierarchy USING (catId)
WHERE parentCatId = in_catId;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = TRUE;
SET #query := '';
OPEN categories;
category_loop: LOOP
FETCH categories INTO current;
IF `done` THEN LEAVE category_loop; END IF;
IF first_iteration = TRUE THEN
SET first_iteration = FALSE;
ELSE
SET #query = CONCAT(#query, " UNION ALL ");
END IF;
SET #query = CONCAT(#query, "(SELECT product.* FROM product JOIN category_hierarchy USING (catId) WHERE parentTrail LIKE CONCAT('",current,"','%') ORDER BY createdOn DESC LIMIT 10)");
END LOOP category_loop;
CLOSE categories;
IF #query <> '' THEN
PREPARE stmt FROM #query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END
Edit
Due to the latest clarification, this solution was simply edited to simplify the categories cursor query.
Note: Make the VARCHAR on line 5 the appropriate size based on your parentTrail column.