Optimizing tree branch data aggregation in SQL Server 2008 (recursion) - sql-server-2008

I have a table containing stages and sub-stages of certain projects, and a table with specific tasks and estimated costs.
I need some way to aggregate each level (stages/sub-stages), to see how much it costs, but to do it at a minimum performance cost.
To illustrate this, I will use the following data structure:
CREATE TABLE stage
(
id int not null,
fk_parent int
)
CREATE TABLE task
(
id int not null,
fk_stage int not null,
cost decimal(18,2) not null default 0
)
with the following data:
==stage==
id fk_parent
1 null
2 1
3 1
==task==
id fk_stage cost
1 2 100
2 2 200
3 3 600
I want to obtain a table containing the total costs on each branch. Something like this:
Stage ID Total Cost
1 900
2 300
3 600
But I also want it to perform well. I don't want to end up with an extremely bad solution like "The worst algorithm in the world", and that is exactly what the naive approach gives here: if I request the data for every item in the stage table together with its total cost, each total cost will be evaluated D times, where D is the depth (level) of that row in the tree. I am afraid I'll hit extremely poor performance with large amounts of data and many levels.
So, I decided to try something, which is what led me to ask this question here.
I added 2 more columns to the stage table, for caching:
...
calculated_cost decimal(18,2),
date_calculated_cost datetime
...
So what I wanted to do is pass another variable through the code: a datetime value equal to the time this whole process was started (practically unique per run). That way, if the stage row already has a date_calculated_cost equal to the one I'm carrying, I don't bother calculating it again and just return the calculated_cost value.
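Roughly, the check would look something like this (just a sketch; @batch_time and @stage_id are illustrative names, and the actual subtree computation is left as a placeholder):
declare @batch_time datetime = getdate()  -- stamped once when the whole run starts
declare @stage_id int = 1                 -- the stage currently being evaluated
if exists (select 1
           from stage
           where id = @stage_id
             and date_calculated_cost = @batch_time)
begin
    -- already computed during this run: just reuse the cached value
    select calculated_cost
    from stage
    where id = @stage_id
end
else
begin
    -- compute the subtree cost here, then cache it for the rest of this run
    update stage
    set calculated_cost = 0 /* placeholder for the computed total */,
        date_calculated_cost = @batch_time
    where id = @stage_id
end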
I couldn't do it with functions (updates to the stage table are needed once the costs are calculated).
I couldn't do it with procedures (recursion within running cursors is a no-go).
I am not sure temporary tables are suitable, because they wouldn't allow concurrent requests to the same procedure (unlikely, but I still want to do it the right way).
I couldn't figure out other ways.
I am not expecting a definite answer to my question, but I will reward any good idea, and the best will be chosen as the answer.

1. A way to query the tables to get the aggregated cost.
Calculate the cost for each stage.
Use a recursive CTE to get the level for each stage.
Store the result in a temp table.
Add a couple of indexes to the temp table.
Update the cost in the temp table in a loop for each level
The first three steps are combined into one statement. It might be good for performance to do the first calculation, cteCost, into a temp table of its own and use that temp table in the recursive cteLevel (a sketch of that variant follows after the main batch below).
;with cteCost as
(
select s.id,
s.fk_parent,
isnull(sum(t.cost), 0) as cost
from stage as s
left outer join task as t
on s.id = t.fk_stage
group by s.id, s.fk_parent
),
cteLevel as
(
select cc.id,
cc.fk_parent,
cc.cost,
1 as lvl
from cteCost as cc
where cc.fk_parent is null
union all
select cc.id,
cc.fk_parent,
cc.cost,
lvl+1
from cteCost as cc
inner join cteLevel as cl
on cc.fk_parent = cl.id
)
select *
into #task
from cteLevel
create clustered index IX_id on #task (id)
create index IX_lvl on #task (lvl, fk_parent)
declare @lvl int
select @lvl = max(lvl)
from #task
while @lvl > 0
begin
update T1 set
T1.cost = T1.cost + T2.cost
from #task as T1
inner join (select fk_parent, sum(cost) as cost
from #task
where lvl = @lvl
group by fk_parent) as T2
on T1.id = T2.fk_parent
set @lvl = @lvl - 1
end
select id as [Stage ID],
cost as [Total Cost]
from #task
drop table #task
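For reference, the variant mentioned above (materializing cteCost into a temp table of its own) could replace the first statement of the batch with something roughly like this - only a sketch, and the clustered index choice is an assumption; the rest of the batch (indexes, loop and final select) stays the same:
select s.id,
s.fk_parent,
isnull(sum(t.cost), 0) as cost
into #cost
from stage as s
left outer join task as t
on s.id = t.fk_stage
group by s.id, s.fk_parent
create clustered index IX_parent on #cost (fk_parent)
;with cteLevel as
(
select c.id,
c.fk_parent,
c.cost,
1 as lvl
from #cost as c
where c.fk_parent is null
union all
select c.id,
c.fk_parent,
c.cost,
cl.lvl + 1
from #cost as c
inner join cteLevel as cl
on c.fk_parent = cl.id
)
select *
into #task
from cteLevel
drop table #cost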
2. A trigger on table task that maintains a calculated_cost field in stage.
create trigger tr_task
on task
after insert, update, delete
as
-- Table to hold the updates
declare @T table
(
id int not null,
cost decimal(18,2) not null default 0
)
-- Get the updates from inserted and deleted tables
insert into @T (id, cost)
select fk_stage, sum(cost)
from (
select fk_stage, cost
from inserted
union all
select fk_stage, -cost
from deleted
) as T
group by fk_stage
declare @id int
select @id = min(id)
from @T
-- For each updated row
while @id is not null
begin
-- Recursive update of stage
with cte as
(
select s.id,
s.fk_parent
from stage as s
where id = @id
union all
select s.id,
s.fk_parent
from stage as s
inner join cte as c
on s.id = c.fk_parent
)
update s set
calculated_cost = s.calculated_cost + t.cost
from stage as s
inner join cte as c
on s.id = c.id
cross apply (select cost
from @T
where id = @id) as t
-- Get the next id
select @id = min(id)
from @T
where id > @id
end
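Note that the trigger only applies deltas, so calculated_cost needs to be seeded once before the trigger takes over. A minimal way to do that (assuming you start from an empty task table; otherwise backfill from the part 1 result while #task still exists):
-- one-time seeding before the trigger starts maintaining the value
update stage set calculated_cost = 0 where calculated_cost is null
-- if task already holds data, backfill from the #task result of part 1 instead
-- (run it before that batch drops #task)
update s set
s.calculated_cost = t.cost
from stage as s
inner join #task as t
on s.id = t.id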

Related

Table is specified twice, both as a target for 'UPDATE' and as a separate source for data in mysql

I have the query below in MySQL. I want to check whether the branch id and year of rows of type 'finance' in branch_master match the branch id and year in manager, and if they do, update the status in the manager table for that branch id:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
(
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
)
)
but getting error
Table 'm1' is specified twice, both as a target for 'UPDATE' and as a
separate source for data
This is a typical MySQL thing and can usually be circumvented by selecting from a derived table, i.e. instead of
FROM manager AS m2
use
FROM (select * from manager) AS m2
The complete statement:
UPDATE manager
SET status = 'Y'
WHERE branch_id IN
(
select branch_id
FROM (select * from manager) AS m2
WHERE (branch_id, year) IN
(
SELECT branch_id, year
FROM branch_master
WHERE type = 'finance'
)
);
The correct answer is in this SO post.
The problem with the accepted answer here is, as has already been mentioned multiple times, that it creates a full copy of the whole table. This is far from optimal and has the worst space complexity. The idea is to materialize only the subset of data that is actually used for the update, so in your case it would be like this:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT * FROM(
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance')
) t
)
Basically you just wrap your previous data-source query inside
SELECT * FROM (...) t
Try to use the EXISTS operator:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE EXISTS (SELECT 1
FROM (SELECT m2.branch_id
FROM branch_master AS bm
JOIN manager AS m2
WHERE bm.type = 'finance' AND
bm.branch_id = m2.branch_id AND
bm.year = m2.year) AS t
WHERE t.branch_id = m1.branch_id);
Note: The query uses an additional nesting level, as proposed by @Thorsten, as a means to circumvent the Table is specified twice error.
Demo here
Try:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
(SELECT DISTINCT branch_id
FROM branch_master
WHERE type = 'finance'))
AND m1.year IN ((SELECT DISTINCT year
FROM branch_master
WHERE type = 'finance'))
The problem I had with the accepted answer is that it creates a copy of the whole table, which wasn't an option for me; I tried to execute it, but after several hours I had to cancel it.
A very fast way, if you have a huge amount of data, is to create a temporary table:
Create TMP table
CREATE TEMPORARY TABLE tmp_manager
(branch_id bigint auto_increment primary key,
year datetime null);
Populate TMP table
insert into tmp_manager (branch_id, year)
select branch_id, year
from manager;
Update with join
UPDATE manager as m, tmp_manager as tmp_m
inner JOIN manager as man on tmp_m.branch_id = man.branch_id
SET status = 'Y'
WHERE m.branch_id = tmp_m.branch_id and m.year = tmp_m.year and m.type = 'finance';
This is by far the fastest way:
UPDATE manager m
INNER JOIN branch_master b on m.branch_id=b.branch_id AND m.year=b.year
SET m.status='Y'
WHERE b.type='finance'
Note that if it is a 1:n relationship the SET command will be run more than once. In this case that is no problem. But if you have something like "SET price=price+5" you cannot use this construction.
Maybe not a solution, but some thoughts about why it doesn't work in the first place:
Reading data from a table and also writing data into that same table is somewhat an ill-defined task. In what order should the data be read and written? Should newly written data be considered when reading it back from the same table? MySQL refusing to execute this isn't just because of a limitation, it's because it's not a well-defined task.
The solutions involving SELECT ... FROM (SELECT * FROM table) AS tmp just dump the entire content of a table into a temporary table, which can then be used in any further outer queries, like for example an update query. This forces the order of operations to be: Select everything first into a temporary table and then use that data (instead of the data from the original table) to do the updates.
However if the table involved is large, then this temporary copying is going to be incredibly slow. No indexes will ever speed up SELECT * FROM table.
I might have a slow day today... but isn't the original query identical to this one, which shouldn't have any problems?
UPDATE manager as m1
SET m1.status = 'Y'
WHERE (m1.branch_id, m1.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)

SQL - Column in field list is ambiguous

I have two tables, BOOKINGS and WORKER. Basically there is a table for a worker and a table to keep track of what the worker has to do in a time frame, aka a booking. I'm trying to check if there is an available worker for a job, so I query the bookings to check if the requested time has available workers between the start and end date. However, I get stuck on the next part, which is returning the list of workers that do have that time available. I read that I could join the tables based on a shared column, so I tried doing an inner join on the WORKER_NAME column, but when I do this I get an ambiguous error. This leads me to believe I misunderstood the concept. Does anyone understand what I'm trying to do and know how to do it, or know why I have the error below? Thanks guys!
CREATE TABLE WORKER (
ID INT NOT NULL AUTO_INCREMENT,
WORKER_NAME varchar(80) NOT NULL,
WORKER_CODE INT,
WORKER_WAGE INT,
PRIMARY KEY (ID)
)
CREATE TABLE BOOKING (
ID INT NOT NULL AUTO_INCREMENT,
WORKER_NAME varchar(80) NOT NULL,
START DATE NOT NULL,
END DATE NOT NULL,
PRIMARY KEY (ID)
)
query
SELECT *
FROM WORKERS
INNER JOIN BOOKING
ON WORKER_NAME = WORKER_NAME
WHERE (START NOT BETWEEN '2010-10-01' AND '2010-10-10')
ORDER BY ID
#1052 - Column 'WORKER_NAME' in on clause is ambiguous
In your query, the column "worker_name" exists in two tables; in this case, you must reference the tablename as part of the column identifer.
SELECT *
FROM WORKERS
INNER JOIN BOOKING
ON workers.WORKER_NAME = booking.WORKER_NAME
WHERE (START NOT BETWEEN '2010-10-01' AND '2010-10-10')
ORDER BY ID
In your query, the WORKER_NAME and ID columns exist in both tables, where WORKER_NAME retains the same meaning and ID is re-purposed; in this case, you must either specify that you are using WORKER_NAME as the join search condition or 'project away' (rename or omit) the duplicated ID columns.
Because the ID columns are AUTO_INCREMENT, I assume (hope!) they have no business meaning. Therefore, they could both be omitted, allowing a natural join that will cause duplicate columns to be 'projected away'. This is one of those situations where one wishes SQL had a WORKER ( ALL BUT ( ID ) ) type syntax; instead, one is required to do it longhand. It might be easier in the long run to opt for a consistent naming convention and rename the columns to WORKER_ID and BOOKING_ID respectively.
You would also need to identify a business key to order on e.g. ( START, WORKER_NAME ):
SELECT *
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END FROM BOOKING ) AS B
WHERE ( START NOT BETWEEN '2010-10-01' AND '2010-10-10' )
ORDER BY START, WORKER_NAME;
This is good, but it's returning the start and end times as well. I just want the WORKER rows. I can't take the start and end out, because then SQL doesn't recognize the where clause.
Two approaches spring to mind: push the where clause to the subquery:
SELECT *
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END
FROM BOOKING
WHERE START NOT BETWEEN '2010-10-01' AND '2010-10-10' ) AS B
ORDER BY START, WORKER_NAME;
Alternatively, replace SELECT * with a list of columns you want to SELECT:
SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE
FROM
( SELECT WORKER_NAME, WORKER_CODE, WORKER_WAGE FROM WORKER ) AS W
NATURAL JOIN
( SELECT WORKER_NAME, START, END FROM BOOKING ) AS B
WHERE START NOT BETWEEN '2010-10-01' AND '2010-10-10'
ORDER BY START, WORKER_NAME;
This error occurs when you reference a field that exists in both tables, so you must qualify it. For instance, in the example below I write cod.coordinator so that the DBMS knows which coordinator I want:
SELECT project__number, surname, firstname, cod.coordinator
FROM coordinators AS co
JOIN hub_applicants AS ap ON co.project__number = ap.project_id
JOIN coordinator_duties AS cod ON co.coordinator = cod.email

indexes don't affect time execution in ms sql 2014 VS mysql (mariaDB 10)

I'm porting a statistics analyzer system from MySQL (MariaDB 10) to MS SQL 2014, and I found a strange thing. Normally I use single- and multi-field indexes for most operations: the statistics database holds about 60 million events on a 4-core PC, and analysis includes funnels, event segmentation, cohort analysis, KPIs and more, so it can be slow sometimes.
But I was quite surprised when I executed several query sequences on MS SQL and then removed all indexes (except the main clustered id): execution time actually decreased! I restarted the server (so the cache was cleared), but after each restart the result was similar - my queries run faster without indexes (actually the speed is the same, but no time is spent on manual index creation).
I suppose MS SQL creates implicit indexes for me, but in that case it looks like I should remove all index creation from my queries? In MySQL you can clearly see that adding indexes really works. Does this MS SQL behaviour mean that I don't need to care about indexes anymore? I've made several tests with my queries and it seems that indexes almost don't affect execution time. The last time I dealt with MS SQL was long ago and it was MS SQL 2000, so maybe MSFT developed f**n' AI during the last 15 years? :)
Just in case, the test SQL code (generated by the back-end for the front-end) is below.
In short, it produces graph data for a particular type of event for the last 3 months over time, then does segmentation by one parameter. It creates a temp table from the main events table with user-set constraints (time period, parameters), creates several more temp tables and indexes, does several joins and returns the final select result:
select min(tmstamp), max(tmstamp)
from evt_db.dbo.events
where ( ( source = 3 )
and ( event_id=24 )
and tmstamp > 1451606400
AND tmstamp < 1458000000
);
select min(param1), max(param1), count(DISTINCT(param1))
from evt_db.dbo.events
WHERE ( ( source = 3 )
AND ( event_id=24 )
AND tmstamp > 1451606400
AND tmstamp < 1458000000
);
create table #_tmp_times_calc_analyzer_0_0 (
tm_start int,
tm_end int,
tm_origin int,
tm_num int
);
insert into #_tmp_times_calc_analyzer_0_0 values
( 1451606400, 1452211200, 1451606400, 0 ),
( 1452211200, 1452816000, 1452211200, 1 ),
( 1452816000, 1453420800, 1452816000, 2 ),
( 1453420800, 1454025600, 1453420800, 3 ),
( 1454025600, 1454630400, 1454025600, 4 ),
( 1454630400, 1455235200, 1454630400, 5 ),
( 1455235200, 1455840000, 1455235200, 6 ),
( 1455840000, 1456444800, 1455840000, 7 ),
( 1456444800, 1457049600, 1456444800, 8 ),
( 1457049600, 1457654400, 1457049600, 9 ),
( 1457654400, 1458259200, 1457654400, 10 );
And...
CREATE INDEX tm_num ON #_tmp_times_calc_analyzer_0_0 (tm_num);
SELECT id, t1.uid, tmstamp, floor((tmstamp - 1451606400) / 604800) period_num,
param1 into #_tmp_events_view_analyzer_0_0
FROM evt_db.dbo.events t1
WHERE ( ( source = 3 )
AND ( event_id=24 )
AND tmstamp > 1451606400
AND tmstamp < 1458000000
);
CREATE INDEX uid ON #_tmp_events_view_analyzer_0_0 (uid);
CREATE INDEX period_num ON #_tmp_events_view_analyzer_0_0 (period_num);
CREATE INDEX tmstamp ON #_tmp_events_view_analyzer_0_0 (tmstamp);
CREATE INDEX _index_param1 ON #_tmp_events_view_analyzer_0_0 (param1);
create table #_tmp_median_analyzer_0_0 (ts int );
insert into #_tmp_median_analyzer_0_0
select distinct(param1) v
from #_tmp_events_view_analyzer_0_0
where param1 is not null
order by v ;
select tm_origin, count(distinct uid), count(distinct id)
from #_tmp_times_calc_analyzer_0_0
left join #_tmp_events_view_analyzer_0_0 ON period_num = tm_num
GROUP BY tm_origin;
select top 600 (param1) seg1, count(distinct uid), count(distinct id)
from #_tmp_events_view_analyzer_0_0
GROUP BY param1
order by 1 asc;
And...
select seg1, tm_origin, count(distinct uid), count(distinct id)
from
( SELECT (param1) seg1, tm_origin, uid, id
from #_tmp_times_calc_analyzer_0_0
left join #_tmp_events_view_analyzer_0_0 ON period_num = tm_num
group by param1, tm_origin, uid, id
) t
GROUP BY seg1, tm_origin;
select min(param1), max(param1), round(avg(param1),0)
from #_tmp_events_view_analyzer_0_0;
DECLARE @c BIGINT = (SELECT COUNT(*) FROM #_tmp_median_analyzer_0_0);
SELECT round(AVG(1.0 * ts),0)
FROM
( SELECT ts
FROM #_tmp_median_analyzer_0_0
ORDER BY ts OFFSET (@c - 1) / 2 ROWS
FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS median_val;
evt_db.dbo.events needs INDEX(source, event_id, tmstamp), with tmstamp 3rd. In the case of MySQL, those first 2 SELECTs will run entirely in the index (because it is a "covering" index). source and event_id can be in either order.
Later, you have a similar SELECT but it also reads id, t1.uid. You could make a covering index for it: INDEX(source, event_id, tmstamp, uid, id). Again, tmstamp must be third in the list.
select top 600 (param1) seg1, count(distinct uid), count(distinct id) ... might benefit from INDEX(param1, uid, id), where param1 must be first.
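Spelled out on the SQL Server side, those suggestions might look roughly like this (the column name event_id and the INCLUDE form are assumptions based on the queries above; the MySQL form would put all columns in the key):
-- covering index for the first two aggregate SELECTs
CREATE INDEX IX_events_src_evt_ts
ON evt_db.dbo.events (source, event_id, tmstamp);
-- covering index for the SELECT ... INTO #_tmp_events_view_analyzer_0_0
-- (param1 added to INCLUDE because that SELECT also reads it)
CREATE INDEX IX_events_src_evt_ts_wide
ON evt_db.dbo.events (source, event_id, tmstamp)
INCLUDE (uid, id, param1);
-- for the TOP 600 ... GROUP BY param1 query, on the temp table it runs against
CREATE INDEX IX_tmp_param1
ON #_tmp_events_view_analyzer_0_0 (param1, uid, id);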
The other indexes you list are possibly not useful at all. What indexes did you try?
One difference between MySQL and other Databases -- MySQL almost never uses more than one index in a query. And, in my experience, MySQL's choice is 'wise'. Perhaps MSSql is trying too hard to use two indexes, when simply scanning the table would be less work.

MySQL, how to change the following query to fetch each row of a table?

I have an event occurring once a day. I have 2 tables:
application
rating
Basically, each application has an avg_score that is given by the average of all the feedbacks given by users that are stored in the table rating in the field score. I wrote an event that once a day refresh this value:
CREATE EVENT MY_DAILY_UPDATE
ON SCHEDULE EVERY 1 DAY STARTS '2011-07-23 23:30:00'
DO
UPDATE application
SET `avg_score`= (SELECT AVG(`score`) as new_score
FROM `rating`
WHERE `ID_APPLICATION` = 1)
WHERE `APPLICATION_ID` = 1
It works, but only for the application with ID = 1, because I hard-coded that value.
Instead I need the query to update the avg_score field for each application in the application table.
So I think I need to replace the value 1 with a variable ID (e.g. WHERE APPLICATION_ID = ID_VARIABLE), and this variable should take the id value of each app in the application table (1, 2, 3, 4, etc.), but I have no idea how to change my query.
Change your sub-query to reference the values in the outer query. (This makes it a correlated sub-query.)
UPDATE application
SET avg_score = (
SELECT AVG(score)
FROM rating
WHERE ID_APPLICATION = application.APPLICATION_ID
)
Alternatively, as you're doing this for "all values", just join on the sub-query...
UPDATE
application
INNER JOIN
(
SELECT ID_APPLICATION, AVG(score) AS score FROM rating GROUP BY ID_APPLICATION
)
AS averages
ON averages.ID_APPLICATION = application.APPLICATION_ID
SET
application.avg_score = averages.score
UPDATE application
SET `avg_score`=
(SELECT AVG(`score`) as new_score
FROM `rating`
WHERE `ID_APPLICATION` = `application`.`APPLICATION_ID`)

Handling tree in a MySQL procedure

The idea is simple - I have two tables, categories and products.
Categories:
id | parent_id | name | count
1 NULL Literature 6020
2 1 Interesting books 1000
3 1 Horrible books 5000
4 1 Books to burn 20
5 NULL Motorized vehicles 1000
6 5 Cars 999
7 5 Motorbikes 1
...
Products:
id | category_id | name
1 1 Cooking for dummies
2 3 Twilight saga
3 5 My grandpa's car
...
Now, when displayed, a parent category contains all the products of all its child categories, and any category may have child categories. The count field in the table structure contains (or at least I want it to contain) the count of all products displayed in that particular category. On the front-end, I select all subcategories with a simple recursive function, however I'm not so sure how to do this in a SQL procedure (yes, it has to be a SQL procedure). The tables contain about a hundred categories of all kinds and there are over 100 000 products.
Any ideas?
Bill Karwin made some nice slides about hierarchical data, and the current Adjacency Model certainly has pros, but it's not very suited for this (getting a whole subtree).
For my Adjacency tables, I solve it by storing / caching the path (possibly in a script, or in a 'before update' trigger): on change of parent_id, a new path string is created. Your current table would look like this:
id | parent_id | path | name | count
1 NULL 1 Literature 6020
2 1 1:2 Interesting books 1000
3 1 1:3 Horrible books 5000
4 1 1:4 Books to burn 20
5 NULL 5 Motorized vehicles 1000
6 5 5:6 Cars 999
7 5 5:7 Motorbikes 1
(choose any delimiter not found in the id you like)
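Maintaining the path column is simple; a minimal sketch (plain MySQL UPDATEs against the categories table from the question, typically run from application code or a small maintenance job when parent_id changes):
-- roots: the path is just the category's own id
UPDATE categories
SET path = id
WHERE parent_id IS NULL;
-- everything else: parent's path + ':' + own id
-- (repeat once per tree level, or loop until no rows change,
-- so grandchildren pick up the refreshed parent paths)
UPDATE categories AS c
JOIN categories AS p ON p.id = c.parent_id
SET c.path = CONCAT(p.path, ':', c.id);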
So, now to get all products from a category + subcategories:
SELECT p.*
FROM categories c_main
JOIN categories c_subs
ON c_subs.id = c_main.id
OR c_subs.path LIKE CONCAT(c_main.path, ':%')
JOIN products p
ON p.category_id = c_subs.id
WHERE c_main.id = <id>
Take a look at this article on managing hierarchical trees in MySQL.
It explains the disadvantages of your current method and some more optimal solutions.
See especially the section towards the end headed 'Aggregate Functions in a Nested Set'.
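For reference, the aggregate that section describes looks roughly like this, assuming hypothetical lft/rgt nested-set columns were added to categories (they are not part of the current schema):
-- product count per category, including all of its descendants
SELECT parent.id, parent.name, COUNT(p.id) AS product_count
FROM categories AS parent
JOIN categories AS child
ON child.lft BETWEEN parent.lft AND parent.rgt
LEFT JOIN products AS p
ON p.category_id = child.id
GROUP BY parent.id, parent.name;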
There's a whole chapter about managing hierarchical data in SQL in "SQL Antipatterns: Avoiding the Pitfalls of Database Programming" by Bill Karwin.
As you haven't accepted an answer yet, I thought I'd post my method for handling trees in MySQL and PHP (a single DB call to a non-recursive sproc).
Full script here : http://pastie.org/1252426 or see below...
Hope this helps :)
PHP
<?php
$conn = new mysqli("localhost", "foo_dbo", "pass", "foo_db", 3306);
$result = $conn->query(sprintf("call product_hier(%d)", 3));
echo "<table border='1'>
<tr><th>prod_id</th><th>prod_name</th><th>parent_prod_id</th>
<th>parent_prod_name</th><th>depth</th></tr>";
while($row = $result->fetch_assoc()){
echo sprintf("<tr><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>",
$row["prod_id"],$row["prod_name"],$row["parent_prod_id"],
$row["parent_prod_name"],$row["depth"]);
}
echo "</table>";
$result->close();
$conn->close();
?>
SQL
drop table if exists product;
create table product
(
prod_id smallint unsigned not null auto_increment primary key,
name varchar(255) not null,
parent_id smallint unsigned null,
key (parent_id)
)engine = innodb;
insert into product (name, parent_id) values
('Products',null),
('Systems & Bundles',1),
('Components',1),
('Processors',3),
('Motherboards',3),
('AMD',5),
('Intel',5),
('Intel LGA1366',7);
delimiter ;
drop procedure if exists product_hier;
delimiter #
create procedure product_hier
(
in p_prod_id smallint unsigned
)
begin
declare v_done tinyint unsigned default 0;
declare v_depth smallint unsigned default 0;
create temporary table hier(
parent_id smallint unsigned,
prod_id smallint unsigned,
depth smallint unsigned default 0
)engine = memory;
insert into hier select parent_id, prod_id, v_depth from product where prod_id = p_prod_id;
/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */
create temporary table tmp engine=memory select * from hier;
while not v_done do
if exists( select 1 from product p inner join hier on p.parent_id = hier.prod_id and hier.depth = v_depth) then
insert into hier
select p.parent_id, p.prod_id, v_depth + 1 from product p
inner join tmp on p.parent_id = tmp.prod_id and tmp.depth = v_depth;
set v_depth = v_depth + 1;
truncate table tmp;
insert into tmp select * from hier where depth = v_depth;
else
set v_done = 1;
end if;
end while;
select
p.prod_id,
p.name as prod_name,
b.prod_id as parent_prod_id,
b.name as parent_prod_name,
hier.depth
from
hier
inner join product p on hier.prod_id = p.prod_id
inner join product b on hier.parent_id = b.prod_id
order by
hier.depth, hier.prod_id;
drop temporary table if exists hier;
drop temporary table if exists tmp;
end #
delimiter ;
call product_hier(3);
call product_hier(5);
What you want is a common table expression. Unfortunately, it looks like MySQL doesn't support them (see also the WITH RECURSIVE note after the example below).
Instead you will probably need to use a loop to keep selecting deeper levels of the tree.
I'll try to whip up an example.
To clarify, you're looking to be able to call the procedure with an input of, say, '1' and get back all the sub-categories and sub-sub-categories (and so on) with 1 as the eventual root?
like
id parent
1 null
2 1
3 1
4 2
?
Edited:
This is what I came up with; it seems to work.
Unfortunately I don't have MySQL, so I had to use SQL Server. I tried to check everything to make sure it will work with MySQL, but there may still be issues.
declare @input int
set @input = 1
--not needed, but informative
declare @depth int
set @depth = 0
--for breaking out of the loop
declare @break int
set @break = 0
--my table '[recursive]' is pretty simple, the results table matches it
declare @results table
(
id int,
parent int,
depth int
)
--Seed the results table with the root node
insert into @results
select id, parent, @depth from [recursive]
where ID = @input
--Loop through, adding nodes as we go
set @break = 1
while (@break > 0)
begin
set @depth = @depth + 1 --Increase the depth counter each loop
--This checks to see how many rows we are about to add to the table.
--If we don't add any rows, we can stop looping
select @break = count(id) from [recursive]
where parent in
(
select id from @results
)
and id not in --Don't add rows that are already in the results
(
select id from @results
)
--Here we add the rows to the results table
insert into @results
select id, parent, @depth from [recursive]
where parent in
(
select id from @results
)
and id not in --Don't add rows that are already in the results
(
select id from @results
)
end
--Select the results and return
select * from @results
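As a side note on the common table expression point above: MySQL 8.0 and MariaDB 10.2+ do support WITH RECURSIVE, so on a current server the loop can be replaced by something like this (a sketch against the question's categories/products tables, not the product example used above):
WITH RECURSIVE subtree AS (
SELECT id FROM categories WHERE id = 1      -- the root category being asked about
UNION ALL
SELECT c.id
FROM categories AS c
JOIN subtree AS s ON c.parent_id = s.id
)
SELECT COUNT(*) AS product_count
FROM products AS p
JOIN subtree AS s ON p.category_id = s.id;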
Try to get rid of a hierarchy implemented that way. Recursion in stored procedures isn't nice, and on MS SQL, for example, it fails once the nesting limit (32 levels) is reached.
Also, to get, for example, everything from some category and its subcategories, you would have to recursively go all the way down, which is impractical in SQL - and needless to say, slow.
Instead, use this: create a category_path field, and make it look like:
category_path name
1/ literature
1/2/ Interesting books
1/3/ Horrible books
1/4/ Books to burn
5/ Motorized vehicles
5/6/ Cars
5/7/ Motorbikes
Using that method, you will be able to SELECT categories and subcategories very quickly. Updates will be slower, but I guess they can afford to be. Also, you can keep your old child-parent relationship fields to help you maintain the tree structure.
For example, getting all cars, without any recursion, will be:
SELECT * FROM ttt WHERE category_path LIKE '5/%'
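And to tie it back to the product counts the question asks for, roughly (a sketch using the question's categories and products tables plus the category_path field described above):
-- total number of products in category 5 and its whole subtree
SELECT COUNT(*) AS product_count
FROM categories AS c
JOIN products AS p ON p.category_id = c.id
WHERE c.category_path LIKE '5/%';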