Recursive query with optional depth limit with MySQL 5.6 - mysql
I have two table schemas (MySQL 5.6 so no CTE), roughly looking like this:
CREATE TABLE nodes (
node_id INT PRIMARY KEY,
name VARCHAR(10)
);
CREATE TABLE edges (
edge_id INT PRIMARY KEY,
source INT,
target INT,
FOREIGN KEY (source) REFERENCES nodes(node_id),
FOREIGN KEY (target) REFERENCES nodes(node_id)
);
In our design, a logical edge between two nodes (logically n1 -> n2) is actually represented as (n1 -> proxy node -> n2) in the db. The reason we use two edges and a proxy node for a logical edge is so that we can store properties on the edge. Therefore, when a client queries for two nodes connected by an edge, the query is translated to query three connected nodes instead.
I have written a query to get a path with a fixed length. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with exactly 5 edges on the path." This is done without using recursion on the SQL side; I just generate a long query programmatically with the specified fixed length.
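The programmatic generation described above, and one way to stretch it to a length range by running one generated query per length, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `build_path_query`, `find_paths`, and `make_sample_db` are made up here, SQLite stands in for MySQL so the sketch runs on its own, and the `?` placeholders would be `%s` with most MySQL drivers.

```python
import sqlite3


def build_path_query(logical_edges):
    """Build a SELECT for paths with exactly `logical_edges` logical edges.

    Each logical edge is two physical edges joined through a proxy node.
    """
    select_cols = ["n0.node_id AS id0"]
    joins = ["FROM nodes n0"]
    for i in range(1, logical_edges + 1):
        # First physical edge: logical node -> proxy node.
        joins.append(f"JOIN edges a{i} ON a{i}.source = n{i - 1}.node_id")
        # Second physical edge: proxy node -> next logical node.
        joins.append(f"JOIN edges b{i} ON b{i}.source = a{i}.target")
        joins.append(f"JOIN nodes n{i} ON n{i}.node_id = b{i}.target")
        select_cols.append(f"n{i}.node_id AS id{i}")
    where = f"WHERE n0.name = ? AND n{logical_edges}.name = ?"
    return "SELECT " + ", ".join(select_cols) + "\n" + "\n".join(joins) + "\n" + where


def find_paths(conn, start_name, end_name, min_edges, max_edges):
    """Run one generated query per path length and collect all rows."""
    paths = []
    for k in range(min_edges, max_edges + 1):
        paths.extend(conn.execute(build_path_query(k), (start_name, end_name)).fetchall())
    return paths


def make_sample_db():
    """Hypothetical in-memory chain 1 -> 2 -> ... -> 11 with proxy nodes at even ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_id INT PRIMARY KEY, name VARCHAR(10))")
    conn.execute("CREATE TABLE edges (edge_id INT PRIMARY KEY, source INT, target INT)")
    names = ["foo", "_proxy_", "foo", "_proxy_", "bar", "_proxy_",
             "bar", "_proxy_", "bar", "_proxy_", "bar"]
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", list(enumerate(names, start=1)))
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                     [(i, i, i + 1) for i in range(1, 11)])
    return conn


if __name__ == "__main__":
    # 2 to 4 nodes on the path means 1 to 3 logical edges.
    for path in find_paths(make_sample_db(), "foo", "bar", 1, 3):
        print(path)
```

The cost is one query per allowed length, so this is only reasonable while the range stays small.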
The challenge is, we want to support querying of a variable-length path. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with no fewer than 3 edges and no more than 10 edges on the path." Is this feasible without (or even with) CTE?
EDIT:
Some sample data:
-- Logical nodes are 1, 3, 5, 7, 9, 11. The rest are proxy nodes.
INSERT INTO nodes VALUES
(1, 'foo'),
(2, '_proxy_'),
(3, 'foo'),
(4, '_proxy_'),
(5, 'bar'),
(6, '_proxy_'),
(7, 'bar'),
(8, '_proxy_'),
(9, 'bar'),
(10, '_proxy_'),
(11, 'bar');
-- Connects 1 -> 2 -> ... -> 11.
INSERT INTO edges VALUES
(1, 1, 2),
(2, 2, 3),
(3, 3, 4),
(4, 4, 5),
(5, 5, 6),
(6, 6, 7),
(7, 7, 8),
(8, 8, 9),
(9, 9, 10),
(10, 10, 11);
The query can be, "select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path." Such paths include 1 -> 3 -> 5, 1 -> 3 -> 5 -> 7, 3 -> 5, 3 -> 5 -> 7, and 3 -> 5 -> 7 -> 9. So the result set should include the IDs and names of nodes 1, 3, 5, 7, 9.
The following query returns all paths of interest in comma separated strings.
with recursive rcte as (
select e.source, e.target, 1 as depth, concat(e.source) as path
from nodes n
join edges e on e.source = n.node_id
where n.name = 'foo' -- start node name
union all
select e.source, e.target, r.depth + 1 as depth, concat_ws(',', r.path, e.source)
from rcte r
join edges p on p.source = r.target -- p for proxy
join edges e on e.source = p.target
where r.depth < 4 -- max path nodes
)
select r.path
from rcte r
join nodes n on n.node_id = r.source
where r.depth >= 2 -- min path nodes
and n.name = 'bar' -- end node name
The result looks like this:
| path |
| ------- |
| 3,5 |
| 1,3,5 |
| 3,5,7 |
| 1,3,5,7 |
| 3,5,7,9 |
You can now parse the strings in application code and merge/union the arrays. If you only want the contained node ids, you can also change the outer query to:
select distinct r2.source
from rcte r
join nodes n on n.node_id = r.source
join rcte r2 on find_in_set(r2.source, r.path)
where r.depth >= 2 -- min path nodes
and n.name = 'bar' -- end node name
Result:
| source |
| ------ |
| 1 |
| 3 |
| 5 |
| 7 |
| 9 |
Note that a JOIN on FIND_IN_SET() might be slow if rcte contains many rows. I would rather do this step in application code, which should be quite simple in a procedural language.
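As a rough illustration of doing that step in application code, the comma-separated path strings can be split and merged into one set of node IDs. The function name `nodes_on_paths` is made up for this sketch; it assumes every path is a non-empty string of integers separated by commas.

```python
def nodes_on_paths(path_strings):
    """Return the sorted, de-duplicated node IDs found in all path strings."""
    node_ids = set()
    for path in path_strings:
        node_ids.update(int(part) for part in path.split(","))
    return sorted(node_ids)


if __name__ == "__main__":
    paths = ["3,5", "1,3,5", "3,5,7", "1,3,5,7", "3,5,7,9"]
    print(nodes_on_paths(paths))  # [1, 3, 5, 7, 9]
```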
MySQL 5.6 solution
Prior to MySQL 8.0 and MariaDB 10.2 there was no support for recursive queries. Furthermore, there are many other limitations that make a workaround difficult. For example:
No dynamic queries in stored functions
No way to use a temporary table twice in a single statement
No TEXT type in the MEMORY engine
However, an RCTE can be emulated in a stored procedure by moving rows between two (temporary) tables. The following procedure does that:
delimiter //
create procedure get_path(
in source_name text,
in target_name text,
in min_depth int,
in max_depth int
)
begin
create temporary table tmp_sources (id int, depth int, path text) engine=innodb;
create temporary table tmp_targets like tmp_sources;
insert into tmp_sources (id, depth, path)
select n.node_id, 1, n.node_id
from nodes n
where n.name = source_name;
set @depth = 1;
while @depth < max_depth do
set @depth = @depth+1;
insert into tmp_targets(id, depth, path)
select e.target, @depth, concat_ws(',', t.path, e.target)
from tmp_sources t
join edges p on p.source = t.id
join edges e on e.source = p.target
where t.depth = @depth - 1;
insert into tmp_sources (id, depth, path)
select id, depth, path
from tmp_targets;
truncate tmp_targets;
end while;
select t.path
from tmp_sources t
join nodes n on n.node_id = t.id
where n.name = target_name
and t.depth >= min_depth;
end //
delimiter ;
Use it as:
call get_path('foo', 'bar', 2, 4)
Result:
| path |
| ------- |
| 3,5 |
| 1,3,5 |
| 3,5,7 |
| 1,3,5,7 |
| 3,5,7,9 |
This is far from optimal. If the result has many or long paths, you might need to define some indexes on the temporary tables. Also, I don't like the idea of creating (temporary) tables in stored procedures. See it as a "proof of concept". Use it at your own risk.
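To make the loop in the procedure easier to follow, here is the same level-by-level expansion written as a small Python sketch over in-memory data. It is not a replacement for the procedure; the function name `get_paths` and the dict/list inputs are assumptions made for this illustration, with depth counted in logical nodes as in the procedure.

```python
from collections import defaultdict


def get_paths(nodes, edges, source_name, target_name, min_depth, max_depth):
    """Expand paths one logical node at a time, two physical hops per step."""
    out = defaultdict(list)
    for source, target in edges:
        out[source].append(target)

    # Depth-1 rows: every start node is a path of one node.
    frontier = [(nid, [nid]) for nid, name in nodes.items() if name == source_name]
    all_rows = [(nid, 1, path) for nid, path in frontier]

    depth = 1
    while depth < max_depth and frontier:
        depth += 1
        next_frontier = []
        for nid, path in frontier:
            for proxy in out[nid]:          # logical node -> proxy
                for target in out[proxy]:   # proxy -> next logical node
                    next_frontier.append((target, path + [target]))
        all_rows.extend((nid, depth, path) for nid, path in next_frontier)
        frontier = next_frontier

    # Keep only paths that are long enough and end on a matching node.
    return [",".join(map(str, path))
            for nid, d, path in all_rows
            if d >= min_depth and nodes[nid] == target_name]


if __name__ == "__main__":
    names = ["foo", "_proxy_", "foo", "_proxy_", "bar", "_proxy_",
             "bar", "_proxy_", "bar", "_proxy_", "bar"]
    sample_nodes = dict(enumerate(names, start=1))
    sample_edges = [(i, i + 1) for i in range(1, 11)]
    for p in get_paths(sample_nodes, sample_edges, "foo", "bar", 2, 4):
        print(p)
```

Here `frontier` plays the role of the rows at the previous depth in tmp_sources, and `next_frontier` the role of tmp_targets.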
I've solved this sort of problem with a transitive closure table. This enumerates every direct and indirect path through your nodes. The edges you currently have are paths of length 1. But you also need paths of length 0 (i.e., a node has a path to itself), and then every path from one source node to an eventual target node, for paths with length greater than 1.
create table closure (
source int,
target int,
length int,
is_direct bool,
primary key (source, target)
);
insert into closure values
(1, 1, 0, false), (1, 2, 1, true), (1, 3, 2, false), (1, 4, 3, false), (1, 5, 4, false), (1, 6, 5, false), (1, 7, 6, false), (1, 8, 7, false), (1, 9, 8, false), (1, 10, 9, false), (1, 11, 10, false),
(2, 2, 0, false), (2, 3, 1, true), (2, 4, 2, false), (2, 5, 3, false), (2, 6, 4, false), (2, 7, 5, false), (2, 8, 6, false), (2, 9, 7, false), (2, 10, 8, false), (2, 11, 9, false),
(3, 3, 0, false), (3, 4, 1, true), (3, 5, 2, false), (3, 6, 3, false), (3, 7, 4, false), (3, 8, 5, false), (3, 9, 6, false), (3, 10, 7, false), (3, 11, 8, false),
(4, 4, 0, false), (4, 5, 1, true), (4, 6, 2, false), (4, 7, 3, false), (4, 8, 4, false), (4, 9, 5, false), (4, 10, 6, false), (4, 11, 7, false),
(5, 5, 0, false), (5, 6, 1, true), (5, 7, 2, false), (5, 8, 3, false), (5, 9, 4, false), (5, 10, 5, false), (5, 11, 6, false),
(6, 6, 0, false), (6, 7, 1, true), (6, 8, 2, false), (6, 9, 3, false), (6, 10, 4, false), (6, 11, 5, false),
(7, 7, 0, false), (7, 8, 1, true), (7, 9, 2, false), (7, 10, 3, false), (7, 11, 4, false),
(8, 8, 0, false), (8, 9, 1, true), (8, 10, 2, false), (8, 11, 3, false),
(9, 9, 0, false), (9, 10, 1, true), (9, 11, 2, false),
(10, 10, 0, false), (10, 11, 1, true),
(11, 11, 0, false);
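Typing these rows by hand does not scale, so the closure would normally be derived from the edges. A minimal sketch of one way to do that, assuming the function name `build_closure` (made up here) and keeping only the shortest length per (source, target) pair, since the table's primary key allows one row per pair:

```python
def build_closure(node_ids, edges):
    """Return closure rows (source, target, length, is_direct), shortest length per pair."""
    closure = {(n, n): 0 for n in node_ids}       # length-0 paths: node to itself
    frontier = {(s, t): 1 for s, t in edges}      # length-1 paths: the edges
    while frontier:
        next_frontier = {}
        for (s, t), length in frontier.items():
            if (s, t) in closure:
                continue                          # a path at least as short is already known
            closure[(s, t)] = length
            for edge_source, edge_target in edges:
                if edge_source == t:              # extend the path by one more edge
                    next_frontier.setdefault((s, edge_target), length + 1)
        frontier = next_frontier
    return sorted((s, t, length, length == 1) for (s, t), length in closure.items())


if __name__ == "__main__":
    rows = build_closure(range(1, 12), [(i, i + 1) for i in range(1, 11)])
    print(len(rows))   # 66 rows for the 11-node chain
    print(rows[:3])
```

In a graph where two nodes are connected by paths of different lengths, a single row per pair cannot record all of them, so the key would need to include the length.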
Now we can write your query:
select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path.
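I translate this into paths of length 4, 6, 8 (that is, 2 to 4 logical edges), because you have a proxy node in between each pair of nodes, so it really takes two hops to go between nodes. Note that if you count nodes rather than edges, 2 to 4 nodes on the path corresponds to lengths 2, 4, 6; adjust the IN list to whichever reading you intend.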
select source.node_id as source_node, target.node_id as target_node, c.length
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)
Here's the result, which in fact also includes node 11:
+-------------+-------------+--------+
| source_node | target_node | length |
+-------------+-------------+--------+
| 1 | 5 | 4 |
| 1 | 7 | 6 |
| 1 | 9 | 8 |
| 3 | 7 | 4 |
| 3 | 9 | 6 |
| 3 | 11 | 8 |
+-------------+-------------+--------+
Re comment from Paul Spiegel:
Once you have the endpoints of the path, you can query the closure for all paths that start at the source, and end at a node that also has a path to the target.
select source.node_id as source_node, target.node_id as target_node,
group_concat(i1.target order by i1.target) as interim_nodes
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
join closure as i1 on source.node_id = i1.source
join closure as i2 on target.node_id = i2.target and i1.target = i2.source
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)
group by source.node_id, target.node_id
+-------------+-------------+---------------------+
| source_node | target_node | interim_nodes |
+-------------+-------------+---------------------+
| 1 | 5 | 1,2,3,4,5 |
| 1 | 7 | 1,2,3,4,5,6,7 |
| 1 | 9 | 1,2,3,4,5,6,7,8,9 |
| 3 | 7 | 3,4,5,6,7 |
| 3 | 9 | 3,4,5,6,7,8,9 |
| 3 | 11 | 3,4,5,6,7,8,9,10,11 |
+-------------+-------------+---------------------+
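The idea behind the two extra closure joins can be shown in a few lines of Python: a node is on the way from source to target if the source reaches it and it reaches the target. The names `NODE_IDS`, `CLOSURE`, and `interim_nodes` are made up for this sketch, and the closure is hard-wired to the sample chain.

```python
# Assumption: closure pairs for the sample chain 1 -> 2 -> ... -> 11,
# where (s, t) is in the closure exactly when s <= t.
NODE_IDS = range(1, 12)
CLOSURE = {(s, t) for s in NODE_IDS for t in NODE_IDS if s <= t}


def interim_nodes(source, target):
    """Nodes m such that source reaches m and m reaches target."""
    return [m for m in NODE_IDS
            if (source, m) in CLOSURE and (m, target) in CLOSURE]


if __name__ == "__main__":
    print(interim_nodes(1, 5))   # [1, 2, 3, 4, 5]
    print(interim_nodes(3, 7))   # [3, 4, 5, 6, 7]
```

As in the SQL, the nodes come back ordered by ID, and in a graph with several routes between the endpoints they would be the union of the nodes on all routes rather than a single path.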
Related
Make a tuple of arbitrary size functionally in Julia
An ordinary way to make a tuple in Julia is like this:
n = 5
t2 = (n,n) # t2 = (5,5)
t3 = (n,n,n) # t3 = (5,5,5)
I want to make a tuple of arbitrary size functionally.
n = 5
someFunction(n,size) = ???
t10 = someFunction(n,10) # t10 = (5,5,5,5,5,5,5,5,5,5)
How can I realize this? Any information would be appreciated.
Maybe what you are looking for is ntuple?
julia> ntuple(_ -> 5, 10)
(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
Note that you can also use tuple or Tuple:
julia> tuple((5 for _ in 1:10)...)
(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
julia> Tuple(5 for _ in 1:10)
(5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
Finding Connection Pairs
Suppose we have a table in a MySQL database where each fname has a connection to another fname (BB_Connection_name). We would like a query that finds the pair(s) of friends who are connected to each other. E.g. Sidharth and Asim each have the other's BBid as their BB_Connection_ID.
I have looked at the similar father, son and grandson question, but there not every father has a son, so inner joining makes that case easier to solve. I tried that approach but it didn't work. Here I need to check the BB_Connection_ID for every fname (A), and then whether the corresponding fname has A's BBid as its BB_Connection_ID or not.
The chosen pairs should look like Sidharth<->Asim. We need to find the pairs whose connection IDs point to each other.
Code for recreation of the table:
create table world.bigbb(
BBid int not null auto_increment,
fname varchar(20) NOT NULL,
lname varchar(30),
BBdays int not null,
No_of_Nom int,
BB_rank int not null,
BB_Task varchar(10),
BB_Connection_ID int,
BB_Connection_name varchar(10),
primary key (BBid)
);
insert into world.bigbb (fname, lname, BBdays, No_of_Nom, BB_rank, BB_Task, BB_Connection_ID, BB_Connection_name)
values
('Sidharth', 'Shukla', 40, 4, 2, 'Kitchen', 11, 'Asim'),
('Arhaan', 'Khan', 7, 1, 9, 'Kitchen', 16, 'Rashmi'),
('Vikas', 'Bhau', 7, 1, 8, 'Bedroom', 11, 'Asim'),
('Khesari', 'Bihari', 7, 1, 12, 'Kitchen', 9, 'Paras'),
('Tehseem', 'Poonawala', 7, 1, 11, 'Washroom', 12, 'Khesari'),
('Shehnaaz', 'Gill', 40, 4, 4, 'Washroom', 9, 'Paras'),
('Himanshi', 'Khurana', 7, 0, 7, 'Bedroom', 8, 'Shefali'),
('Shefali', 'Zariwala', 7, 1, 10, 'Bedroom', 1, 'Sidharth'),
('Paras', 'Chabra', 40, 3, 1, 'Bathroom', 10, 'Mahira'),
('Mahira', 'Sharma', 40, 4, 5, 'Kitchen', 9, 'Paras'),
('Asim', 'Khan', 40, 3, 3, 'Bathroom', 1, 'Sidharth'),
('Arti', 'Singh', 40, 5, 6, 'Captain', 1, 'Sidharth'),
('Sidharth', 'Dey', 35, 6, 16, 'None', 14, 'Shefali'),
('Shefali', 'Bagga', 38, 5, 15, 'None', 13, 'Sidharth'),
('Abu', 'Fifi', 22, 5, 17, 'None', 11, 'Asim'),
('Rashmi', 'Desai', 38, 5, 13, 'None', 17, 'Debolina'),
('Debolina', 'Bhattacharjee', 38, 5, 14, 'None', 16, 'Rashmi');
One solution would be to self-join the table:
select b1.fname name1, b2.fname name2
from bigbb b1
inner join bigbb b2
on b1.BB_Connection_ID = b2.BBid
and b2.BB_Connection_ID = b1.BBid
and b1.BBid < b2.BBid
This will give you one record for each pair, with the record having the smallest BBid in the first column.
With your sample data, this returns:
| name1 | name2 |
| -------- | -------- |
| Sidharth | Asim |
| Paras | Mahira |
| Sidharth | Shefali |
| Rashmi | Debolina |
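The logic of the self-join can be checked outside the database with a short Python sketch. The dictionary below copies only the BBid, fname, and BB_Connection_ID columns from the sample rows (BBid values assume auto_increment starts at 1), and the name `mutual_pairs` is made up for this illustration.

```python
# BBid -> (fname, BB_Connection_ID), taken from the sample insert order.
PEOPLE = {
    1: ("Sidharth", 11), 2: ("Arhaan", 16), 3: ("Vikas", 11),
    4: ("Khesari", 9), 5: ("Tehseem", 12), 6: ("Shehnaaz", 9),
    7: ("Himanshi", 8), 8: ("Shefali", 1), 9: ("Paras", 10),
    10: ("Mahira", 9), 11: ("Asim", 1), 12: ("Arti", 1),
    13: ("Sidharth", 14), 14: ("Shefali", 13), 15: ("Abu", 11),
    16: ("Rashmi", 17), 17: ("Debolina", 16),
}


def mutual_pairs(people):
    """Pairs (name1, name2) whose connection IDs point at each other."""
    pairs = []
    for bbid in sorted(people):
        name, conn_id = people[bbid]
        other = people.get(conn_id)
        # bbid < conn_id reports each pair once, smaller BBid first.
        if other is not None and other[1] == bbid and bbid < conn_id:
            pairs.append((name, other[0]))
    return pairs


if __name__ == "__main__":
    for pair in mutual_pairs(PEOPLE):
        print(pair)
```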
single table recurring relatives
Can't name it less confusingly, sorry... Imagine a DB table with 3 columns:
object_id - some entity,
relation_key - some property of the object,
bundle_id - an id that must group related objects together.
The table has a unique key on [object_id, relation_key]: a single object can't have a duplicated relation_key, but different objects can have equal relation_key values.
Many objects can be related, directly or transitively, through relation_key; all of these objects should share the same bundle_id.
How can I update the bundle_id column with correct values using just a single query? I could write a procedure, but that is unsuitable for me. I am looking for a statement like: "UPDATE example [join example ON ...] SET bundle_id = ... WHERE ..."
Here is the "before" schema for MySQL:
CREATE TABLE `example` (
`bundle_id` INT(11) DEFAULT NULL,
`object_id` INT(11) NOT NULL,
`relation_key` INT(11) NOT NULL,
PRIMARY KEY (`object_id`,`relation_key`)
);
INSERT INTO `example`(`object_id`, `relation_key`) VALUES
(1, 4), (1, 5), (1, 6),
(2, 6), (2, 7), (2, 8),
(3, 4), (3, 9), (3, 10),
(4, 11), (4, 12), (4, 13),
(5, 14), (5, 15), (5, 16),
(6, 17), (6, 11), (6, 18);
The "after" state will look as if you had run these queries:
UPDATE `example` SET `bundle_id` = 1 WHERE `object_id` IN (1, 2, 3);
UPDATE `example` SET `bundle_id` = 2 WHERE `object_id` IN (4, 6);
UPDATE `example` SET `bundle_id` = 3 WHERE `object_id` IN (5);
Object 1 is related to object 2 by key=6, and object 3 is related to object 1 by key=4, so objects 1, 2, 3 are related together. This is the first bundle, bundle_id=1; no other keys link any other object to 1, 2, 3.
Object 4 is related to object 6 by key=11, so objects [4, 6] are related together. This is the second bundle, bundle_id=2; no other keys link any other object to 4, 6.
Object 5 has no relations to other objects; all of its keys belong to it alone. This is the third bundle, bundle_id=3; no other keys link any other object to 5.
SQL: value higher than percentage of population of values
I wish to calculate the value which is higher than a percentage of the population of values, per group. Suppose I have:
CREATE TABLE project (
id int,
event int,
val int
);
INSERT INTO project(id,event,val)
VALUES
(1, 11, 43),
(1, 12, 19),
(1, 13, 19),
(1, 14, 53),
(1, 15, 45),
(1, 16, 35),
(2, 21, 22),
(2, 22, 30),
(2, 23, 25),
(2, 24, 28);
I now want to calculate for each id what is the val that will be, for example, higher than 5%, or 30% of the val for that id.
For example, for id=1, we have the following values: 43, 19, 19, 53, 45, 35. So the contingency table would look like this:
| val | count |
| --- | ----- |
| 19 | 2 |
| 35 | 1 |
| 43 | 1 |
| 45 | 1 |
| 53 | 1 |
and val=20 (higher than 19) would be chosen to be higher than 5% (actually 2 out of 6) of the rows.
The contingency table for id 2 is:
| val | count |
| --- | ----- |
| 22 | 1 |
| 25 | 1 |
| 28 | 1 |
| 30 | 1 |
My expected output is:
| id | val_5p_coverage | val_50p_coverage |
| -- | --------------- | ---------------- |
| 1 | 20 | 36 |
| 2 | 23 | 26 |
val_5p_coverage is the value val needed to be above at least 5% of val in the id. val_50p_coverage is the value val needed to be above at least 50% of val in the id.
How can I calculate this with SQL?
I managed to do it in HiveQL (for Hadoop) as follows:
create table prep as
select *,
CUME_DIST() OVER(PARTITION BY id ORDER BY val ASC) as proportion_val_equal_or_lower
from project;
SELECT id,
MIN(IF(proportion_val_equal_or_lower>=0.05, val, NULL)) AS val_5p_coverage,
MIN(IF(proportion_val_equal_or_lower>=0.50, val, NULL)) AS val_50p_coverage
FROM prep
GROUP BY id;
Although this is neither MySQL nor standard SQL per se, it might help to do it in MySQL or SQL.
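To see what the CUME_DIST approach computes, here is a small Python sketch of the same calculation. The function name `coverage_value` is made up for this illustration. Note that it returns the smallest observed val whose cumulative proportion reaches the threshold, so for the sample data it yields 19 and 35 for id 1 rather than the 20 and 36 in the question's expected output.

```python
def coverage_value(vals, proportion):
    """Smallest val whose share of rows with value <= val is at least `proportion`."""
    ordered = sorted(vals)
    n = len(ordered)
    for v in ordered:
        cume_dist = sum(1 for x in ordered if x <= v) / n   # same definition as CUME_DIST()
        if cume_dist >= proportion:
            return v
    return None  # only reached for an empty group


if __name__ == "__main__":
    groups = {1: [43, 19, 19, 53, 45, 35], 2: [22, 30, 25, 28]}
    for group_id, vals in groups.items():
        print(group_id, coverage_value(vals, 0.05), coverage_value(vals, 0.50))
```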
combine multiple rows from the same table (days of the week)
I have a table which stores yes/no values for each hour of the day for a particular user/filter/type. Each day is its own row. So there will always be 7 rows for any given user/filter/type combination. What I am trying to accomplish is one result for each user/filter/type combination that contains all hours of each day of the week. I think the approach I need here is self joining with aliases and groups, but everything I have tried fails. I have setup a basic fiddle at fiddle I also have the ability to change the db schema for this as well if there is an easier and/or preferred method to handle this on the db side (my gut says there is). Perhaps a table for each day of the week linked by filter_id? INSERT INTO filters (`filter_id`, `user_id`, `filter`, `type`, `day`, `12a`, `1a`... and so on for each hour) VALUES (1, 1, 'filter1', 1, 1, 1, 1), (2, 1, 'filter1', 1, 2, 1, 1), (3, 1, 'filter1', 1, 3, 1, 1), (4, 1, 'filter1', 1, 4, 1, 1), (5, 1, 'filter1', 1, 5, 1, 1), (6, 1, 'filter1', 1, 6, 1, 1), (7, 1, 'filter1', 1, 7, 1, 1), (8, 1, 'filter2', 1, 1, 0, 0), (9, 1, 'filter2', 1, 2, 0, 0), (10, 1, 'filter2', 1, 3, 0, 0), (11, 1, 'filter2', 1, 4, 0, 0), (12, 1, 'filter2', 1, 5, 0, 0), (13, 1, 'filter2', 1, 6, 0, 0), (14, 1, 'filter2', 1, 7, 0, 0), (15, 1, 'filter3', 1, 1, 0, 0), (16, 1, 'filter3', 1, 2, 0, 0), (17, 1, 'filter3', 1, 3, 0, 0), (18, 1, 'filter3', 1, 4, 0, 0), (19, 1, 'filter3', 1, 5, 0, 0), (20, 1, 'filter3', 1, 6, 0, 0), (21, 1, 'filter3', 1, 7, 0, 0) ; EDIT : I made some progress on this and it is showing all day hours for the filter in one result...however... I it does not work when a user has more than one filter. I can't seem to get the grouping correct and/or something else so results only show unique user/type/filter combinations... at the moment it only shows one filter result for each user. This is only joining monday, tuesday, wednesday as well... there must be an easier way to do this. 
Like I said I am totally open up to changing the schema of the db for this as well, but not sure what the best approach would be other than this. I certainly cannot list all the hours for the entire week in one table (that would be 168 columns for hours plus an additional for for 172 in each row). $stmt = $db->prepare(" SELECT users.user_id, users.username, c.computer_name, filters.user_id, filters.filter, filters.type, monday.12a as m12a, monday.1a as m1a, monday.2a as m2a, monday.3a as m3a, monday.4a as m4a, monday.5a as m5a, monday.6a as m6a, monday.7a as m7a, monday.8a as m8a, monday.9a as m9a, monday.10a as m10a, monday.11a as m11a, monday.12p as m12p, monday.1p as m1p, monday.2p as m2p, monday.3p as m3p, monday.4p as m4p, monday.5p as m5p, monday.6p as m6p, monday.7p as m8p, monday.9p as m9p, monday.10p as m10p, monday.11p as m11p, tuesday.12a as t12a, tuesday.1a as t1a, tuesday.2a as t2a, tuesday.3a as t3a, tuesday.4a as t4a, tuesday.5a as t5a, tuesday.6a as t6a, tuesday.7a as t7a, tuesday.8a as t8a, tuesday.9a as t9a, tuesday.10a as t10a, tuesday.11a as t11a, tuesday.12p as t12p, tuesday.1p as t1p, tuesday.2p as t2p, tuesday.3p as t3p, tuesday.4p as t4p, tuesday.5p as t5p, tuesday.6p as t6p, tuesday.7p as t8p, tuesday.9p as t9p, tuesday.10p as t10p, tuesday.11p as t11p, wednesday.12a as w12a, wednesday.1a as w1a, wednesday.2a as w2a, wednesday.3a as w3a, wednesday.4a as w4a, wednesday.5a as w5a, wednesday.6a as w6a, wednesday.7a as w7a, wednesday.8a as w8a, wednesday.9a as w9a, wednesday.10a as w10a, wednesday.11a as w11a, wednesday.12p as w12p, wednesday.1p as w1p, wednesday.2p as w2p, wednesday.3p as w3p, wednesday.4p as w4p, wednesday.5p as w5p, wednesday.6p as w6p, wednesday.7p as w8p, wednesday.9p as w9p, wednesday.10p as w10p, wednesday.11p as w11p FROM ( SELECT account_id, computer_id, computer_name FROM computers WHERE account_id = 1 ORDER BY computer_id ASC LIMIT 0, 5 ) as c LEFT JOIN users on users.computer_id = c.computer_id LEFT JOIN filters on 
filters.user_id = users.user_id LEFT JOIN filters as monday on monday.user_id = filters.user_id and monday.filter = filters.filter and monday.day = 1 LEFT JOIN filters as tuesday on tuesday.user_id = filters.user_id and tuesday.filter = filters.filter and tuesday.day = 2 LEFT JOIN filters as wednesday on wednesday.user_id = filters.user_id and wednesday.filter = filters.filter and wednesday.day = 3 WHERE filters.type = 1 GROUP BY users.user_id ");
The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns in both tables.
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name=table2.column_name;
Here is the official MySQL page: http://dev.mysql.com/doc/refman/5.0/en/join.html