I have two table schemas (MySQL 5.6 so no CTE), roughly looking like this:
CREATE TABLE nodes (
node_id INT PRIMARY KEY,
name VARCHAR(10)
);
CREATE TABLE edges (
edge_id INT PRIMARY KEY,
source INT,
target INT,
FOREIGN KEY (source) REFERENCES nodes(node_id),
FOREIGN KEY (target) REFERENCES nodes(node_id)
);
In our design, a logical edge between two nodes (logically n1 -> n2) is actually represented as (n1 -> proxy node -> n2) in the db. The reason we use two edges and a proxy node for a logical edge is so that we can store properties on the edge. Therefore, when a client queries for two nodes connected by an edge, the query is translated to query three connected nodes instead.
I have written a query to get a path with a fixed length. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with exactly 5 edges on the path." This is done without using recursion on the SQL side; I just generate a long query programmatically with the specified fixed length.
The challenge is, we want to support querying of a variable-length path. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with no fewer than 3 edges and no more than 10 edges on the path." Is this feasible without (or even with) CTE?
EDIT:
Some sample data:
-- Logical nodes are 1, 3, 5, 7, 9, 11. The rest are proxy nodes.
INSERT INTO nodes VALUES
(1, 'foo'),
(2, '_proxy_'),
(3, 'foo'),
(4, '_proxy_'),
(5, 'bar'),
(6, '_proxy_'),
(7, 'bar'),
(8, '_proxy_'),
(9, 'bar'),
(10, '_proxy_'),
(11, 'bar');
-- Connects 1 -> 2 -> ... -> 11.
INSERT INTO edges VALUES
(1, 1, 2),
(2, 2, 3),
(3, 3, 4),
(4, 4, 5),
(5, 5, 6),
(6, 6, 7),
(7, 7, 8),
(8, 8, 9),
(9, 9, 10),
(10, 10, 11);
The query can be, "select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path." Such paths include 1 -> 3 -> 5, 1 -> 3 -> 5 -> 7, 3 -> 5, 3 -> 5 -> 7, and 3 -> 5 -> 7 -> 9. So the result set should include the IDs and names of nodes 1, 3, 5, 7, 9.
The following query (a recursive CTE, so it needs MySQL 8.0+ or MariaDB 10.2+) returns all paths of interest as comma-separated strings.
with recursive rcte as (
select e.source, e.target, 1 as depth, concat(e.source) as path
from nodes n
join edges e on e.source = n.node_id
where n.name = 'foo' -- start node name
union all
select e.source, e.target, r.depth + 1 as depth, concat_ws(',', r.path, e.source)
from rcte r
join edges p on p.source = r.target -- p for proxy
join edges e on e.source = p.target
where r.depth < 4 -- max path nodes
)
select r.path
from rcte r
join nodes n on n.node_id = r.source
where r.depth >= 2 -- min path nodes
and n.name = 'bar' -- end node name
The result looks like this:
| path |
| ------- |
| 3,5 |
| 1,3,5 |
| 3,5,7 |
| 1,3,5,7 |
| 3,5,7,9 |
You can now parse the strings in application code and merge/union the arrays. If you only want the contained node ids, you can also change the outer query to:
select distinct r2.source
from rcte r
join nodes n on n.node_id = r.source
join rcte r2 on find_in_set(r2.source, r.path)
where r.depth >= 2 -- min path nodes
and n.name = 'bar' -- end node name
Result:
| source |
| ------ |
| 1 |
| 3 |
| 5 |
| 7 |
| 9 |
Note that a JOIN on FIND_IN_SET() might be slow if rcte contains many rows. I would rather do this step in application code, which should be quite simple in a procedural language.
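As a rough illustration of doing that step in application code, here is a minimal Python sketch. It assumes the path strings have already been fetched from the first query; the function name `nodes_on_paths` is made up for this example.

```python
def nodes_on_paths(path_strings):
    """Union the node ids found in comma-separated path strings."""
    node_ids = set()
    for path in path_strings:
        # Each path looks like "1,3,5"; split it and collect the ids.
        node_ids.update(int(part) for part in path.split(","))
    return sorted(node_ids)


# The five paths returned by the query above.
paths = ["3,5", "1,3,5", "3,5,7", "1,3,5,7", "3,5,7,9"]
print(nodes_on_paths(paths))  # [1, 3, 5, 7, 9]
```

The names can then be looked up with a single `WHERE node_id IN (...)` query.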
MySQL 5.6 solution
Prior to MySQL 8.0 and MariaDB 10.2 there was no support for recursive queries. Furthermore, there are several other limitations that make a workaround difficult. For example:
No dynamic queries in stored functions
No way to use a temporary table twice in a single statement
No TEXT type in the MEMORY engine
However, a recursive CTE can be emulated in a stored procedure that moves rows between two temporary tables. The following procedure does that:
delimiter //
create procedure get_path(
in source_name text,
in target_name text,
in min_depth int,
in max_depth int
)
begin
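-- Drop leftovers from an earlier call in the same session, otherwise CREATE fails.
drop temporary table if exists tmp_sources, tmp_targets;
create temporary table tmp_sources (id int, depth int, path text) engine=innodb;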
create temporary table tmp_targets like tmp_sources;
insert into tmp_sources (id, depth, path)
select n.node_id, 1, n.node_id
from nodes n
where n.name = source_name;
set @depth = 1;
while @depth < max_depth do
set @depth = @depth+1;
insert into tmp_targets(id, depth, path)
select e.target, @depth, concat_ws(',', t.path, e.target)
from tmp_sources t
join edges p on p.source = t.id
join edges e on e.source = p.target
where t.depth = @depth - 1;
insert into tmp_sources (id, depth, path)
select id, depth, path
from tmp_targets;
truncate tmp_targets;
end while;
select t.path
from tmp_sources t
join nodes n on n.node_id = t.id
where n.name = target_name
and t.depth >= min_depth;
end //
delimiter ;
Use it as:
call get_path('foo', 'bar', 2, 4)
Result:
| path |
| ------- |
| 3,5 |
| 1,3,5 |
| 3,5,7 |
| 1,3,5,7 |
| 3,5,7,9 |
This is far from optimal. If the result has many or long paths, you might need to define some indexes on the temporary tables. Also, I don't like the idea of creating (temporary) tables in stored procedures. Treat it as a proof of concept and use it at your own risk.
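For readers who find the loop easier to follow outside SQL, here is a Python sketch of the same idea: expand a frontier one logical hop (two edges) at a time, the way the procedure moves rows through the temporary tables. It is only an illustration under my own assumptions. The function `get_paths` and the in-memory `nodes`/`edges` structures are made up for this example, and like the procedure it relies on `max_depth` to terminate if the graph has cycles.

```python
def get_paths(nodes, edges, source_name, target_name, min_depth, max_depth):
    """Return node-id paths, expanding one logical hop (two edges) per round."""
    # Adjacency list: source id -> list of target ids.
    out = {}
    for source, target in edges:
        out.setdefault(source, []).append(target)

    # Rows at the current depth, like the newest rows in tmp_sources: (id, path).
    frontier = [(node_id, [node_id])
                for node_id, name in sorted(nodes.items())
                if name == source_name]
    found = []
    depth = 1
    while frontier:
        if depth >= min_depth:
            found.extend(path for node_id, path in frontier
                         if nodes[node_id] == target_name)
        if depth >= max_depth:
            break
        # Step over the proxy node: logical node -> proxy -> next logical node.
        frontier = [(nxt, path + [nxt])
                    for node_id, path in frontier
                    for proxy in out.get(node_id, [])
                    for nxt in out.get(proxy, [])]
        depth += 1
    return found


# Sample data from the question: nodes 1..11 chained 1 -> 2 -> ... -> 11.
names = ["foo", "_proxy_", "foo", "_proxy_", "bar", "_proxy_",
         "bar", "_proxy_", "bar", "_proxy_", "bar"]
nodes = dict(enumerate(names, start=1))
edges = [(i, i + 1) for i in range(1, 11)]
for path in get_paths(nodes, edges, "foo", "bar", 2, 4):
    print(",".join(map(str, path)))
```

On the sample data this prints the same five paths as the procedure call above.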
I've solved this sort of problem with a transitive closure table. This enumerates every direct and indirect path through your nodes. The edges you currently have are paths of length 1. But you also need paths of length 0 (i.e., a node has a path to itself), and then every path from one source node to an eventual target node, for paths with length greater than 1.
create table closure (
source int,
target int,
length int,
is_direct bool,
primary key (source, target)
);
insert into closure values
(1, 1, 0, false), (1, 2, 1, true), (1, 3, 2, false), (1, 4, 3, false), (1, 5, 4, false), (1, 6, 5, false), (1, 7, 6, false), (1, 8, 7, false), (1, 9, 8, false), (1, 10, 9, false), (1, 11, 10, false),
(2, 2, 0, false), (2, 3, 1, true), (2, 4, 2, false), (2, 5, 3, false), (2, 6, 4, false), (2, 7, 5, false), (2, 8, 6, false), (2, 9, 7, false), (2, 10, 8, false), (2, 11, 9, false),
(3, 3, 0, false), (3, 4, 1, true), (3, 5, 2, false), (3, 6, 3, false), (3, 7, 4, false), (3, 8, 5, false), (3, 9, 6, false), (3, 10, 7, false), (3, 11, 8, false),
(4, 4, 0, false), (4, 5, 1, true), (4, 6, 2, false), (4, 7, 3, false), (4, 8, 4, false), (4, 9, 5, false), (4, 10, 6, false), (4, 11, 7, false),
(5, 5, 0, false), (5, 6, 1, true), (5, 7, 2, false), (5, 8, 3, false), (5, 9, 4, false), (5, 10, 5, false), (5, 11, 6, false),
(6, 6, 0, false), (6, 7, 1, true), (6, 8, 2, false), (6, 9, 3, false), (6, 10, 4, false), (6, 11, 5, false),
(7, 7, 0, false), (7, 8, 1, true), (7, 9, 2, false), (7, 10, 3, false), (7, 11, 4, false),
(8, 8, 0, false), (8, 9, 1, true), (8, 10, 2, false), (8, 11, 3, false),
(9, 9, 0, false), (9, 10, 1, true), (9, 11, 2, false),
(10, 10, 0, false), (10, 11, 1, true),
(11, 11, 0, false);
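Writing those rows by hand does not scale, so here is a sketch of how they could be generated from the edge list. This is my own illustration, not part of the schema above: `build_closure` is a made-up helper, and it assumes you want the shortest hop count for each (source, target) pair, which fits the primary key on (source, target).

```python
from collections import deque


def build_closure(node_ids, edges):
    """Return (source, target, length, is_direct) rows for every reachable pair."""
    # Adjacency list: source id -> list of target ids.
    out = {}
    for source, target in edges:
        out.setdefault(source, []).append(target)

    rows = []
    for start in node_ids:
        # Breadth-first search from start; dist holds the shortest hop count.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in out.get(current, []):
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        # is_direct is true only for paths of length 1, i.e. the original edges.
        rows.extend((start, target, length, length == 1)
                    for target, length in sorted(dist.items()))
    return rows


# Sample graph from the question: 1 -> 2 -> ... -> 11.
rows = build_closure(range(1, 12), [(i, i + 1) for i in range(1, 11)])
print(len(rows))  # 66
```

The rows can then be bulk-inserted into the closure table. Keeping the table in sync when edges change is a separate problem that this sketch does not address.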
Now we can write your query:
select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path.
I translate this into paths of length 4,6,8 because you have a proxy node in between each, so it really takes two hops to go between nodes.
select source.node_id as source_node, target.node_id as target_node, c.length
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)
Here's the result, which in fact also includes node 11:
+-------------+-------------+--------+
| source_node | target_node | length |
+-------------+-------------+--------+
| 1 | 5 | 4 |
| 1 | 7 | 6 |
| 1 | 9 | 8 |
| 3 | 7 | 4 |
| 3 | 9 | 6 |
| 3 | 11 | 8 |
+-------------+-------------+--------+
Re comment from Paul Spiegel:
Once you have the endpoints of the path, you can query the closure for all paths that start at the source, and end at a node that also has a path to the target.
select source.node_id as source_node, target.node_id as target_node,
group_concat(i1.target order by i1.target) as interim_nodes
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
join closure as i1 on source.node_id = i1.source
join closure as i2 on target.node_id = i2.target and i1.target = i2.source
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)
group by source.node_id, target.node_id
+-------------+-------------+---------------------+
| source_node | target_node | interim_nodes |
+-------------+-------------+---------------------+
| 1 | 5 | 1,2,3,4,5 |
| 1 | 7 | 1,2,3,4,5,6,7 |
| 1 | 9 | 1,2,3,4,5,6,7,8,9 |
| 3 | 7 | 3,4,5,6,7 |
| 3 | 9 | 3,4,5,6,7,8,9 |
| 3 | 11 | 3,4,5,6,7,8,9,10,11 |
+-------------+-------------+---------------------+
I have a list of tuples looking like this:
import datetime as dt
hours = [(dt.datetime(2019,3,9,23,0), dt.datetime(2019,3,10,22,0)),
         (dt.datetime(2019,3,10,23,0), dt.datetime(2019,3,11,22,0))]
The list has a variable length, and I just need a boolean telling me whether datetime.now() is between the first and second element of any tuple in the list.
In NumPy I would do:
((start <= now) & (end >= now)).any()
What is the most efficient, Pythonic way to do this? Sorry about the beginner's question.
This works, but I don't like the len():
from itertools import takewhile
len(list(takewhile(lambda x: x[0] <= now and now <= x[1], hours ))) > 0
Any better suggestions?
any(map(lambda d: d[0] <= now <= d[1], hours))
any: Logical OR across all elements
map: runs a function on every element of the list
As @steff pointed out, map is redundant, because a generator expression can iterate over the list directly.
any(d[0] <= now <= d[1] for d in hours)
It would be even better to avoid indexing into the tuple and use tuple unpacking instead (this was the reason I started with map).
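A generator expression can unpack each tuple in its `for` clause, so no indexing is needed. A small sketch, where the function name `in_any_range` and passing `now` in as an argument (which makes it easy to try fixed times) are my own choices:

```python
import datetime as dt


def in_any_range(now, ranges):
    """True if now lies inside at least one (start, end) pair, bounds included."""
    return any(start <= now <= end for start, end in ranges)


hours = [(dt.datetime(2019, 3, 9, 23, 0), dt.datetime(2019, 3, 10, 22, 0)),
         (dt.datetime(2019, 3, 10, 23, 0), dt.datetime(2019, 3, 11, 22, 0))]

print(in_any_range(dt.datetime(2019, 3, 10, 12, 0), hours))   # True
print(in_any_range(dt.datetime(2019, 3, 10, 22, 30), hours))  # False
```

In real use you would call it as `in_any_range(dt.datetime.now(), hours)`. Because `any` short-circuits, it stops at the first matching range.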
A more verbose alternative. (But more readable in my eyes)
import datetime as dt
def in_time_ranges(ranges):
    # True if the current time falls inside at least one (start, end) range.
    now = dt.datetime.now()
    return any(r[0] <= now <= r[1] for r in ranges)
ranges1 = [(dt.datetime(2019, 3, 9, 23, 0), dt.datetime(2019, 3, 10, 22, 0)),
(dt.datetime(2019, 3, 10, 23, 0), dt.datetime(2019, 3, 11, 22, 0)),
(dt.datetime(2019, 4, 10, 23, 0), dt.datetime(2019, 5, 11, 22, 0))]
print(in_time_ranges(ranges1))
ranges2 = [(dt.datetime(2017, 3, 9, 14, 0), dt.datetime(2018, 3, 10, 22, 0)),
(dt.datetime(2018, 3, 10, 23, 0), dt.datetime(2018, 3, 11, 22, 0)),
(dt.datetime(2018, 4, 10, 23, 0), dt.datetime(2018, 5, 11, 22, 0))]
print(in_time_ranges(ranges2))
Output (if run at a time that falls inside one of the ranges in ranges1):
True
False