Finding shortest path between N sets in a directed unweighted graph - igraph

I want to find an algorithm for the following problem:
Given a directed graph G with starting point s and N sets of nodes, I need to find a path that starts in the start node and then goes to at least one node of every set.
Not all of the nodes are in the sets- only some of them.
There are no nodes that are in more than one set.
For example:
starting point: [n1]
First set: [n5,n6,n8,n9]
Seconed set: [n2,n10]
Third set: [n4,n7]
Forth set: [n3]
Other nodes: [m1..m7]
Showing the graph edges:
In this case a shortest path will be:
[n1,m3,m4,m5,n3,m6,n2,m7,n9]
[n1,m1,n7,m5,n3,m6,n2,m7,n9]
This problem is similar to the generalization of the Traveling salesman problem (wikipedia), but in this case, the graph is directional and unweighted. Also, we have a known start point, we have more than just the nodes in the sets and we can walk on the same node more than once.
I tried starting with a lazy fast way of searching for the closest node from the start to all of the nodes in all of the sets with Dijkstra and then removing the set from the calculation and continue searching for the next closest node from all sets from that point.
It is easy to notice that this way doesn't give the shortest path, but the real problem is that sometimes it falls to a dead-end and returns "no path" even when there is a path.
A naive way I can see that can be used is to create a function to calculate for every specific permutation of sets and use an algorithm like the one suggested here for a close but easier problem.
It will take O(|N|! * 2^(the number of all nodes in all the N sets))
Adding the code of the lazy way (sorry for the mess).
from Queue import Queue
from my_utils import RGBtoHex
from igraph import *
from random import choice
class path_info:
def __init__(self, way, length_of_way, satisfied_goal):
self.way = way
self.length_of_way = length_of_way
self.satisfied_goal = satisfied_goal
class graph_actions():
# Goal point has a "start" node and all other goal points are saved in a dictionary
def __init__(self, graph, goal_points):
self.graph = graph
self.goal_points = goal_points
self.best_path = list()
def calculate_best_path(self):
self.best_path = list()
queued_best_path = scatter(self.graph, self.goal_points)
while not queued_best_path.empty():
self.best_path.append(queued_best_path.get())
return self.best_path
def scatter(graph, goal_points):
best_path = Queue()
starting_point = goal_points["start"]
del goal_points["start"]
paths_dict = dict()
return find_best_way(best_path, goal_points, graph, paths_dict, starting_point)
def find_best_way(best_path, goal_points, graph, paths_dict, starting_point):
last = -1
while len(goal_points) > 0:
vertex_color = known_colors[choice(known_colors.keys())]
for vertex in goal_points.keys():
# Find shortest paths from vertex
# Color vertex
graph.vs.find(str(vertex).encode('ascii', 'replace'))["Fill Color"] = RGBtoHex(vertex_color)
# Find shortest path if exist with igraph function
way = graph.get_shortest_paths(starting_point, vertex)
if len(way[0]) != 0:
if (last != -1):
# Taking out the first node that is the end of the last route taken
way[0].pop(0)
paths_dict[vertex] = path_info(way, len(way[0]), goal_points[vertex])
# Find the closest node
stage = min(paths_dict, key=lambda x: paths_dict[x].length_of_way)
# Transfer from list to queue
for step in paths_dict[stage].way[0]:
best_path.put(step)
last = step
# Delete the other nodes with of the same group from the goal list
for key, value in goal_points.items():
if paths_dict.has_key(stage) and value == paths_dict[stage].satisfied_goal:
del goal_points[key]
if key != stage:
del paths_dict[key]
del paths_dict[stage]
# Saving the last node of this route
starting_point = last
graph.save("states_graph_color.graphml", "graphml")
return best_path
Is there a way that is better than the naive? If not, can I use a heuristic way to do this?
I also wonder if igraph has a trick to this in a more clean way.
Even if only the naive way is possible, I'm not sure how to implement it.

Related

Trying to define a function that creates lists from files and uses random.choices to choose an element from the weighted lists

I'm trying to define a function that will create lists from multiple text files and print a random element from one of the weighted lists. I've managed to get the function to work with random.choice for a single list.
enter code here
def test_rollitems():
my_commons = open('common.txt')
all_common_lines = my_commons.readlines()
common = []
for i in all_common_lines:
common.append(i)
y = random.choice(common)
print(y)
When I tried adding a second list to the function it wouldn't work and my program just closes when the function is called.
enter code here
def Improved_rollitem():
#create the lists from the files#
my_commons = open('common.txt')
all_common_lines= my_commons.readlines()
common = []
for i in all_common_lines:
common.append(i)
my_uncommons = open('uncommon.txt')
all_uncommon_lines =my_uncommons.readlines()
uncommon =[]
for i in all_uncommon_lines:
uncommon.apend(i)
y = random.choices([common,uncommon], [80,20])
print(y)
Can anyone offer any insight into what I'm doing wrong or missing ?
Nevermind. I figured this out on my own! Was having issues with Geany so I installed Pycharm and was able to work through the issue. Correct code is:
enter code here
def Improved_rollitem():
#create the lists from the files#
my_commons = open('common.txt')
all_common_lines= my_commons.readlines()
common = []
for i in all_common_lines:
common.append(i)
my_uncommons = open('uncommon.txt')
all_uncommon_lines =my_uncommons.readlines()
uncommon =[]
for i in all_uncommon_lines:
uncommon.append(i)
y = random.choices([common,uncommon], [.8,.20])
if y == [common]:
for i in [common]:
print(random.choice(i))
if y == [uncommon]:
for i in [uncommon]:
print(random.choice(i))
If there's a better way to do something like this, it would certainly be cool to know though.

scraping - find the last 5 score of each match - in html

i would like your help to get the last 5 score, i can't get it please help me.
from selenium import webdriver
import pandas as pd
from pandas import ExcelWriter
from openpyxl.workbook import Workbook
import time as t
import xlsxwriter
pd.set_option('display.max_rows', 5, 'display.max_columns', None, 'display.width', None)
browser = webdriver.Firefox()
browser.get('https://www.mismarcadores.com/futbol/espana/laliga/resultados/')
print("Current Page Title is : %s" %browser.title)
aux_ids = browser.find_elements_by_css_selector('.event__match.event__match--static.event__match--oneLine')
ids=[]
i = 0
for aux in aux_ids:
if i < 1:
ids.append( aux.get_attribute('id') )
i+=1
data=[]
for idt in ids:
id_clean = idt.split('_')[-1]
browser.execute_script("window.open('');")
browser.switch_to.window(browser.window_handles[1])
browser.get(f'https://www.mismarcadores.com/partido/{id_clean}/#h2h;overall')
t.sleep(5)
p_ids = browser.find_elements_by_css_selector('h2h-wrapper')
#here the code of the last 5 score of each match
I believe you can use your Firefox browser but have not tested with it. I use chrome so if you want to use chromedriver check the version of your browser and download the right one, also add it to your system path. The only thing with this approach is that it open a browser window until the page is loaded (because we are waiting for the javascript to generate the matches data). If you need anything else let me know. Good luck!
https://chromedriver.chromium.org/downloads
Known issues: Sometimes it will throw index out of range when retrieve matches data. This is something I am looking to it because it look like sometimes the xpath on each link change a little .
from selenium import webdriver
from lxml import html
from lxml.html import HtmlElement
def test():
# Here we specified the urls to for testing purpose
urls = ['https://www.mismarcadores.com/partido/noIPZ3Lj/#h2h;overall'
]
# a loop to go over all the urls
for url in urls:
# We will print the string and format it with the url we are currently checking, Also we will print the
# result of the function get_last_5(url) where url is the current url in the for loop.
print("Scores after this match {u}".format(u=url), get_last_5(url))
def get_last_5(url):
print("processing {u}, please wait...".format(u=url))
# here we get a instance of the webdriver
browser = webdriver.Chrome()
# now we pass the url we want to get
browser.get(url)
# in this variable, we will "store" the html data as a string. We get it from here because we need to wait for
# the page to load and execute their javascript code in order to generate the matches data.
innerHTML = browser.execute_script("return document.body.innerHTML")
# Now we will assign this to a variable of type HtmlElement
tree: HtmlElement = html.fromstring(innerHTML)
# the following variables: first_team,second_team,match_date and rows are obtained via xpath method(). To get the
# xpath go to chrome browser,open it and load one of the url to check the DOM. Now if you wish to check the xpath
# of each of this variables (elements in case of html), right click on the element->click inspect->the inspect
# panel will appear->the clicked element wil appear selected on the inspect panel->right click on it->Copy->Copy
# Xpath. first_team,second_team and match_date are obtained from the "title" section. Rows are obtained from the
# table of last matches in the tbody content
# When using xpath it will return a list of HtmElement because it will try to find all the elements that match our
# xpath, so that is why we use [0] (to get the first element of the list). This will give use access to a
# HtmlElement object so now we can access its text attribute.
first_team = tree.xpath('//*[#id="flashscore"]/div[1]/div[1]/div[2]/div/div/a')[0].text
print((type(first_team)))
second_team = tree.xpath('//*[#id="flashscore"]/div[1]/div[3]/div[2]/div/div/a')[0].text
# [0:8] is used to slice the string because in the title it contains also the time of the match ie.(10.08.2020
# 13:00) . To use it for comparing each row we need only (10.08.20), so we get from position 0, 8 characters ([0:8])
match_date = tree.xpath('//*[#id="utime"]')[0].text[0:8]
# when getting the first element with [0], we get a HtmlElement object( which is the "table" that have all matches
# data). so we want to get all the children of it, which are all the "rows(elements)" inside it. getchildren()
# will also return a list of object of type HtmlElement. In this case we are also slicing the list with [:-1]
# because the last element inside the "table" is the button "Mostar mas partidos", so we want to take that out.
rows = tree.xpath('//*[#id="tab-h2h-overall"]/div[1]/table/tbody')[0].getchildren()[:-1]
# we quit the browser since we do not need this anymore, we could do it after assigning innerHtml, but no harm
# doing it here unless you wish to close it before doing all this assignment of variables.
browser.quit()
# this match_position variable will be the position of the match we currently have in the title.
match_position = None
# Now we will iterate over the rows and find the match. range(len(rows)) is just to get the count of rows to know
# until when to stop iterating.
for i in range(len(rows)):
# now we use the is_match function with the following parameter: first_team,second team, match_date and the
# current row which is row[i]. if the function return true we found the match position and we assign (i+1) to
# the match_position variable. i+1 because we iterate from 0.
if is_match(first_team, second_team, match_date, rows[i]):
match_position = i + 1
# now we stop the for no need to go further when we find it.
break
# Since we only want the following 5 matches score, we need to check if we have 5 rows beneath our match. If
# adding 5 from the match position is less than the number of rows then we can do it, if not we will only get the
# rows beneath it(maybe 0,1,2,3 or 4 rows)
if (match_position + 5) < len(rows):
# Again we are slicing the list, in this case 2 times [match_position:] (take out all the rows before the
# match position), then from the new list obtained from that we do [:5] which is start from the 0 position
# and stop on 5 [start:stop]. we use rows=rows beacause when slicing you get a new list so you can not do
# rows[match_position:][:5] you need to assign it to a variable. I am using same variable but you can assign
# it to a new one if you wish.
rows = rows[match_position:][:5]
else:
# since we do not have enough rows, just get the rows beneath our position.
rows = rows[match_position:len(rows)]
# Now to get the list of scores we are using a list comprehension in here but I will explain it as a for loop.
# Before that, you need to know that each row(<tr> element in html) has 6 td elements inside it, the number 5 is
# the score of the match. then inside each "score element" we have a span element and then a strong element,
# something like
# <tr>
# <td></td>
# <td></td>
# <td></td>
# <td></td>
# <td><span><strong>1:2</strong></span></td>.
# <td></td>
# </tr>
# Now, That been said, since each row is a HtmlElement object , we can go in a for loop as following:
scores = []
for row in rows:
data = row.getchildren()[4].getchildren()[0].text_content()
# not the best way but we will get al the text content on the element, in this case the span element,
# if the string has more than 5 characters i.e. "1 : 2" then we will take as if it is i.e. "1 : 2(0 : 1)". So
# in this case we want to slice it from the 2nd character from right to left and get 5 characters from that
# position.
# using a ternary expression here, if the length of the string is equal to 5 then this is our score,
# if not then we have to slice it and get the last part, from -6 which is the white space before then 2 (in
# our example) to -1 (which is the 1 before the last ')' ).
score = data if len(data) == 5 else data[-6:-1]
scores.append(score)
print("finished processing {u}.".format(u=url))
# now we return the scores
return scores
def is_match(t1, t2, match_date, row):
# from each row we want to compare, t1,t2,match_date (this are obtained from the title) with the rows team1,
# team2 and date. Each row has 6 element inside it. Please read all the code on get_last_5 before reading this
# explanation. so the for this row, date is in position 0, team1 in 2, team2 in 3.
# <td><span>10.03.20</span></td>
date = row.getchildren()[0].getchildren()[0].text
# <td><span>TeamName</span></td> (when the team lost) or
# <td><span><strong>TeamName</strong></span></td> (when the team won)
team1element = row.getchildren()[2].getchildren()[0] # this is the span element
# using a ternary expression (condition_if_true if condition else condition_if_false)
# https://book.pythontips.com/en/latest/ternary_operators.html
# if span element have childrens , (getchildren()>0) then the team name is team1element.getchildren()[0].text
# which is the text of the strong element, if not the jsut get the text from the span element.
mt1 = team1element.getchildren()[0].text if len(team1element.getchildren()) > 0 else team1element.text
# repeat the same as team 1
team2element = row.getchildren()[3].getchildren()[0]
mt2 = team2element.getchildren()[0].text if len(team2element.getchildren()) > 0 else team2element.text
# basically we can compare only the date, but jsut to be sure we compare the names also. So, if the dates and the
# names are the same this is our match row.
if match_date == date and t1 == mt1 and t2 == mt2:
# we found it so return true
return True
# if not the same then return false
return False

How to get dataset into array

I have worked all the tutorials and searched for "load csv tensorflow" but just can't get the logic of it all. I'm not a total beginner, but I don't have much time to complete this, and I've been suddenly thrown into Tensorflow, which is unexpectedly difficult.
Let me lay it out:
Very simple CSV file of 184 columns that are all float numbers. A row is simply today's price, three buy signals, and the previous 180 days prices
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
This article: https://www.tensorflow.org/guide/datasets
It covers how to load pretty well. It even has a section on changing to numpy arrays, which is what I need to train and test the 'net. However, as the author says in the article leading to this Web page, it is pretty complex. It seems like everything is geared toward doing data manipulation, where we have already normalized our data (nothing has really changed in AI since 1983 in terms of inputs, outputs, and layers).
Here is a way to load it, but not in to Numpy and no example of not manipulating the data.
with tf.Session as sess:
sess.run( tf.global variables initializer())
with open('/BTC1.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter =',')
line_count = 0
for row in csv_reader:
?????????
line_count += 1
I need to know how to get the csv file in to the
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
so that I can follow the tutorials to train and test the net.
It's not that clear for me your question. You might be answering, tell me if I'm wrong, how to feed data in your model? There are several fashions to do so.
Use placeholders with feed_dict during the session. This is the basic and easier one but often suffers from training performance issue. Further explanation, check this post.
Use queue. Hard to implement and badly documented, I don't suggest, because it's been taken over by the third method.
tf.data API.
...
So to answer your question by the first method:
# get your array outside the session
with open('/BTC1.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter =',')
dataset = np.asarray([data for data in csv_reader])
close_col = dataset[:, 0]
signal_cols = dataset[:, 1: 3]
previous_cols = dataset[:, 3:]
# let's say you load 100 row each time for training
batch_size = 100
# define placeholders like you
...
with tf.Session() as sess:
...
for i in range(number_iter):
start = i * batch_size
end = (i + 1) * batch_size
sess.run(train_operation, feed_dict={close: close_col[start: end, ],
signals: signal_col[start: end, ],
previous: previous_col[start: end, ]
}
)
By the third method:
# retrieve your columns like before
...
# let's say you load 100 row each time for training
batch_size = 100
# construct your input pipeline
c_col, s_col, p_col = wrapper(filename)
batch = tf.data.Dataset.from_tensor_slices((close_col, signal_col, previous_col))
batch = batch.shuffle(c_col.shape[0]).batch(batch_size) #mix data --> assemble batches --> prefetch to RAM and ready inject to model
iterator = batch.make_initializable_iterator()
iter_init_operation = iterator.initializer
c_it, s_it, p_it = iterator.get_next() #get next batch operation automatically called at each iteration within the session
# replace your close, signal, previous placeholder in your model by c_it, s_it, p_it when you define your model
...
with tf.Session() as sess:
# you need to initialize the iterators
sess.run([tf.global_variable_initializer, iter_init_operation])
...
for i in range(number_iter):
start = i * batch_size
end = (i + 1) * batch_size
sess.run(train_operation)
Good luck!

PyTorch Experience Replay with multiple inputs

I am running a modified version of the pyTorch deep Q tutorial which I have modified to pass in my own data rather than gym, and one additional input (two inputs in total)
Currently I am generating an individual state for each 'column' of inputs (not sure if this is the correct way though), when trying to pass the second input into my experience replay function it is returning:
__new__() takes 5 positional arguments but 7 were given
Code for expReplay():
class ReplayMemory(object):
def __init__(self, capacity):
self.capacity = capacity
self.memory = []
self.position = 0
def push(self, *args):
"""Saves a transition."""
if len(self.memory) < self.capacity:
self.memory.append(None)
self.memory[self.position] = Transition(*args)
self.position = (self.position + 1) % self.capacity
def sample(self, batch_size):
return random.sample(self.memory, batch_size)
def __len__(self):
return len(self.memory)
And my Triggering of the function:
memory.push(state,rsistate, action, next_state, next_rsi_state, reward)
If anyone has any examples of experience replay using multiple inputs please fire away! <3
My mistake, this was solved by modifying the Transition named tuple to include the additional inputs. Any information on multi input expreplay is still welcome.

Build network with shortcut using torch

I now have a network with 2 inputs X and Y.
X concatenates Y and then pass to network to get result1. And at the same time X will concat result1 as a shortcut.
It's easy if there is only one input.
branch = nn.Sequential()
branch:add(....) --some layers
net = nn.Sequential()
net:add(nn.ConcatTable():add(nn.Identity()):add(branch))
net:add(...)
But when it comes to two inputs I don't actually know how to do it? Besides, nngraph is not allowed.Does any one know how to do it?
You can use the table modules, have a look at this page: https://github.com/torch/nn/blob/master/doc/table.md
net = nn.Sequential()
triple = nn.ParallelTable()
duplicate = nn.ConcatTable()
duplicate:add(nn.Identity())
duplicate:add(nn.Identity())
triple:add(duplicate)
triple:add(nn.Identity())
net:add(triple)
net:add(nn.FlattenTable())
-- at this point the network transforms {X,Y} into {X,X,Y}
separate = nn.ConcatTable()
separate:add(nn.SelectTable(1))
separate:add(nn.NarrowTable(2,2))
net:add(separate)
-- now you get {X,{X,Y}}
parallel_XY = nn.ParallelTable()
parallel_XY:add(nn.Identity()) -- preserves X
parallel_XY:add(...) -- whatever you want to do from {X,Y}
net:add(parallel)
parallel_Xresult = nn.ParallelTable()
parallel_Xresult:add(...) -- whatever you want to do from {X,result}
net:add(parallel_Xresult)
output = net:forward({X,Y})
The idea is to start with {X,Y}, to duplicate X and to do your operations. This is clearly a bit complicated, nngraph is supposed to be here to do that.