Searching in graphs/trees with depth-first/breadth-first/A* algorithms - language-agnostic

I have a couple of questions about searching in graphs/trees:
Let's assume I have an empty chess board and I want to move a pawn around from point A to B.
A. When using depth-first search or breadth-first search, must we use open and closed lists? That is, one list with all the elements still to check, and another with all the elements that were already checked? Is it even possible to do it without those lists? What about A*, does it need them?
B. When using lists, after having found a solution, how can you get the sequence of states from A to B? I assume that when you have items in the open and closed lists, instead of just storing the (x, y) states, you store an "extended state" formed of (x, y, parent_of_this_node)?
C. State A has 4 possible moves (right, left, up, down). If my first move is left, should I allow the next state to move back to the original state, i.e., make the "right" move? If not, must I traverse the search tree every time to check which states I've already visited?
D. When I see a state in the tree where I've already been, should I just ignore it, as I know it's a dead end? I guess to do this I'd have to always keep the list of visited states, right?
E. Is there any difference between search trees and graphs? Are they just different ways to look at the same thing?

A. When using depth first search or
breadth first search must we use open
and closed lists ?
With DFS you definitely need to store at least the current path. Otherwise you would not be able to backtrack. If you decide to maintain a list of all visited (closed) nodes, you are able to detect and avoid cycles (expanding the same node more than once). On the other hand, you no longer have the space efficiency of plain DFS, which without a closed list only needs space proportional to the depth of the search space.
With BFS you need to maintain an open list (sometimes called the fringe). Otherwise the algorithm simply can't work. When you additionally maintain a closed list, you will (again) be able to detect/avoid cycles. With BFS the additional space for the closed list might not be that bad, since you have to maintain the fringe anyway. The relation between fringe size and closed-list size strongly depends on the structure of the search space, so that has to be considered here. E.g. for a branching factor of 2, both lists are equal in size and the impact of having the closed list doesn't seem very bad compared to its benefits.
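To make the open/closed-list mechanics concrete, here is a minimal sketch (assuming the 8x8 board and the four single-square moves from the question; names are illustrative) of BFS with a FIFO fringe and a closed set:

```python
from collections import deque

def bfs(start, goal, neighbors):
    """Breadth-first search with an explicit open list (fringe) and closed list."""
    fringe = deque([start])   # open list: FIFO queue
    closed = {start}          # closed list: O(1) membership tests
    while fringe:
        node = fringe.popleft()
        if node == goal:
            return True
        for nxt in neighbors(node):
            if nxt not in closed:   # the closed list is what avoids cycles
                closed.add(nxt)
                fringe.append(nxt)
    return False

def moves(p):
    # the four moves from the question, kept on an 8x8 board
    x, y = p
    return [(a, b) for a, b in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            if 0 <= a < 8 and 0 <= b < 8]

assert bfs((0, 0), (7, 7), moves)
```

Dropping the `closed` set leaves BFS able to find the goal on a finite board, but the fringe then blows up exponentially with duplicate states, which is the trade-off described above.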
What about A*, does it need it?
A*, as it can be seen as a special (informed) type of BFS, needs the open list. Omitting the closed list is more delicate than with BFS, as is deciding whether to update the costs of nodes already in the closed list. Depending on those decisions and on the type of heuristic used, the algorithm can stop being optimal and/or complete. I won't go into details here.
B.
Yup, the closed list should form some kind of inverse tree (pointers going towards the root node), so you can extract the solution path. You usually need the closed list for doing this. For DFS, your current stack is exactly the solution path (no need for closed list here). Also note that sometimes you are not interested in the path but only in the solution or the existence of it.
C.
Read previous answers and look for the parts which talk about the detection of cycles.
D.
To avoid cycles with a closed list: don't expand nodes that are already inside the closed list. Note: with path-costs coming into play (remember A*), things might get more tricky.
E. Is there any difference between
search trees and graphs?
You could consider searches that maintain a closed list to avoid cycles as graph-searches, and those without one as tree-searches.

A) It's possible to avoid the open/closed lists - you could try all possible paths, but that would take a VERY long time.
B) Once you've reached the goal, you use the parent_of_this_node information to "walk backwards" from the goal. Start with the goal, get its parent, get the parent's parent, etc. until you reach the start.
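That backwards walk is only a few lines. A sketch, assuming `parent` is the parent_of_this_node map built during the search:

```python
def reconstruct(parent, start, goal):
    # walk backwards from the goal via parent pointers, then reverse
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

# a parent map as it might look after searching a small board
parent = {(1, 0): (0, 0), (2, 0): (1, 0)}
assert reconstruct(parent, (0, 0), (2, 0)) == [(0, 0), (1, 0), (2, 0)]
```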
C) I think it doesn't matter - there's no way that the step you describe would result in a shorter path (unless your steps have negative weight, in which case you can't use Dijkstra/A*). In my A* code, I check for this case and ignore it, but do whatever is easiest to code up.
D) It depends - I believe Dijkstra can never reopen the same node (can someone correct me on that?). A* definitely can revisit a node - if you find a shorter path to the same node, you keep that path, otherwise you ignore it.
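The "keep the shorter path, otherwise ignore it" rule in D) corresponds to the g-cost check in a standard A*. A minimal sketch, assuming a unit-cost 8x8 grid with a Manhattan heuristic (all names here are illustrative):

```python
import heapq

def astar(start, goal, neighbors, h):
    g = {start: 0}                   # best known cost to reach each node
    open_heap = [(h(start), start)]  # open list ordered by f = g + h
    while open_heap:
        _, node = heapq.heappop(open_heap)
        if node == goal:
            return g[node]
        for nxt, cost in neighbors(node):
            new_g = g[node] + cost
            if new_g < g.get(nxt, float("inf")):  # shorter path found: keep it
                g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + h(nxt), nxt))
    return None                      # goal unreachable

def moves(p):
    x, y = p
    return [((a, b), 1) for a, b in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            if 0 <= a < 8 and 0 <= b < 8]

manhattan = lambda p: abs(7 - p[0]) + abs(7 - p[1])
assert astar((0, 0), (7, 7), moves, manhattan) == 14
```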
E) Not sure, I've never done anything specifically for trees myself.
There's a good introduction to A* here:
http://theory.stanford.edu/~amitp/GameProgramming/
that covers a lot of details about how to implement the open set, pick a heuristic, etc.

A. Open and Closed lists are common implementation details, not part of the algorithm as such. It's common to do a depth-first tree search without either of these for example, the canonical way being a recursive traversal of the tree.
B. It is typical to ensure that nodes refer back to previous nodes allowing for a plan to be reconstructed by following the back-links. Alternatively you just store the entire solution so far in each candidate, though it would then be misleading to call it a node really.
C. I'm assuming that moving left and then moving right bring you to an equivalent state - in this case, you would have already explored the original state, it would be on the closed list, and therefore should not have been put back onto the open list. You don't traverse the search tree each time because you keep a closed list - often implemented as an O(1) structure - for precisely this purpose of knowing which states have already been fully examined. Note that you cannot always assume that being in the same position is the same as being in the same state - for most game path-finding purposes, it is, but for general purpose search, it is not.
D. Yes, the list of visited states is what you're calling the closed list. You also want to check the open list to ensure you're not planning to examine a given state twice. You don't need to search any tree as such, since you typically store these things in linear structures. The algorithm as a whole is searching a tree (or a graph), and it generates a tree (of nodes representing the state space) but you don't explicitly search through a tree structure at any point within the algorithm.
E. A tree is a type of graph with no cycles/loops in it. Therefore you use the same graph search procedure to search either. It's common to generate a tree structure that represents your search through the graph, which is represented implicitly by the backwards links from each node to the node that preceded/generated it in the search. (Although if you go down the route of holding the entire plan in each state, there will be no tree, just a list of partial solutions.)

Related

Why do we only put duplicate nodes of a binary search tree in either the right or left subtree, but not both?

A test question on my exam asked why we only put duplicate nodes of the root of a binary search tree in either the right or the left subtree, but not both. Why is putting duplicates of the root just anywhere bad? What is the good thing that happens from selecting either right or left?
The question didn't give any elaboration or a diagram; this was a final exam, and there must have been an implicit assumption that this was a common concept or practice. I still got a B on the exam, but I missed this one and still don't understand why you can't just put any value anywhere regardless of duplicates.
why we only put duplicate nodes of the root of a binary search tree in either the right or left subtree?
I suppose you mean duplicate values, which get stored in separate nodes.
The algorithm for storing a new value in a binary search tree needs to decide whether to go left or right. The choice is unique when the value to store is different from the root's value. When it's less, it should be stored in the left subtree, when it's greater, it should be stored in the right subtree.
However, when it is equal, it becomes a matter of choice: both options are fine. So in practice an implementation will in that case always choose left, or will always choose right -- that's just the simplest way to implement it.
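A minimal sketch of that convention (equal keys always to the right; dict-based nodes for brevity, names illustrative):

```python
def insert(node, key):
    if node is None:
        return {"key": key, "left": None, "right": None}
    if key < node["key"]:
        node["left"] = insert(node["left"], key)
    else:  # key >= node["key"]: equal keys always go right
        node["right"] = insert(node["right"], key)
    return node

root = None
for k in [5, 3, 5, 7, 5]:
    root = insert(root, k)
# every duplicate of the root's value 5 lives somewhere in the right subtree
assert root["left"]["key"] == 3
assert root["right"]["key"] == 5
```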
why is just putting duplicates of the root anywhere bad
No-one says that is bad, but consider the following:
if insertion has to be done sometimes in the left subtree, sometimes in the right one, then you need to introduce some additional mechanism to drive that choice, e.g. a random number generator, which is just going to slow down the process.
if you ever need to count the number of duplicates of a given value in the tree, it will be easier when you know they are all on one side of the root. Otherwise you need to traverse in two directions to find them all. That is also possible, but leads to a bit more code.
Still, if the binary search tree has additional mechanics to keep it well balanced, like for example AVL or red-black trees have, then duplicate values can end up at either side of the root through rotation. For instance, if your binary search tree is an AVL tree, and the insertion algorithm is such that equal values are always inserted in the right subtree, then see what inserting the value 6 in the following tree does:
5                5                   5
 \   insertion    \     rotation    / \
  5  =======>      5   =======>    5   6
                    \
                     6 (inserted)
... and so the duplicate value is suddenly at the left side.

Difference between RDF Containers and Collections?

I have read from a book
The difference between containers and collections lies in the fact that containers are always open (i.e., new members may be added through additional RDF statements) and collections may be closed.
I don't understand this difference clearly. It says that no new members can be added to a collection. What if I change the value of the last rdf:rest property from rdf:nil to _:xyz and add
_:xyz rdf:first <ex:aaa> .
_:xyz rdf:rest rdf:nil .
I am thus able to add a new member _:xyz. Why does it then say that collections are closed?
The key difference is that in a Container, you can simply continue to add new items, by only asserting new RDF triples. In a Collection, you first have to remove a statement before you can add new items.
This is an important difference in particular for RDF reasoning. It's important because RDF reasoning employs an Open World Assumption (OWA), which, put simply, states that just because a certain fact is not known, that does not mean we can assume that fact to be untrue.
If you apply this principle to a container, and you ask the question "how many items does the container have", the answer must always be "I don't know", simply because there is no way to determine how many unknown items might be in the container. However, if we have a collection, we have an explicit marker for the last item, so we can with certainty say how many items the collection contains - there can be no unknown additional items.
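To make that concrete, compare the two shapes in Turtle (the names ex:basket, ex:basket2, ex:contains and the fruit IRIs are made up for illustration):

```turtle
# Container: anyone may later assert ex:basket rdf:_3 ... -- it stays open.
ex:basket a rdf:Bag ;
    rdf:_1 ex:apple ;
    rdf:_2 ex:banana .

# Collection: the rdf:rest chain ends in rdf:nil, which explicitly marks
# the last member -- adding one means first retracting an rdf:rest triple.
ex:basket2 ex:contains ( ex:apple ex:banana ) .
```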

Lock free doubly linked skip list

There is a ton of research on lock-free doubly linked lists. Likewise, there is a ton of research on lock-free skip lists. As best I can tell, however, nobody has managed a lock-free doubly linked skip list. Does anybody know of any research to the contrary, or a reason why this is the case?
Edit:
The specific scenario is for building a fast quantile (50%, 75%, etc) accumulator. Samples are inserted into the skip list in O(log n) time. By maintaining an iterator to the current quantile, we can compare the inserted value to the current quantile in O(1) time, and can easily determine whether the inserted value is to the left or right of the quantile, and by how much the quantile needs to move as a result. It's the left move that requires a previous pointer.
As I understand it, any difficulty will come from keeping the previous pointers consistent in the face of multiple threads inserting and removing at once. I imagine the solution will almost certainly involve a clever use of pointer marking.
But why would you do such a thing? I've not actually sat down and worked out exactly how skip lists work, but from my vague understanding, you'd never use the previous pointers. So why have the overhead of maintaining them?
But if you wanted to, I don't see why you cannot. Just replace the singly linked list with a doubly linked list. The doubly linked list is logically coherent, so it's all the same.
I have an idea for you. We use a "cursor" to find the item in a skiplist. The cursor also maintains the trail that was taken to get to the item. We use this trail for delete and insert - it avoids a second search to perform those operations, and it embeds the version # of the list that was seen when the traversal was made. I am wondering if you could use the cursor to more quickly find the previous item.
You would have to go up a level on the cursor and then search for the item that is just barely less than your item. Alternatively, if the search made it to the lowest level of the linked list, just save the prev ptr as you traverse. The lowest level is probably used 50% of the time to find your item, so performance would be decent.
Hmm... thinking about it now, it seems that the cursor would have the prev ptr 50% of the time, need to search again from 1 level up 25% of the time, from 2 levels up 12.5% of the time, etc. So in infrequent cases, you have to do almost the entire search again.
I think the advantage to this would be that you don't have to figure out how to "lock free" maintain a double linked skip list, and for the majority of cases you dramatically decrease the cost of locating the previous item.
As an alternative to maintaining backlinks, when a quantile needs to be updated, you could do another search to find the node whose key is less than the current one. As I also just mentioned in a comment to johnnycrash, it's possible to build an array of the rightmost node found at each level -- and from that it would be possible to accelerate the second search. (Fomitchev's thesis mentions this as a possible optimization.)
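For the single-threaded case at least, the search itself already yields every predecessor: the standard update/trail array holds, per level, the rightmost node strictly less than the key. A sketch (no concurrency at all; the lock-free part is exactly what this does not solve, and all names are illustrative):

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height   # one forward pointer per level

class SkipList:
    MAX_HEIGHT = 8

    def __init__(self):
        self.head = Node(float("-inf"), self.MAX_HEIGHT)

    def _preds(self, key):
        # the "trail": rightmost node strictly less than key, per level;
        # preds[0] is the level-0 predecessor the question asks about
        preds = [None] * self.MAX_HEIGHT
        node = self.head
        for lvl in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            preds[lvl] = node
        return preds

    def insert(self, key):
        preds = self._preds(key)
        height = 1
        while height < self.MAX_HEIGHT and random.random() < 0.5:
            height += 1                # classic coin-flip level assignment
        node = Node(key, height)
        for lvl in range(height):
            node.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = node

s = SkipList()
for k in [3, 1, 7, 5]:
    s.insert(k)
assert s._preds(5)[0].key == 3   # predecessor found with no back-pointers
```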

Why can you only prepend to lists in functional languages?

I have only used 3 functional languages -- Scala, Erlang, and Haskell -- but in all 3 of these, the correct way to build lists is to prepend the new data to the front and then reverse the list at the end, instead of just appending. Of course, you could append to a list, but that results in an entirely new list being constructed.
Why is this? I could imagine it's because lists are implemented internally as linked lists, but why couldn't they just be implemented as doubly linked lists so you could append to the end with no penalty? Is there some reason all functional languages have this limitation?
Lists in functional languages are immutable / persistent.
Appending to the front of an immutable list is cheap because you only have to allocate a single node with the next pointer pointing to the head of the previous list. There is no need to change the original list since it's only a singly linked list and pointers to the previous head cannot see the update.
Adding a node to the end of the list necessitates modifying the last node to point to the newly created node. But this is not possible because the node is immutable. The only option is to create a new node which has the same value as the last node and points to the newly created tail. This process must repeat itself all the way back to the front of the list, resulting in a brand new list which is a copy of the first list except for the tail node. Hence it is more expensive.
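The asymmetry is easy to see with cons cells modeled as (head, tail) tuples; this is an illustrative Python sketch of what the functional languages above build in natively:

```python
# Cons cells as (head, tail) tuples; None represents the empty list.
def prepend(x, xs):
    return (x, xs)            # O(1): one new cell, original shared untouched

def append(xs, x):
    if xs is None:            # O(n): copies every cell on the way to the end
        return (x, None)
    head, tail = xs
    return (head, append(tail, x))

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs[0])
        xs = xs[1]
    return out

one_two = prepend(1, prepend(2, None))
assert to_list(append(one_two, 3)) == [1, 2, 3]
assert to_list(one_two) == [1, 2]   # original list is unchanged (persistent)
```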
Because there is no way to append to a list in O(1) without modifying the original (which you don't do in functional languages)
Because it's faster
They certainly could support appending, but it's so much faster to prepend that they limit the API. It's also kind of non-functional to append, as you must then modify the last element or create a whole new list. Prepend works in an immutable, functional, style by its nature.
That is the way in which lists are defined. A list is defined as a singly linked list terminated by a nil; this is not just an implementation detail. This, coupled with the fact that these languages have immutable data (at least Erlang and Haskell do), means that you cannot implement them as doubly linked lists. Adding an element would then modify the list, which is illegal.
By restricting list construction to prepending, it means that anybody else that is holding a reference to some part of the list further down, will not see it unexpectedly change behind their back. This allows for efficient list construction while retaining the property of immutable data.

Pathing in a non-geographic environment

For a school project, I need to create a way to build personalized queries based on end-user choices.
Since the user can choose basically any fields from any combination of tables, I need to find a way to map the tables in order to make a join and not have extraneous data (This may lead to incoherent reports, but we're willing to live with that).
For up to two tables, I already managed to design an algorithm that works fine. However, when I add another table, I can't find a way to path through my database. All the tables available for the personalized reports can be linked together, so it really all comes down to finding which path to use.
You might be able to try some form of an A* algorithm. Basically this looks at each of the possible next options to choose and applies a heuristic to it, a function that determines roughly how far it is between this node and your goal. It then chooses the one that is closer and repeats. The hardest part of implementing A* is designing a good heuristic.
Without more information on how the tables fit together, or what you mean by a 'path' through the tables, it's hard to recommend something though.
Looks like it didn't like my link, probably the * in it, try:
http://en.wikipedia.org/wiki/A*_search_algorithm
Edit:
If that is the whole database, I'd go with a depth-first exhaustive search.
I thought about using A* or a similar algorithm, but as you said, the hardest part is about designing the heuristic.
My tables are centered around somewhat of a backbone, with quite a few branches each leading to at most a single leaf node. Here is the actual map (table names removed because I'm paranoid). Assuming I want to view data from the A, B and C tables, I need an algorithm to find the blue path.
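One cheap way to approximate that blue path (not guaranteed minimal in general -- the exact problem is a Steiner-tree search -- but fine for a small schema): BFS the shortest join path between each pair of requested tables and take the union. A sketch with a made-up join graph standing in for the redacted map:

```python
from collections import deque
from itertools import combinations

# Hypothetical join graph: table -> tables it can be joined with directly
JOINS = {
    "A": ["X"], "X": ["A", "Y"], "Y": ["X", "B", "Z"],
    "B": ["Y"], "Z": ["Y", "C"], "C": ["Z"],
}

def path(a, b):
    # BFS shortest join path between two tables
    parent = {a: None}
    queue = deque([a])
    while queue:
        t = queue.popleft()
        if t == b:
            out = []
            while t is not None:
                out.append(t)
                t = parent[t]
            return out[::-1]
        for n in JOINS[t]:
            if n not in parent:
                parent[n] = t
                queue.append(n)
    return None   # tables not connected

def join_tables(tables):
    # union of pairwise shortest paths: the set of tables the query must join
    needed = set(tables)
    for a, b in combinations(tables, 2):
        needed.update(path(a, b))
    return needed

assert join_tables(["A", "B", "C"]) == {"A", "X", "Y", "B", "Z", "C"}
```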