Managing updates to nested immutable data structures in functional languages - language-agnostic

I've noticed while on my quest to learn functional programming that there are cases when parameter lists start to become excessive when using nested immutable data structures. This is because when making an update to an object's state, you need to update all the parent nodes in the data structure as well. Note that here I take "update" to mean "return a new immutable object with the appropriate change".
e.g. the kind of function I have found myself writing (Clojure example) is:
(defn update-object-in-world [world country city building object property value]
  (update-country-in-world world
    (update-city-in-country country
      (update-building-in-city building
        (update-object-in-building object property value)))))
All this to update one simple property is pretty ugly, but in addition the caller has to assemble all the parameters!
This must be a fairly common requirement when dealing with immutable data structures in functional languages generally, so is there a good pattern or trick to avoid this that I should be using instead?

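Try update-in, which takes a path of keys, a function to apply to the value found at that path, and any extra arguments for that function (here update-object-in-building is assumed to take the building as its first argument):
(update-in world [country city building]
  update-object-in-building object property value)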
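For readers who don't use Clojure, the idea behind update-in can be sketched in Python. This is only an illustration, assuming the world is a nested dict. The helper name update_in and the sample keys are made up for the example. Only the dicts along the path are copied, and everything else is shared with the original.

```python
def update_in(data, path, fn):
    """Return a copy of nested dict `data` with `fn` applied to the value at `path`.

    Assumes every key in `path` already exists (a missing key raises KeyError).
    """
    if not path:
        return fn(data)
    key, *rest = path
    updated = dict(data)  # shallow copy of this level only
    updated[key] = update_in(data[key], rest, fn)
    return updated


# Hypothetical sample data: country -> city -> building -> object -> property
world = {
    "france": {"paris": {"louvre": {"mona-lisa": {"insured": False}}}},
    "japan": {"tokyo": {}},
}
path = ["france", "paris", "louvre", "mona-lisa", "insured"]
new_world = update_in(world, path, lambda old: True)
```

The caller passes one path instead of five separate parameters, and the original world is left untouched.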

A classic general-purpose solution to this problem is what's called a "zipper" data structure. There are a number of variations, but the basic idea is simple: Given a nested data structure, take it apart as you traverse it, so that at each step you have a "current" element and a list of fragments representing how to reconstruct the rest of the data structure "above" the current element. A zipper can perhaps be thought of as a "cursor" that can move through an immutable data structure, replacing pieces as it goes, recreating only the parts it has to.
In the trivial case of a list, the fragments are just the previous elements of the list, stored in reverse order, and traversal is just moving the first element of one list to the other.
In the nontrivial but still simple case of a binary tree, the fragments each consist of a value and a subtree, identified as either right or left. Moving the zipper "down-left" involves adding to the fragment list the current element's value and right child, making the left child the new current element. Moving "down-right" works similarly, and moving "up" is done by combining the current element with the first value and subtree on the fragment list.
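The binary-tree case described above can be sketched roughly as follows. This is a minimal illustration in Python rather than any particular library's zipper. The names Node, Crumb, Zipper, down_left, down_right, up, replace and root are all invented for the example, and the fragment list is kept as a plain tuple for brevity.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    value: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class Crumb(NamedTuple):
    side: str                 # which child we descended into: "left" or "right"
    value: object             # value of the parent we left behind
    sibling: Optional[Node]   # the subtree we did not descend into


class Zipper(NamedTuple):
    focus: Optional[Node]     # the "current" element
    crumbs: tuple = ()        # fragments, most recent first


def down_left(z):
    node = z.focus
    return Zipper(node.left, (Crumb("left", node.value, node.right),) + z.crumbs)


def down_right(z):
    node = z.focus
    return Zipper(node.right, (Crumb("right", node.value, node.left),) + z.crumbs)


def up(z):
    crumb, rest = z.crumbs[0], z.crumbs[1:]
    if crumb.side == "left":
        parent = Node(crumb.value, z.focus, crumb.sibling)
    else:
        parent = Node(crumb.value, crumb.sibling, z.focus)
    return Zipper(parent, rest)


def replace(z, node):
    """Swap the focused subtree; nothing above it is rebuilt until we move up."""
    return Zipper(node, z.crumbs)


def root(z):
    while z.crumbs:
        z = up(z)
    return z.focus


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
z = down_right(down_left(Zipper(tree)))  # focus on the node holding 5
new_tree = root(replace(z, Node(50)))
```

Only the nodes on the path from the root to the replaced element are recreated. The untouched subtrees (here the nodes holding 3 and 4) are shared between tree and new_tree.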
While the basic idea of the zipper is very general, constructing a zipper for a specific data structure usually requires some specialized bits, such as custom traversal or construction operations, to be used by a generic zipper implementation.
The original paper describing zippers (warning, PDF) gives example code in OCaml for an implementation storing fragments with an explicit path through a tree. Unsurprisingly, plenty of material can also be found on zippers in Haskell. As an alternative to constructing an explicit path and fragment list, zippers can be implemented in Scheme using continuations. And finally, there seems to even be a tree-oriented zipper provided by Clojure.

There are two approaches that I know of:
Collect multiple parameters in some sort of object that is convenient to pass around.
Example:
; world is a nested hash, the rest are keys
(defstruct location :world :country :city :building)
(defstruct attribute :object :property)
(defn do-update [location attribute value]
  (let [{:keys [world country city building]} location
        {:keys [object property]} attribute]
    ; assoc-in sets the value directly; update-in would expect a function here
    (assoc-in world [country city building object property] value)))
This brings you down to two parameters that the caller needs to care about (location and attribute), which may be fair enough if those parameters do not change very often.
The other alternative is a with-X macro, which sets variables for use by the code body:
(defmacro with-location [location & body] ; run body in location context
  (concat
    (list 'let ['{:keys [world country city building] :as location} `~location])
    `(~@body)))
Example use:
(with-location location (println city))
Then whatever the body does, it does to the world/country/city/building set for it, and it can pass the entire thing off to another function using the "pre-assembled" location parameter.
Update: Now with a with-location macro that actually works.

Pascal binary search tree that contains linked lists

I need to search through its contents with a recursive function, so it returns a boolean response depending whether the value I read was found or not. I dunno how to make it work. Here's the type for the tree I defined:
text = string[30];
list = ^nodeL;
nodeL = record
  title: text;
  ISBN: text;
  next: list;
end;
tree = ^nodeT;
nodeT = record
  cod: text;
  l: list;
  LC: tree;
  RC: tree;
end;
This looks like a "please do my assignment for me" post, which I won't do. I will try to help you do the assignment yourself.
I don't know exactly what your assignment is, so I'm going to have to make some guesses.
I think your assignment is to write a recursive function that will search a tree and return a boolean response depending on whether a value (input to the function) is found or not.
I don't know how the tree gets its content. You say, you defined the tree type, so I'm guessing that means you are not provided with a tree that already has content. So, at least for testing purposes, you are going to have to write code to add content to the tree (so you can search it).
I don't know exactly what kind of tree you are supposed to create. Usually trees have rules about how the items are arranged in the tree. A common type of tree is a binary search tree, where for each node, the items in its left subtree (if present) are "less than" the node's own item, and the items in its right subtree (if present) are "greater than" it. You probably need this rule when adding items (i.e. content) to the tree.
I think you need to change your definition of the tree node, nodeT (I could be wrong). A tree is a kind of linked list, it does not usually contain linked lists. Usually each tree node contains an item of data (not a list of items).
If I were doing this assignment (and learning to program in Pascal), I would do the following (in this order):
Make sure I understand linked lists (at least singly linked lists). Write at least one program to add data to a linked list and search it (do not use recursion).
Make sure I understand recursion. Read some tutorials on recursion (that do not use linked lists, or trees). For example "First Textbook Examples of Recursion". Write at least one program that uses recursion (do not use linked lists or trees).
Make sure I understand trees. Read some tutorials on trees. For example, "Binary Search Trees"
Do the assignment.
P.S. You might want to change the name of your text type from "text", because, in Pascal, "text" is the name of a predefined type, for text files.

What are the actual advantages of the visitor pattern? What are the alternatives?

I read quite a lot about the visitor pattern and its supposed advantages. To me, however, they do not seem to be much of an advantage when applied in practice:
"Convenient" and "elegant" seems to mean lots and lots of boilerplate code
Therefore, the code is hard to follow. Also 'accept'/'visit' is not very descriptive
Even uglier boilerplate code if your programming language has no method overloading (e.g. Vala)
You cannot in general add new operations to an existing type hierarchy without modification of all classes, since you need new 'accept'/'visit' methods everywhere as soon as you need an operation with different parameters and/or return value (changes to classes all over the place is one thing this design pattern was supposed to avoid!?)
Adding a new type to the type hierarchy requires changes to all visitors. Also, your visitors cannot simply ignore a type - you need to create an empty visit method (boilerplate again)
It all just seems to be an awful lot of work when all you want to do is actually:
// Pseudocode
int SomeOperation(ISomeAbstractThing obj) {
switch (type of obj) {
case Foo: // do Foo-specific stuff here
case Bar: // do Bar-specific stuff here
case Baz: // do Baz-specific stuff here
default: return 0; // do some sensible default if type unknown or if we don't care
}
}
The only real advantage I see (which, by the way, I haven't seen mentioned anywhere): the visitor pattern is probably the fastest way to implement the above code snippet in terms of CPU time (if you don't have a language with double dispatch or efficient type comparison in the fashion of the pseudocode above).
Questions:
So, what advantages of the visitor pattern have I missed?
What alternative concepts/data structures could be used to make the above fictional code sample run equally fast?
As far as I have seen, there are two uses / benefits for the visitor design pattern:
Double dispatch
Separate data structures from the operations on them
Double dispatch
Let's say you have a Vehicle class and a VehicleWasher class. The VehicleWasher has a Wash(Vehicle) method:
VehicleWasher
Wash(Vehicle)
Vehicle
Additionally we also have specific vehicles like a car and in the future we'll also have other specific vehicles. For this we have a Car class but also a specific CarWasher class that has an operation specific to washing cars (pseudo code):
CarWasher : VehicleWasher
Wash(Car)
Car : Vehicle
Then consider the following client code to wash a specific vehicle (notice that x and washer are declared using their base type because the instances might be dynamically created based on user input or external configuration values; in this example they are simply created with a new operator though):
Vehicle x = new Car();
VehicleWasher washer = new CarWasher();
washer.Wash(x);
Many languages use single dispatch to call the appropriate function. Single dispatch means that during runtime only a single value is taken into account when determining which method to call. Therefore only the actual type of washer will be considered. The actual type of x isn't taken into account. The last line of code will therefore invoke CarWasher.Wash(Vehicle) and NOT CarWasher.Wash(Car).
If you use a language that does not support multiple dispatch and you do need it (I can honestly say I have never encountered such a situation though), then you can use the visitor design pattern to enable this. For this, two things need to be done. First of all, add an Accept method to the Vehicle class (the visitee) that accepts a VehicleWasher as a visitor and then calls its Wash operation:
Accept(VehicleWasher washer)
washer.Wash(this);
The second thing is to modify the calling code and replace the washer.Wash(x); line with the following:
x.Accept(washer);
Now for the call to the Accept method the actual type of x is considered (and only that of x, since we are assuming a single-dispatch language). In the implementation of the Accept method the Wash method is called on the washer object (the visitor). For this the actual type of the washer is considered, and this will invoke CarWasher.Wash(Car). Note that in a statically typed language this only works if Car overrides Accept (so that this has the static type Car there) and if VehicleWasher itself declares a Wash(Car) method that CarWasher overrides. By combining two single dispatches a double dispatch is implemented.
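The two chained single dispatches can be sketched as below. This is an illustration only, written in Python, which has no method overloading, so the two Wash overloads are given distinct names (wash_vehicle and wash_car, both invented for this example). The returned strings are placeholders for the real washing logic.

```python
class VehicleWasher:
    def wash_vehicle(self, vehicle):
        return "generic wash"

    def wash_car(self, car):
        # By default a car is washed like any other vehicle.
        return self.wash_vehicle(car)


class CarWasher(VehicleWasher):
    def wash_car(self, car):
        return "car-specific wash"


class Vehicle:
    def accept(self, washer):
        # First dispatch picked this method from the type of the vehicle;
        # the call below dispatches a second time on the type of the washer.
        return washer.wash_vehicle(self)


class Car(Vehicle):
    def accept(self, washer):
        return washer.wash_car(self)


x = Car()
washer = CarWasher()
result = x.accept(washer)
```

Here result is "car-specific wash", because both the runtime type of x and the runtime type of washer took part in selecting the method.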
Now to elaborate on your remark about terms like Accept, Visit and Visitor being very unspecific. That is absolutely true. But it is for a reason.
Consider the requirement in this example to implement a new class that is able to repair vehicles: a VehicleRepairer. This class can only be used as a visitor in this example if it inherits from VehicleWasher and has its repair logic inside a Wash method. But that of course doesn't make any sense and would be confusing. So I totally agree that design patterns tend to have very vague and unspecific naming, but it does make them applicable to many situations. The more specific your naming is, the more restrictive it can be.
Your switch statement only considers one type which is actually a manual way of single dispatch. Applying the visitor design pattern in the above way will provide double dispatch.
This way you do not necessarily need additional Visit methods when adding additional types to your hierarchy. Of course it does add some complexity, as it makes the code less readable. But all patterns come at a price.
Of course this pattern cannot always be used. If you expect lots of complex operations with multiple parameters, then this will not be a good option.
An alternative is to use a language that does support multiple dispatch. For instance .NET did not support it until version 4.0 which introduced the dynamic keyword. Then in C# you can do the following:
washer.Wash((dynamic)x);
Because x is then converted to a dynamic type its actual type will be considered for the dispatch and so both x and washer will be used to select the correct method so that CarWasher.Wash(Car) will be called (making the code work correctly and staying intuitive).
Separate data structures and operations
The other benefit and requirement is that it can separate the data structures from the operations. This can be an advantage because it allows new visitors to be added that have their own operations while it also allows data structures to be added that 'inherit' these operations. It can, however, only be applied if this separation can be done / makes sense. The classes that perform the operations (the visitors) do not know the structure of the data structures, nor do they have to, which makes code more maintainable and reusable. When applied for this reason, the visitors have operations for the different elements in the data structures.
Say you have different data structures and they all consist of elements of class Item. The structures can be lists, stacks, trees, queues etc.
You can then implement visitors that in this case will have the following method:
Visit(Item)
The data structures need to accept visitors and then call the Visit method for each Item.
This way you can implement all kinds of visitors and you can still add new data structures as long as they consist of elements of type Item.
For more specific data structures with additional elements (e.g. a Node) you might consider a specific visitor (NodeVisitor) that inherits from your conventional Visitor and have your new data structures accept that visitor (Accept(NodeVisitor)). The new visitors can be used for the new data structures but also for the old data structures due to inheritance, so you do not need to modify your existing 'interface' (the super class in this case).
In my personal opinion, the visitor pattern is only useful if the interface you want implemented is rather static and doesn't change a lot, while you want to give anyone a chance to implement their own functionality.
Note that you can avoid changing everything every time you add a new method by creating a new interface instead of modifying the old one - then you just have to have some logic handling the case when the visitor doesn't implement all the interfaces.
Basically, the benefit is that it allows you to choose the correct method to call at runtime, rather than at compile time - and the available methods are actually extensible.
For more info, have a look at this article - http://rgomes-info.blogspot.co.uk/2013/01/a-better-implementation-of-visitor.html
By experience, I would say that "Adding a new type to the type hierarchy requires changes to all visitors" is an advantage. Because it definitely forces you to consider the new type added in ALL places where you did some type-specific stuff. It prevents you from forgetting one....
This is an old question, but I would like to answer.
The visitor pattern is useful mostly when you have a composite pattern in place in which you build a tree of objects and such tree arrangement is unpredictable.
Type checking may be one thing that a visitor can do, but say you want to build an expression based on a tree that can vary its form according to a user input or something like that, a visitor would be an effective way for you to validate the tree, or build a complex object according to the items found on the tree.
The visitor may also carry an object that does something on each node it may find on that tree. This visitor may be a composite itself, chaining lots of operations on each node, or it can carry a mediator object to mediate operations or dispatch events on each node.
Your imagination is the limit of all this. You can filter a collection, build an abstract syntax tree out of a complete tree, parse a string, validate a collection of things, etc.

What language feature can allow transition from visitor to sequence?

I will use C# syntax since I am familiar with it, but it is not really language-specific.
Let's say we want to provide an API to go over a Tree and do something with each Node.
Solution 1: void Visit(Tree tree, Action<Node> action)
It takes a tree, and calls action on each node in the tree.
Solution 2: IEnumerable<Node> ToEnumerable(Tree tree)
It converts tree to a flat lazy sequence so we can go over and call action on each node.
Now, let's see how we can convert one API to another.
It is pretty trivial to provide Visit on top of ToEnumerable:
void Visit(Tree tree, Action<Node> action) {
    ToEnumerable(tree).ForEach(action);
}
However, is there a concept/feature in any language that would allow providing ToEnumerable on top of Visit (as a lazy sequence, so the list is not created in advance)?
Not sure if I understand you correctly, but in Python, you can create iterable interface on any object.
So you would just add special method __iter__ (which will yield nodes while traversing the tree).
The visit procedure is then just about iterating through Tree object and calling action on each node.
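A minimal sketch of that suggestion, assuming a simple Tree class with a value and a list of children (the class layout and the names value and children are invented for the example):

```python
class Tree:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def __iter__(self):
        # Lazily yield this node's value, then everything below it (pre-order).
        yield self.value
        for child in self.children:
            yield from child


def visit(tree, action):
    # Visit built on top of iteration, as described above.
    for value in tree:
        action(value)


tree = Tree(1, [Tree(2, [Tree(3)]), Tree(4)])
seen = []
visit(tree, seen.append)
```

Note that this covers the easy direction only (a visit built on top of a sequence). The traversal itself is written as a generator rather than derived from an existing callback-based Visit.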
If you are writing the code that will visit each node (as with a tree), it's possible to have an iterator call iterators for each branch, and perform a yield return on leaf nodes. This approach will work, and is very simple, but has the serious disadvantage that it's very easy to end up with code that will be very readable but execute very slowly. Some other questions and answers on this site will offer insight as to how to traverse trees efficiently within an iterator.
If the "tree" was just an example, and what you really have is a class which exposes a routine to call some delegate upon each node (similar to List.ForEach()), but does not expose an IEnumerable, you may be able to use the former to produce a List, which you could then iterate. Use something like var myList = new List<someThing>(); myCollection.ForEach( (x) => myList.Add(x) ); and then you may be able to enumerate myList.
If even that isn't sufficient, because the objects that were added to the list may not be valid by the time enumeration is complete, it may in rare cases be possible to use multiple threads to accomplish what's needed. For example, if you have two sorted collections whose ForEach method prepares each item for use, does the specified action, and then cleans up each item before proceeding to the next, and if you need to interleave the actions on items from two independent collections, one might be able to iterate the collections on separate threads, and use synchronization primitives so each thread will wait as necessary for the other.
Note that collections which only expose themselves via ForEach method are apt to restrict access during the execution of such a ForEach (if such restriction weren't necessary, they would probably implement IEnumerable). It may be possible for the "item action" called by one ForEach to perform another ForEach on the same collection on the same thread, since the latter ForEach would have to complete before the former one could resume. While one ForEach is running, however, an attempt to call a ForEach on a second thread would likely either malfunction or wait for the first operation to complete. If the first ForEach was waiting for some action by the second, deadlock would result. Because of this, scenarios where multi-threading will work better than simply building a List are rare. Nonetheless, there are a few cases where it may be helpful (e.g. the above-mentioned "zipper" operation on independent collections).
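The threading idea can be sketched as follows, here in Python for brevity. This is a rough illustration under several assumptions: the callback-only API is a function visit(tree, action), the names to_iterable, visit and the nested-tuple tree are invented for the example, and errors raised inside visit are not forwarded to the consumer. The visiting thread runs at most one item ahead of the consumer and stays blocked (as a daemon thread) if the consumer stops early.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the traversal


def to_iterable(visit, tree):
    """Lazily yield the items that visit(tree, action) passes to action."""
    handoff = queue.Queue(maxsize=1)

    def producer():
        # put() blocks while the queue is full, so the traversal pauses
        # until the consumer has asked for the previous item.
        visit(tree, handoff.put)
        handoff.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = handoff.get()
        if item is _DONE:
            return
        yield item


def visit(tree, action):
    """Hypothetical callback-only API: a tree is a (value, children) pair."""
    value, children = tree
    action(value)
    for child in children:
        visit(child, action)


tree = (1, [(2, [(3, [])]), (4, [])])
values = list(to_iterable(visit, tree))
```

This carries the deadlock risks described above if the visited collection restricts access while its ForEach is running.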
I think now I understand the idea. The concept that I need here is called first-class continuations or, specifically, call/cc. The confusing thing about it for me is that C# already provides a limited implementation of this concept in yield return, but it is not applicable to my scenario.
So if C# provided full implementation, the solution would look like:
IEnumerable<Node> ToEnumerable(Tree tree) {
    tree.Visit(node => magic yield return node);
}
where magic yield return instead of returning sequence from node => ... lambda returns next element from ToEnumerable.
However, this answer is still not complete as I do not see the exact correlation between yield return and call/cc. I will update the answer when I understand this.

"Necessary" Uses of Recursion in Imperative Languages

I've recently seen in a couple of different places comments along the lines of, "I learned about recursion in school, but have never used it or felt the need for it since then." (Recursion seems to be a popular example of "book learning" amongst a certain group of programmers.)
Well, it's true that in imperative languages such as Java and Ruby[1], we generally use iteration and avoid recursion, in part because of the risk of stack overflows, and in part because it's the style most programmers in those languages are used to.
Now I know that, strictly speaking, there are no "necessary" uses of recursion in such languages: one can always somehow replace recursion with iteration, no matter how complex things get. By "necessary" here, I'm talking about the following:
Can you think of any particular examples of code in such languages where recursion was so much better than iteration (for reasons of clarity, efficiency, or otherwise) that you used recursion anyway, and converting to iteration would have been a big loss?
Recursively walking trees has been mentioned several times in the answers: what was it exactly about your particular use of it that made recursion better than using a library-defined iterator, had it been available?
[1]: Yes, I know that these are also object-oriented languages. That's not directly relevant to this question, however.
There are no "necessary" uses of recursion. All recursive algorithms can be converted to iterative ones. I seem to recall a stack being necessary, but I can't recall the exact construction off the top of my head.
Practically speaking, if you're not using recursion for the following (even in imperative languages) you're a little mad:
Tree traversal
Graphs
Lexing/Parsing
Sorting
When you are walking any kind of tree structure, for example
parsing a grammar using a recursive-descent parser
walking a DOM tree (e.g. parsed HTML or XML)
Also, every toString() method that calls the toString() of the object's members can be considered recursive, too. All object serializing algorithms are recursive.
In my work recursion is very rarely used for anything algorithmic. Things like factorials etc are solved much more readably (and efficiently) using simple loops. When it does show up it is usually because you are processing some data that is recursive in nature. For example, the nodes on a tree structure could be processed recursively.
If you were to write a program to walk the nodes of a binary tree, for example, you could write a function that processed one node and called itself to process each of its children. This would be more effective than trying to maintain all the different states for each child node as you looped through them.
The most well-known example is probably the quicksort algorithm developed by C.A.R. Hoare.
Another example is traversing a directory tree for finding a file.
In my opinion, recursive algorithms are a natural fit when the data structure is also recursive.
def traverse(node, function):
    function(node)
    for childnode in node.children:
        traverse(childnode, function)
I can't see why I'd want to write that iteratively.
It's all about the data you are processing.
I wrote a simple parser to convert a string into a data structure, it's probably the only example in 5 years' work in Java, but I think it was the right way to do it.
The string looked like this:
"{ index = 1, ID = ['A', 'B', 'C'], data = {" +
"count = 112, flags = FLAG_1 | FLAG_2 }}"
The best abstraction for this was a tree, where all leaf nodes are primitive data types, and branches could be arrays or objects. This is the typical recursive problem, a non-recursive solution is possible but much more complex.
Recursion can always be rewritten as iteration with an external stack. However, if you're sure that you don't risk very deep recursion that would lead to a stack overflow, recursion is a very convenient thing.
One good example is traversing a directory structure on a known operating system. You usually know how deep it can be (maximum path length is limited) and therefore will not have a stack overflow. Doing the same via iteration with an external stack is not so convenient.
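To make the trade-off concrete, here is a small sketch of the same pre-order traversal written both ways. It assumes nodes are plain dicts with "value" and "children" keys. The function names and the node layout are made up for the example.

```python
def traverse_recursive(node, visit):
    visit(node["value"])
    for child in node["children"]:
        traverse_recursive(child, visit)


def traverse_iterative(root, visit):
    stack = [root]  # explicit stack replaces the call stack
    while stack:
        node = stack.pop()
        visit(node["value"])
        # Push children in reverse so the first child is processed first.
        stack.extend(reversed(node["children"]))


tree = {
    "value": "a",
    "children": [
        {"value": "b", "children": [{"value": "c", "children": []}]},
        {"value": "d", "children": []},
    ],
}
recursive_order, iterative_order = [], []
traverse_recursive(tree, recursive_order.append)
traverse_iterative(tree, iterative_order.append)
```

Both versions visit the nodes in the same order. The iterative one is a little noisier, but its depth is limited by available memory rather than by the call stack.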
It was said "anything tree". I may be too cautious, and I know that stacks are big nowadays, but I still won't use recursion on a typical tree. I would, however, do it on a balanced tree.
I have a List of reports. I am using indexers on my class that contains this list. The reports are retrieved by their screen names using the indexers. In the indexer, if the report for that screen name doesn't exist it loads the report and recursively calls itself.
public class ReportDictionary
{
    private static List<Report> _reportList = null;
    public ReportColumnList this[string screenName]
    {
        get
        {
            Report rc = _reportList.Find(delegate(Report obj) { return obj.ReportName == screenName; });
            if (rc == null)
            {
                this.Load(screenName);
                return this[screenName]; // Recursive call
            }
            else
                return rc.ReportColumnList.Copy();
        }
        private set
        {
            this.Add(screenName, value);
        }
    }
}
This can be done without recursion using some additional lines of code.

Is there any way to implement a linked list with indexed access too?

I need some sort of linked list structure, but it would be great if it had indexed access too.
Is there any way to accomplish that?
EDIT: I'm writing in C, but it may be for any language.
One method of achieving your goal is to implement a randomized or deterministic skip list. On the bottom level you have your linked list with your items.
In order to get to elements using indexes, you'll need to add information to the inner nodes: the number of nodes on the lowest level between this node and the next node on the same level. This information can be added and maintained in O(log n).
The complexity of this solution is:
Add, Remove and Go to index all work in O(log n).
The downside of this solution is that it is much more difficult to implement than a regular linked list. With a regular linked list, you get Add and Remove in O(1), and Go to index in O(n).
You can probably use a tree for what you are aiming at. Make a binary tree that maintains the weights of each node of the tree (where the weight is equal to the number of nodes attached to that node, including itself). If you have a balancing scheme available for the tree, then insertions are still O(log n), since you only need to add one to the ancestor nodes' weights. Getting a node by index is O(log n), since you need only look at the indices of the ancestors of your desired node and the two children of each of those ancestors.
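The lookup half of this idea can be sketched as follows (in Python for brevity). This is a simplified illustration: the tree is built once, already balanced, from a list, and balanced insertion is left out. The names Node, size, build and get are invented for the example, and size plays the role of the weight described above.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        # Weight: number of nodes in this subtree, including this one.
        self.size = 1 + size(left) + size(right)


def size(node):
    return node.size if node else 0


def build(values, lo=0, hi=None):
    """Build a balanced tree whose in-order sequence is values[lo:hi]."""
    if hi is None:
        hi = len(values)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(values[mid], build(values, lo, mid), build(values, mid + 1, hi))


def get(node, index):
    """Return the index-th value in in-order position, in O(height) steps."""
    while node:
        left_size = size(node.left)
        if index < left_size:
            node = node.left
        elif index == left_size:
            return node.value
        else:
            index -= left_size + 1  # skip the left subtree and this node
            node = node.right
    raise IndexError(index)


items = list("abcdefg")
root = build(items)
third = get(root, 2)
```

Each step of get either returns or descends one level, so with a balanced tree the lookup takes O(log n) steps.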
To get array-like indexing in languages with operator overloading, such as C++ or Python, one would overload the indexing operator [] for the class that implements the linked list (in Java, which has no operator overloading, one would write a get(index) method instead). The implementation would be O(n). In C, where operator overloading is not possible, one would have to write a function which takes the linked list and a position and returns the corresponding object.
In case faster indexed access is required, one would have to use a different data structure, like the tree suggested by jprete or a dynamic array (which automatically grows as new elements are added to it). A quick example would be std::vector in the C++ standard library.
SQL server row items in the clustered index are arranged like so:
     .
    / \
   /\ /\
  *-*-*-*
The linked list is in the leaves (*-*-*-*). The linked list is ordered, allowing fast directional scanning, and the tree serves as a 'road-map' into the linked list. So you would need a key-value pair for your items and then a data structure that encapsulates the tree and linked list.
So your data structure might look something like this:
/* kv_pair and value_type are placeholders for your own types */
struct ll_node
{
    kv_pair current;
    struct ll_node *next;
};
struct tree_node
{
    value_type value;
    short isLeaf;
    union
    {
        struct tree_node *left_child;
        kv_pair *left_leaf;
    };
    union
    {
        struct tree_node *right_child;
        kv_pair *right_leaf;
    };
};
struct indexed_ll
{
    struct tree_node *tree_root;
    struct ll_node *linked_list_tail;
};