I will use C# syntax since I am familiar with it, but it is not really language-specific.
Let's say we want to provide an API to go over a Tree and do something with each Node.
Solution 1: void Visit(Tree tree, Action<Node> action)
It takes a tree, and calls action on each node in the tree.
Solution 2: IEnumerable<Node> ToEnumerable(Tree tree)
It converts tree to a flat lazy sequence so we can go over and call action on each node.
Now, let's see how we can convert one API to another.
It is pretty trivial to provide Visit on top of ToEnumerable:
void Visit(Tree tree, Action<Node> action) {
ToEnumerable(tree).ForEach(action);
}
However, is there a concept/feature in any language that will allow to provide ToEnumerable on top of Visit (as lazy sequence, so list is not created in advance)?
Not sure if I understand you correctly, but in Python, you can create iterable interface on any object.
So you would just add special method __iter__ (which will yield nodes while traversing the tree).
The visit procedure is then just about iterating through Tree object and calling action on each node.
If you are writing the code that will visit each node (as with a tree), it's possible to have an iterator call iterators for each branch, and perform a yield return on leaf nodes. This approach will work, and is very simple, but has the serious disadvantage that it's very easy to end up with code that will be very readable but execute very slowly. Some other questions and answers on this site will offer insight as to how to traverse trees efficiently within an iterator.
If the "tree" was just an example, and what you really have is a class which exposes a routine to call some delegate upon each node (similar to List.ForEach()), but does not expose an IEnumerable, you may be able to use the former to produce a List, which you could then iterate. Use something like var myList = new List<someThing>(); myCollection.ForEach( (x) => myList.Add(x) ); and then you may be able to enumerate myList.
If even that isn't sufficient, because the objects that were added to the list may not be valid by the time enumeration is complete, it may in rare cases be possible to use multiple threading to accomplish what's needed. For example, if you have two sorted collections whose ForEach method prepares each items for use, does the specified action, and then cleans up each item before proceeding to the next, and if you need to interleave the actions on items from two independent collections, one might be able to iterate the collections on separate threads, and use synchronization primitives so each thread will wait as necessary for the other.
Note that collections which only expose themselves via ForEach method are apt to restrict access during the execution of such a ForEach (if such restriction weren't necessary, they would probably implement IEnumerable). It may be possible for the "item action" called by one ForEach to perform another ForEach on the same collection on the same thread, since the latter ForEach would have to complete before the former one could resume. While one ForEach is running, however, an attempt to call a ForEach on a second thread would likely either malfunction or wait for the first operation to complete. If the first ForEach was waiting for some action by the second, deadlock would result. Because of this, scenarios where multi-threading will work better than simply building a List are rare. Nonetheless, there are a few cases where it may be helpful (e.g. the above-mentioned "zipper" operation on independent collections).
I think now I understand the idea. The concept that I need here is called first-class continuations or, specifically, call/cc. The confusing thing about it for me is that C# already provides a limited implementation of this concept in yield return, but it is not applicable to my scenario.
So if C# provided full implementation, the solution would look like:
IEnumerable<Node> ToEnumerable(Tree tree) {
tree.Visit(node => magic yield return node);
}
where magic yield return instead of returning sequence from node => ... lambda returns next element from ToEnumerable.
However, this answer is still not complete as I do not see the exact correlation between yield return and call/cc. I will update the answer when I understand this.
Related
I read quite a lot about the visitor pattern and its supposed advantages. To me however it seems they are not that much advantages when applied in practice:
"Convenient" and "elegant" seems to mean lots and lots of boilerplate code
Therefore, the code is hard to follow. Also 'accept'/'visit' is not very descriptive
Even uglier boilerplate code if your programming language has no method overloading (i.e. Vala)
You cannot in general add new operations to an existing type hierarchy without modification of all classes, since you need new 'accept'/'visit' methods everywhere as soon as you need an operation with different parameters and/or return value (changes to classes all over the place is one thing this design pattern was supposed to avoid!?)
Adding a new type to the type hierarchy requires changes to all visitors. Also, your visitors cannot simply ignore a type - you need to create an empty visit method (boilerplate again)
It all just seems to be an awful lot of work when all you want to do is actually:
// Pseudocode
int SomeOperation(ISomeAbstractThing obj) {
switch (type of obj) {
case Foo: // do Foo-specific stuff here
case Bar: // do Bar-specific stuff here
case Baz: // do Baz-specific stuff here
default: return 0; // do some sensible default if type unknown or if we don't care
}
}
The only real advantage I see (which btw i haven't seen mentioned anywhere): The visitor pattern is probably the fastest method to implement the above code snippet in terms of cpu time (if you don't have a language with double dispatch or efficient type comparison in the fashion of the pseudocode above).
Questions:
So, what advantages of the visitor pattern have I missed?
What alternative concepts/data structures could be used to make the above fictional code sample run equally fast?
For as far as I have seen so far there are two uses / benefits for the visitor design pattern:
Double dispatch
Separate data structures from the operations on them
Double dispatch
Let's say you have a Vehicle class and a VehicleWasher class. The VehicleWasher has a Wash(Vehicle) method:
VehicleWasher
Wash(Vehicle)
Vehicle
Additionally we also have specific vehicles like a car and in the future we'll also have other specific vehicles. For this we have a Car class but also a specific CarWasher class that has an operation specific to washing cars (pseudo code):
CarWasher : VehicleWasher
Wash(Car)
Car : Vehicle
Then consider the following client code to wash a specific vehicle (notice that x and washer are declared using their base type because the instances might be dynamically created based on user input or external configuration values; in this example they are simply created with a new operator though):
Vehicle x = new Car();
VehicleWasher washer = new CarWasher();
washer.Wash(x);
Many languages use single dispatch to call the appropriate function. Single dispatch means that during runtime only a single value is taken into account when determining which method to call. Therefore only the actual type of washer we'll be considered. The actual type of x isn't taken into account. The last line of code will therefore invoke CarWasher.Wash(Vehicle) and NOT CarWasher.Wash(Car).
If you use a language that does not support multiple dispatch and you do need it (I can honoustly say I have never encountered such a situation though) then you can use the visitor design pattern to enable this. For this two things need to be done. First of all add an Accept method to the Vehicle class (the visitee) that accepts a VehicleWasher as a visitor and then call its operation Wash:
Accept(VehicleWasher washer)
washer.Wash(this);
The second thing is to modify the calling code and replace the washer.Wash(x); line with the following:
x.Accept(washer);
Now for the call to the Accept method the actual type of x is considered (and only that of x since we are assuming to be using a single dispatch language). In the implementation of the Accept method the Wash method is called on the washer object (the visitor). For this the actual type of the washer is considered and this will invoke CarWasher.Wash(Car). By combining two single dispatches a double dispatch is implemented.
Now to eleborate on your remark of the terms like Accept and Visit and Visitor being very unspecific. That is absolutely true. But it is for a reason.
Consider the requirement in this example to implement a new class that is able to repair vehicles: a VehicleRepairer. This class can only be used as a visitor in this example if it would inherit from VehicleWasher and have its repair logic inside a Wash method. But that ofcourse doesn't make any sense and would be confusing. So I totally agree that design patterns tend to have very vague and unspecific naming but it does make them applicable to many situations. The more specific your naming is, the more restrictive it can be.
Your switch statement only considers one type which is actually a manual way of single dispatch. Applying the visitor design pattern in the above way will provide double dispatch.
This way you do not necessarily need additional Visit methods when adding additional types to your hierarchy. Ofcourse it does add some complexity as it makes the code less readable. But ofcourse all patterns come at a price.
Ofcourse this pattern cannot always be used. If you expect lots of complex operations with multiple parameters then this will not be a good option.
An alternative is to use a language that does support multiple dispatch. For instance .NET did not support it until version 4.0 which introduced the dynamic keyword. Then in C# you can do the following:
washer.Wash((dynamic)x);
Because x is then converted to a dynamic type its actual type will be considered for the dispatch and so both x and washer will be used to select the correct method so that CarWasher.Wash(Car) will be called (making the code work correctly and staying intuitive).
Separate data structures and operations
The other benefit and requirement is that it can separate the data structures from the operations. This can be an advantage because it allows new visitors to be added that have there own operations while it also allows data structures to be added that 'inherit' these operations. It can however be only applied if this seperation can be done / makes sense. The classes that perform the operations (the visitors) do not know the structure of the data structures nor do they have to know that which makes code more maintainable and reusable. When applied for this reason the visitors have operations for the different elements in the data structures.
Say you have different data structures and they all consist of elements of class Item. The structures can be lists, stacks, trees, queues etc.
You can then implement visitors that in this case will have the following method:
Visit(Item)
The data structures need to accept visitors and then call the Visit method for each Item.
This way you can implement all kinds of visitors and you can still add new data structures as long as they consist of elements of type Item.
For more specific data structures with additional elements (e.g. a Node) you might consider a specific visitor (NodeVisitor) that inherits from your conventional Visitor and have your new data structures accept that visitor (Accept(NodeVisitor)). The new visitors can be used for the new data structures but also for the old data structures due to inheritence and so you do not need to modify your existing 'interface' (the super class in this case).
In my personal opinion, the visitor pattern is only useful if the interface you want implemented is rather static and doesn't change a lot, while you want to give anyone a chance to implement their own functionality.
Note that you can avoid changing everything every time you add a new method by creating a new interface instead of modifying the old one - then you just have to have some logic handling the case when the visitor doesn't implement all the interfaces.
Basically, the benefit is that it allows you to choose the correct method to call at runtime, rather than at compile time - and the available methods are actually extensible.
For more info, have a look at this article - http://rgomes-info.blogspot.co.uk/2013/01/a-better-implementation-of-visitor.html
By experience, I would say that "Adding a new type to the type hierarchy requires changes to all visitors" is an advantage. Because it definitely forces you to consider the new type added in ALL places where you did some type-specific stuff. It prevents you from forgetting one....
This is an old question but i would like to answer.
The visitor pattern is useful mostly when you have a composite pattern in place in which you build a tree of objects and such tree arrangement is unpredictable.
Type checking may be one thing that a visitor can do, but say you want to build an expression based on a tree that can vary its form according to a user input or something like that, a visitor would be an effective way for you to validate the tree, or build a complex object according to the items found on the tree.
The visitor may also carry an object that does something on each node it may find on that tree. this visitor may be a composite itself chaining lots of operations on each node, or it can carry a mediator object to mediate operations or dispatch events on each node.
You imagination is the limit of all this. you can filter a collection, build an abstract syntax tree out of an complete tree, parse a string, validate a collection of things, etc.
I have created a class that I've been using as the storage for all listings in my applications. The class allows me to "sign" an object to a listing (which can be created on the fly via the sign() method like so):
manager.sign(myObject, "someList");
This stores the index of the element (using it's unique id) in the newly created or previously created listing "someList" as well as the object in a 2D array. So for example, I might end up with this:
trace(_indexes["someList"][objectId]); // 0 - the object is the first in this list
trace(_instances["someList"]); // [object MyObject]
The class has another two methods:
find(signature:String):Array
This method returns an array via slice() containing all of the elements signed with the given signature.
findFirst(signature:String):Object
This method just returns the first object in a given listing
So to retrieve myObject I can either go:
trace(find("someList")[0]); or trace(findFirst("someList"));
Finally, there is an unsign() function which will remove an object from a given listing. This function basically:
Stores the result of pop() in the specified listing against a variable.
Uses the stored index to quickly replace the specified object with the pop()'d item.
Deletes the stored index for the specified object and updates the index for the pop()'d item.
Through all this, using unsign() will remove an object extremely quickly from a listing of any size.
Now this is all well and good, but I've had some thoughts which are making me consider how good this really is? I mean being able to easily list, remove and access lists of anything I want throughout the application like this is awesome - but is there a catch?
A couple of starting thoughts I have had are:
So far I haven't implemented support for listings that are private and only accessible via a given class.
Memory - this doesn't seem very memory efficient. Then again, neither is creating arrays for everything I want to store individually either. Just seems.. Larger.. Somehow.
Any insights?
I've uploaded the class here in case the above doesn't make much sense: https://projectavian.com/AviManager.as
Your solution seems pretty solid. If you're looking to modify it to be a bit more extensible and handle rights management, you might consider moving all those individually indexed properties to a value object for your AV elements. You could perform operations like "sign" and "unsign" internally in the VOs, or check for access rights. Your management class could monitor the collection of these VOs, pass them around, perform the method calls, and the objects would hold the state in a bit more readable format.
Really, though, this is entering into a coding style discussion. Your method works and it's not particularly inefficient. Just make sure the code is readable, encapsulated, and extensible and you're good.
I have a class that requires some of its methods to be called in a specific order. If these methods are called out of order then the object will stop working correctly. There are a few asserts in the methods to ensure that the object is in a valid state. What naming conventions could I use to communicate to the next person to read the code that these methods need to be called in a specific order?
It would be possible to turn this into one huge method, but huge methods are a great way to create problems. (There are a 2 methods than can trigger this sequence so 1 huge method would also result in duplication.)
It would be possible to write comments that explain that the methods need to be called in order but comments are less useful then clearly named methods.
Any suggestions?
Is it possible to refactor so (at least some of) the state from the first function is passed as a paramter to the second function, then it's impossible to avoid?
Otherwise, if you have comments and asserts, you're doing quite well.
However, "It would be possible to turn this into one huge method" makes it sound like the outside code doesn't need to access the intermediate state in any way. If so, why not just make one public method, which calls several private methods successively? Something like:
FroblicateWeazel() {
// Need to be in this order:
FroblicateWeazel_Init();
FroblicateWeazel_PerformCals();
FroblicateWeazel_OutputCalcs();
FroblicateWeazel_Cleanup();
}
That's not perfect, but if the order is centralised to that one function, it's fairly easy to see what order they should come in.
Message digest and encryption/decryption routines often have an _init() method to set things up, an _update() to add new data, and a _final() to return final results and tear things back down again.
One thing I've sometimes wondered is which is the better style out of the two shown below (if any)? Is it better to return immediately if a guard condition hasn't been satisfied, or should you only do the other stuff if the guard condition is satisfied?
For the sake of argument, please assume that the guard condition is a simple test that returns a boolean, such as checking to see if an element is in a collection, rather than something that might affect the control flow by throwing an exception. Also assume that methods/functions are short enough not to require editor scrolling.
// Style 1
public SomeType aMethod() {
SomeType result = null;
if (!guardCondition()) {
return result;
}
doStuffToResult(result);
doMoreStuffToResult(result);
return result;
}
// Style 2
public SomeType aMethod() {
SomeType result = null;
if (guardCondition()) {
doStuffToResult(result);
doMoreStuffToResult(result);
}
return result;
}
I prefer the first style, except that I wouldn't create a variable when there is no need for it. I'd do this:
// Style 3
public SomeType aMethod() {
if (!guardCondition()) {
return null;
}
SomeType result = new SomeType();
doStuffToResult(result);
doMoreStuffToResult(result);
return result;
}
Having been trained in Jackson Structured Programming in the late '80s, my ingrained philosophy was always "a function should have a single entry-point and a single exit-point"; this meant I wrote code according to Style 2.
In the last few years I have come to realise that code written in this style is often overcomplex and hard to read/maintain, and I have switched to Style 1.
Who says old dogs can't learn new tricks? ;)
Style 1 is what the Linux kernel indirectly recommends.
From https://www.kernel.org/doc/Documentation/process/coding-style.rst, chapter 1:
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
Style 2 adds levels of indentation, ergo, it is discouraged.
Personally, I like style 1 as well. Style 2 makes it harder to match up closing braces in functions that have several guard tests.
I don't know if guard is the right word here. Normally an unsatisfied guard results in an exception or assertion.
But beside this I'd go with style 1, because it keeps the code cleaner in my opinion. You have a simple example with only one condition. But what happens with many conditions and style 2? It leads to a lot of nested ifs or huge if-conditions (with || , &&). I think it is better to return from a method as soon as you know that you can.
But this is certainly very subjective ^^
Martin Fowler refers to this refactoring as :
"Replace Nested Conditional with Guard Clauses"
If/else statements also brings cyclomatic complexity. Hence harder to test cases. In order to test all the if/else blocks you might need to input lots of options.
Where as if there are any guard clauses, you can test them first, and deal with the real logic inside the if/else clauses in a clearer fashion.
If you dig through the .net-Framework using .net-Reflector you will see the .net programmers use style 1 (or maybe style 3 already mentioned by unbeli).
The reasons are already mentioned by the answers above. and maybe one other reason is to make the code better readable, concise and clear.
the most thing this style is used is when checking the input parameters, you always have to do this if you program a kind of frawework/library/dll.
first check all input parameters than work with them.
It sometimes depends on the language and what kinds of "resources" that you are using (e.g. open file handles).
In C, Style 2 is definitely safer and more convenient because a function has to close and/or release any resources that it obtained during execution. This includes allocated memory blocks, file handles, handles to operating system resources such as threads or drawing contexts, locks on mutexes, and any number of other things. Delaying the return until the very end or otherwise restricting the number of exits from a function allows the programmer to more easily ensure that s/he properly cleans up, helping to prevent memory leaks, handle leaks, deadlock, and other problems.
In C++ using RAII-style programming, both styles are equally safe, so you can pick one that is more convenient. Personally I use Style 1 with RAII-style C++. C++ without RAII is like C, so, again, Style 2 is probably better in that case.
In languages like Java with garbage collection, the runtime helps smooth over the differences between the two styles because it cleans up after itself. However, there can be subtle issues with these languages, too, if you don't explicitly "close" some types of objects. For example, if you construct a new java.io.FileOutputStream and do not close it before returning, then the associated operating system handle will remain open until the runtime garbage collects the FileOutputStream instance that has fallen out of scope. This could mean that another process or thread that needs to open the file for writing may be unable to until the FileOutputStream instance is collected.
Although it goes against best practices that I have been taught I find it much better to reduce the nesting of if statements when I have a condition such as this. I think it is much easier to read and although it exits in more than one place it is still very easy to debug.
I would say that Style1 became more used because is the best practice if you combine it with small methods.
Style2 look a better solution when you have big methods. When you have them ... you have some common code that you want to execute no matter how you exit. But the proper solution is not to force a single exit point but to make the methods smaller.
For example if you want to extract a sequence of code from a big method, and this method has two exit points you start to have problems, is hard to do it automatically. When i have a big method written in style1 i usually transform it in style2, then i extract methods then in each of them i should have Style1 code.
So Style1 is best but is compatible with small methods.
Style2 is not so good but is recommended if you have big methods that you don't want, have time to split.
I prefer to use method #1 myself, it is logically easier to read and also logically more similar to what we are trying to do. (if something bad happens, exit function NOW, do not pass go, do not collect $200)
Furthermore, most of the time you would want to return a value that is not a logically possible result (ie -1) to indicate to the user who called the function that the function failed to execute properly and to take appropriate action. This lends itself better to method #1 as well.
I would say "It depends on..."
In situations where I have to perform a cleanup sequence with more than 2 or 3 lines before leaving a function/method I would prefer style 2 because the cleanup sequence has to be written and modified only once. That means maintainability is easier.
In all other cases I would prefer style 1.
Number 1 is typically the easy, lazy and sloppy way. Number 2 expresses the logic cleanly. What others have pointed out is that yes it can become cumbersome. This tendency though has an important benefit. Style #1 can hide that your function is probably doing too much. It doesn't visually demonstrate the complexity of what's going on very well. I.e. it prevents the code from saying to you "hey this is getting a bit too complex for this one function". It also makes it a bit easier for other developers that don't know your code to miss those returns sprinkled here and there, at first glance anyway.
So let the code speak. When you see long conditions appearing or nested if statements it is saying that maybe it would be better to break this stuff up into multiple functions or that it needs to be rewritten more elegantly.
I've recently seen in a couple of different places comments along the lines of, "I learned about recursion in school, but have never used it or felt the need for it since then." (Recursion seems to be a popular example of "book learning" amongst a certain group of programmers.)
Well, it's true that in imperative languages such as Java and Ruby[1], we generally use iteration and avoid recursion, in part because of the risk of stack overflows, and in part because it's the style most programmers in those languages are used to.
Now I know that, strictly speaking, there are no "necessary" uses of recursion in such languages: one can always somehow replace recursion with iteration, no matter how complex things get. By "necessary" here, I'm talking about the following:
Can you think of any particular examples of code in such languages where recursion was so much better than iteration (for reasons of clarity, efficiency, or otherwise) that you used recursion anyway, and converting to iteration would have been a big loss?
Recursively walking trees has been mentioned several times in the answers: what was it exactly about your particular use of it that made recursion better than using a library-defined iterator, had it been available?
[1]: Yes, I know that these are also object-oriented languages. That's not directly relevant to this question, however.
There are no "necessary" uses of recursion. All recursive algorithms can be converted to iterative ones. I seem to recall a stack being necessary, but I can't recall the exact construction off the top of my head.
Practically speaking, if you're not using recursion for the following (even in imperative languages) you're a little mad:
Tree traversal
Graphs
Lexing/Parsing
Sorting
When you are walking any kind of tree structure, for example
parsing a grammar using a recursive-descent parser
walking a DOM tree (e.g. parsed HTML or XML)
also, every toString() method that calls the toString() of the object members can be considered recursive, too. All object serializing algorithms are recursive.
In my work recursion is very rarely used for anything algorithmic. Things like factorials etc are solved much more readably (and efficiently) using simple loops. When it does show up it is usually because you are processing some data that is recursive in nature. For example, the nodes on a tree structure could be processed recursively.
If you were to write a program to walk the nodes of a binary tree for example, you could write a function that processed one node, and called itself to process each of it's children. This would be more effective than trying to maintain all the different states for each child node as you looped through them.
The most well-known example is probably the quicksort algorithm developed by by C.A.R. Hoare.
Another example is traversing a directory tree for finding a file.
In my opinion, recursive algorithms are a natural fit when the data structure is also recursive.
def traverse(node, function):
function(this)
for each childnode in children:
traverse(childnode, function)
I can't see why I'd want to write that iteratively.
It's all about the data you are processing.
I wrote a simple parser to convert a string into a data structure, it's probably the only example in 5 years' work in Java, but I think it was the right way to do it.
The string looked like this:
"{ index = 1, ID = ['A', 'B', 'C'], data = {" +
"count = 112, flags = FLAG_1 | FLAG_2 }}"
The best abstraction for this was a tree, where all leaf nodes are primitive data types, and branches could be arrays or objects. This is the typical recursive problem, a non-recursive solution is possible but much more complex.
Recursion can always be rewritten as iteration with an external stack. However if you're sure that you don't risk very deep recursion that would lead to stackoverflow, recursion is a very convenient thing.
One good example is traversing a directory structure on a known operating system. You usually know how deep it can be (maximum path length is limited) and therefore will not have a stackoverflow. Doing the same via iteration with an external stack is not so convenient.
It was said "anything tree". I may be too cautious, and I know that stacks are big nowadays, but I still won't use recursion on a typical tree. I would, however, do it on a balanced tree.
I have a List of reports. I am using indexers on my class that contains this list. The reports are retrieved by their screen names using the indexers. In the indexer, if the report for that screen name doesn't exist it loads the report and recursively calls itself.
public class ReportDictionary
{
private static List<Report> _reportList = null;
public ReportColumnList this[string screenName]
{
get
{
Report rc = _reportList.Find(delegate(Report obj) { return obj.ReportName == screenName; });
if (rc == null)
{
this.Load(screenName);
return this[screenName]; // Recursive call
}
else
return rc.ReportColumnList.Copy();
}
private set
{
this.Add(screenName, value);
}
}
}
This can be done without recursion using some additional lines of code.