"Necessary" Uses of Recursion in Imperative Languages - language-agnostic

I've recently seen in a couple of different places comments along the lines of, "I learned about recursion in school, but have never used it or felt the need for it since then." (Recursion seems to be a popular example of "book learning" amongst a certain group of programmers.)
Well, it's true that in imperative languages such as Java and Ruby[1], we generally use iteration and avoid recursion, in part because of the risk of stack overflows, and in part because it's the style most programmers in those languages are used to.
Now I know that, strictly speaking, there are no "necessary" uses of recursion in such languages: one can always somehow replace recursion with iteration, no matter how complex things get. By "necessary" here, I'm talking about the following:
Can you think of any particular examples of code in such languages where recursion was so much better than iteration (for reasons of clarity, efficiency, or otherwise) that you used recursion anyway, and converting to iteration would have been a big loss?
Recursively walking trees has been mentioned several times in the answers: what was it exactly about your particular use of it that made recursion better than using a library-defined iterator, had it been available?
[1]: Yes, I know that these are also object-oriented languages. That's not directly relevant to this question, however.

There are no "necessary" uses of recursion: all recursive algorithms can be converted to iterative ones. In the general case, the conversion works by maintaining an explicit stack in place of the call stack; a sketch of the construction follows the list below.
Practically speaking, if you're not using recursion for the following (even in imperative languages) you're a little mad:
Tree traversal
Graphs
Lexing/Parsing
Sorting
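To make the first point concrete, here is a hedged Java sketch (Node is a hypothetical minimal binary-tree type, not from any answer here) showing a recursive pre-order walk next to the equivalent iteration that manages an explicit stack:

import java.util.ArrayDeque;
import java.util.Deque;

public class TraversalDemo {
    // Hypothetical minimal tree node, used only for this illustration.
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Recursive pre-order walk: the call stack remembers where we are.
    static void visitRecursive(Node n) {
        if (n == null) return;
        System.out.println(n.value);
        visitRecursive(n.left);
        visitRecursive(n.right);
    }

    // The same walk with an explicit stack in place of the call stack.
    static void visitIterative(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            System.out.println(n.value);
            if (n.right != null) stack.push(n.right); // push right first...
            if (n.left != null) stack.push(n.left);   // ...so left is visited first
        }
    }

    public static void main(String[] args) {
        Node root = new Node(1, new Node(2, null, null), new Node(3, null, null));
        visitRecursive(root); // 1 2 3
        visitIterative(root); // 1 2 3
    }
}

Both versions visit the same nodes in the same order; the iterative one simply makes the bookkeeping explicit.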

When you are walking any kind of tree structure, for example
parsing a grammar using a recursive-descent parser
walking a DOM tree (e.g. parsed HTML or XML)
Also, every toString() method that calls toString() on the object's members can be considered recursive. All object-serialization algorithms are recursive.
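For instance, a small hypothetical Java sketch (Order and Item are invented for illustration): printing the outer object recursively stringifies the whole object graph, because each toString() delegates to the members' toString() methods.

import java.util.List;

public class ToStringDemo {
    static class Item {
        String name;
        Item(String name) { this.name = name; }
        @Override public String toString() { return "Item(" + name + ")"; }
    }

    static class Order {
        List<Item> items;
        Order(List<Item> items) { this.items = items; }
        // "Order" + items calls the list's toString(), which in turn calls
        // toString() on every Item: serialization by recursion.
        @Override public String toString() { return "Order" + items; }
    }

    public static void main(String[] args) {
        System.out.println(new Order(List.of(new Item("a"), new Item("b"))));
        // prints: Order[Item(a), Item(b)]
    }
}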

In my work, recursion is very rarely used for anything algorithmic. Things like factorials and so on are solved much more readably (and efficiently) using simple loops. When recursion does show up, it is usually because you are processing data that is recursive in nature; for example, the nodes of a tree structure can be processed recursively.
If you were to write a program to walk the nodes of a binary tree, for example, you could write a function that processes one node and calls itself to process each of its children. This is much easier than trying to maintain the traversal state for every child node yourself as you loop over them.

The best-known example is probably the quicksort algorithm developed by C.A.R. Hoare.
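For reference, a minimal Java sketch of quicksort's recursive structure (using the Lomuto partition scheme; real implementations differ in pivot choice and other details):

import java.util.Arrays;

public class QuickSortDemo {
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                      // 0 or 1 elements: done
        int pivot = a[hi], i = lo;                 // Lomuto partition
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;     // put the pivot in place
        quickSort(a, lo, i - 1);                   // recurse on the left part
        quickSort(a, i + 1, hi);                   // recurse on the right part
    }

    public static void main(String[] args) {
        int[] xs = {3, 7, 1, 9, 4};
        quickSort(xs, 0, xs.length - 1);
        System.out.println(Arrays.toString(xs));   // [1, 3, 4, 7, 9]
    }
}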
Another example is traversing a directory tree for finding a file.

In my opinion, recursive algorithms are a natural fit when the data structure is also recursive.
def traverse(node, function):
    function(node)
    for child in node.children:
        traverse(child, function)
I can't see why I'd want to write that iteratively.

It's all about the data you are processing.
I wrote a simple parser to convert a string into a data structure. It's probably the only example from five years of Java work, but I think it was the right way to do it.
The string looked like this:
"{ index = 1, ID = ['A', 'B', 'C'], data = {" +
"count = 112, flags = FLAG_1 | FLAG_2 }}"
The best abstraction for this was a tree, where all leaf nodes are primitive data types and branches can be arrays or objects. This is the typical recursive problem; a non-recursive solution is possible but much more complex.
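To illustrate, here is a hedged Java sketch of such a recursive-descent parser over a deliberately simplified grammar (all names are hypothetical, not the poster's actual code): an object is "{ key = value, ... }", an array is "[ value, ... ]", and anything else is kept as an atom, so FLAG_1 | FLAG_2 survives as a single leaf. Note how parseObject and parseArray recurse back into parseValue for their children, mirroring the tree shape of the data.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TinyParser {
    private final String src;
    private int pos;

    TinyParser(String src) { this.src = src; }

    // value := object | array | atom  (the mutually recursive entry point)
    Object parseValue() {
        skipSpaces();
        char c = src.charAt(pos);
        if (c == '{') return parseObject();
        if (c == '[') return parseArray();
        return parseAtom();
    }

    private Map<String, Object> parseObject() {
        Map<String, Object> obj = new LinkedHashMap<>();
        pos++; // consume '{'
        skipSpaces();
        while (src.charAt(pos) != '}') {
            String key = parseAtom();
            skipSpaces();
            pos++; // consume '='
            obj.put(key, parseValue()); // recurse for the value
            skipSpaces();
            if (src.charAt(pos) == ',') pos++;
            skipSpaces();
        }
        pos++; // consume '}'
        return obj;
    }

    private List<Object> parseArray() {
        List<Object> arr = new ArrayList<>();
        pos++; // consume '['
        skipSpaces();
        while (src.charAt(pos) != ']') {
            arr.add(parseValue()); // recurse for each element
            skipSpaces();
            if (src.charAt(pos) == ',') pos++;
            skipSpaces();
        }
        pos++; // consume ']'
        return arr;
    }

    // An atom is any run of characters that is not structural punctuation.
    private String parseAtom() {
        int start = pos;
        while (pos < src.length() && "{}[],=".indexOf(src.charAt(pos)) < 0) pos++;
        return src.substring(start, pos).trim();
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        String s = "{ index = 1, ID = ['A', 'B', 'C'], data = {"
                 + "count = 112, flags = FLAG_1 | FLAG_2 }}";
        System.out.println(new TinyParser(s).parseValue());
        // {index=1, ID=['A', 'B', 'C'], data={count=112, flags=FLAG_1 | FLAG_2}}
    }
}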

Recursion can always be rewritten as iteration with an external stack. However, if you're sure that you don't risk very deep recursion that would lead to a stack overflow, recursion is a very convenient thing.
One good example is traversing a directory structure on a known operating system. You usually know how deep it can be (the maximum path length is limited) and therefore will not overflow the stack. Doing the same via iteration with an external stack is not nearly as convenient.
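For instance, a minimal Java sketch using java.io.File (error handling elided):

import java.io.File;

public class DirWalker {
    // Prints every file under `dir`. Recursion depth is bounded by the
    // maximum directory depth the OS allows, so overflow is unlikely.
    static void walk(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) return; // not a directory, or an I/O error
        for (File entry : entries) {
            if (entry.isDirectory()) {
                walk(entry);         // recurse into the subdirectory
            } else {
                System.out.println(entry.getPath());
            }
        }
    }

    public static void main(String[] args) {
        walk(new File(args.length > 0 ? args[0] : "."));
    }
}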

Others have said "anything tree". I may be too cautious, and I know that stacks are big nowadays, but I still won't use recursion on a typical tree: a degenerate tree can be as deep as it has nodes. I would, however, do it on a balanced tree, whose depth grows only logarithmically.

I have a List of reports. I am using an indexer on the class that contains this list; the reports are retrieved by their screen names through the indexer. In the indexer, if the report for that screen name doesn't exist, it loads the report and recursively calls itself.
public class ReportDictionary
{
    private static List<Report> _reportList = null;

    public ReportColumnList this[string screenName]
    {
        get
        {
            Report rc = _reportList.Find(delegate(Report obj) { return obj.ReportName == screenName; });
            if (rc == null)
            {
                this.Load(screenName);     // load the missing report...
                return this[screenName];   // ...then recursively re-enter the indexer
            }
            return rc.ReportColumnList.Copy();
        }
        private set
        {
            this.Add(screenName, value);
        }
    }
}
This can be done without recursion using some additional lines of code.

Related

Clojure practice - use functions of complex datatypes or their elements?

It is idiomatic in lisps such as Clojure to use simple data-structures and lots of functions. Still, there are many times when we must work with complex data-structures composed of many simpler ones.
My question is about a matter of good style/practice. In general, should we create functions that take the entire complex object, and within that extract what we need, or should they take exactly and only what they need?
For concreteness, I compare these two options in the following pseudocode:
(defrecord Thing
  [a b])

(defn one-option [a]
  .. a ..)

(one-option (:a a-thing))

;; ==============

(defn another-option [a-thing]
  .. (:a a-thing) ..)

(another-option a-thing)
The pros of one-option are that the function is simpler and has fewer responsibilities. It is also more compositional: it can be used in more places.
The downside is that we may end up repeating ourselves. If, for example, we have many functions that transform one Thing into another Thing, many of which need one-option, within each one we will find ourselves repeating the extraction code. Of course, another option is to create both, but this also adds a certain code overhead.
I think the answer is "It depends" and "It doesn't matter as much in Clojure as it does in object systems". Clojure has a number of largely orthogonal mechanisms designed to express modes of composition. The granularity of function arguments will tend to fall out of how the structure of the program is conceived of in the large.
So much for the hot air. There are some specifics that can affect argument granularity:
Since Clojure data structures are immutable, data hiding and accessor functions/methods have little relevance, so the repetition caused by accessing parts of a complex structure is trifling.
What appear in object design as associations are rendered as small collections of typed object pointers in each object. In Clojure these tend to become single(ton) global maps, and there are standard Clojure functions for accessing and manipulating hierarchical structures of this kind (get-in and assoc-in, for example).
Where we are looking for dynamically-bound compliance with an interface, Clojure protocols and datatypes are cleaner and more adaptable than most object systems. In this case, the whole object is passed.
Do both. To begin with, provide functions that transform the simplest structures, then add convenience functions for handling more complex structures which compose the functions that handle the simple ones.

Scala style: how far to nest functions?

One of the advantages of Scala is that it gives you great control over scope. You can nest
functions like this:
def fn1 = {
  def fn11 = {
    ...
  }
  def fn12 = {
    ...
    def fn121 = {
      ...
    }
  }
  ...
  def fn13 = {
    ...
  }
}
The problem here is that fn1 may start to look a bit intimidating. Coming from a Java background, we are advised to keep functions small enough to be viewed on a single "page" in the IDE.
What would you think about taking fn12 out of fn1 based on the reasoning: "It's only used in fn1 right now, but it might come in useful somewhere else in the class later on..."
Also, would you have a preference as to where to place the nested functions - before or after the code that calls them?
In general I don't see that much nesting of functions in real code. It runs against the ethos of keeping methods simple and concise. Such nesting is mainly useful for closures where you'll be using some of the parameters from the outer scope (e.g. the inner loop of a recursive function), so it's cleaner than declaring it outside and having to re-pass those arguments explicitly.
You have to place the nested functions before the code that calls them or it's a forward reference and won't compile. (In objects / classes you can place them after, but not in methods.)
There are a few patterns that take advantage of one layer of nesting.
Recursion, where it is used to hide implementation details (and is cleaner than separating into two separate top-level methods):
def callsRecursive(p: Param): Result = {
  def recursive(p: Param, o: OtherParam, pr: PartialResult = default): Result = {
    ...
  }
  ...
}
Scope-safe don't-repeat-yourself:
def doesGraphics(p: Point) {
  def up(p: Point): Point = // ...
  def dn(p: Point): Point = // ...
  def lf(p: Point): Point = // ...
  def rt(p: Point): Point = // ...
  /* Lots of turtle-style drawing */
}
And more esoteric tricks like shadowing implicit conversions for a local block.
If you need both of these, I could envision nesting twice. More than that is likely overkill, mostly because you are probably making one method do too much. You should think about how to subdivide the problem with clean interfaces that can then become their own methods, rather than having a messy hodge-podge of closures around all sorts of variables defined within the method. Big methods are like global variables: everything becomes too dependent on the details of implementation and too hard to keep track of. If you're ready to do the appropriate amount of thinking to make something have a decent interface, even if you only need it once, then consider taking it out to the top level. If you aren't going to think that hard, my inclination is to leave it inside to avoid polluting the interface.
In any case, don't be afraid to create a method anywhere you need it. For example, suppose you find yourself deep within some method with two collections each of which have to have the same operation performed on them at specific points in the logic. Don't worry if you're one or two or three methods deep! Just create the method where it's needed, and call it instead of repeating yourself. (Just keep in mind that creating a list and mapping is an alternative if you simply need to process several things at the same place.)
If you have a top-level function like the one you describe, it is probably doing too much.
TDD helps with this decision as well: is everything still easily testable?
If I come to the conclusion that this is actually the case, I refactor to get the inner functions out as dependencies, with their own tests.
In the end, I make very limited use of functions defined inside other functions. I also put a much stricter limit on method size: about 10-15 lines in Java, even less in Scala, since it is less verbose.
I put internal functions mostly at the top of the outer method, but it hardly matters, since the method is so short anyway.
I consider it a best practice to always use the lowest visibility. If a nested function is later needed by a different function, it can be moved out then.
That looks pretty scary indeed! If you want to fine-control the scope of private methods, you can declare them as
private[scope] def fn12 = { ... }
where scope is a package. You can read more in The busy Java developer's guide to Scala.
I personally avoid nesting named methods (def), whereas I don't mind nesting anonymous functions (e.g., closures in continuation-passing style programming).
Nested functions are useful (helpers in recursion for example). But if they get too numerous then there is nothing stopping you extracting them into a new type and delegating to that.

What language feature can allow transition from visitor to sequence?

I will use C# syntax since I am familiar with it, but it is not really language-specific.
Let's say we want to provide an API to go over a Tree and do something with each Node.
Solution 1: void Visit(Tree tree, Action<Node> action)
It takes a tree, and calls action on each node in the tree.
Solution 2: IEnumerable<Node> ToEnumerable(Tree tree)
It converts tree to a flat lazy sequence so we can go over and call action on each node.
Now, let's see how we can convert one API to another.
It is pretty trivial to provide Visit on top of ToEnumerable:
void Visit(Tree tree, Action<Node> action) {
    foreach (var node in ToEnumerable(tree))
        action(node);
}
However, is there a concept/feature in any language that allows providing ToEnumerable on top of Visit (as a lazy sequence, so the list is not created in advance)?
Not sure if I understand you correctly, but in Python you can make any object iterable.
You would just add the special method __iter__, which yields nodes while traversing the tree.
The visit procedure is then just a matter of iterating over the Tree object and calling action on each node.
If you are writing the code that will visit each node (as with a tree), it's possible to have an iterator call iterators for each branch, and perform a yield return on leaf nodes. This approach will work, and is very simple, but has the serious disadvantage that it's very easy to end up with code that will be very readable but execute very slowly. Some other questions and answers on this site will offer insight as to how to traverse trees efficiently within an iterator.
If the "tree" was just an example, and what you really have is a class which exposes a routine to call some delegate upon each node (similar to List.ForEach()), but does not expose an IEnumerable, you may be able to use the former to produce a List, which you could then iterate. Use something like var myList = new List<someThing>(); myCollection.ForEach( (x) => myList.Add(x) ); and then you may be able to enumerate myList.
If even that isn't sufficient, because the objects that were added to the list may not be valid by the time enumeration is complete, it may in rare cases be possible to use multiple threads to accomplish what's needed. For example, suppose you have two sorted collections whose ForEach method prepares each item for use, does the specified action, and then cleans up the item before proceeding to the next, and you need to interleave the actions on items from the two independent collections. One might iterate the collections on separate threads and use synchronization primitives so that each thread waits as necessary for the other.
Note that collections which only expose themselves via ForEach method are apt to restrict access during the execution of such a ForEach (if such restriction weren't necessary, they would probably implement IEnumerable). It may be possible for the "item action" called by one ForEach to perform another ForEach on the same collection on the same thread, since the latter ForEach would have to complete before the former one could resume. While one ForEach is running, however, an attempt to call a ForEach on a second thread would likely either malfunction or wait for the first operation to complete. If the first ForEach was waiting for some action by the second, deadlock would result. Because of this, scenarios where multi-threading will work better than simply building a List are rare. Nonetheless, there are a few cases where it may be helpful (e.g. the above-mentioned "zipper" operation on independent collections).
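For the curious, here is a hedged Java sketch of that thread-based idea (a C# analogue might use BlockingCollection): the visitor runs on a producer thread and hands items to the consuming Iterator through a bounded blocking queue, so no list is materialized up front. All type names here are hypothetical stand-ins for the APIs discussed above.

import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

class VisitorToIterator<T> implements Iterator<T> {
    private static final Object END = new Object();   // end-of-stream marker
    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);
    private Object next;

    // `visit` is the visitor-style API: it calls our callback once per item.
    // Usage (hypothetical): new VisitorToIterator<Node>(cb -> tree.visit(cb))
    VisitorToIterator(Consumer<Consumer<T>> visit) {
        Thread producer = new Thread(() -> {
            try {
                visit.accept(this::putUnchecked);
            } finally {
                putUnchecked(END);                    // always signal the end
            }
        });
        producer.setDaemon(true);                     // don't block JVM exit
        producer.start();
        advance();                                    // fetch the first item
    }

    private void putUnchecked(Object item) {
        try { queue.put(item); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    private void advance() {
        try { next = queue.take(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    @Override public boolean hasNext() { return next != END; }

    @SuppressWarnings("unchecked")
    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = (T) next;
        advance();
        return result;
    }
}

As the surrounding discussion notes, this only works if the visiting collection tolerates being paused mid-visit; building a List is usually simpler.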
I think I now understand the idea. The concept I need here is called first-class continuations or, specifically, call/cc. The confusing thing for me is that C# already provides a limited implementation of this concept in yield return, but it is not applicable to my scenario.
So if C# provided full implementation, the solution would look like:
IEnumerable<Node> ToEnumerable(Tree tree) {
    tree.Visit(node => magic yield return node);
}
where magic yield return, instead of returning from the node => ... lambda, would yield the next element of the sequence produced by ToEnumerable.
However, this answer is still not complete as I do not see the exact correlation between yield return and call/cc. I will update the answer when I understand this.

Relevance of recursion

So I'm currently trying to grasp the concept of recursion, and I understand most of the problems I've encountered, but I feel as though its use wouldn't be applicable to too many computing issues. This is just a novice's assumption though, so I'm asking: are there many practical uses for recursion as a programmer? And what typical problems can be solved with it? The only ones I've seen are heapsort and brain-teaser-type problems like the Towers of Hanoi, which seem very specific and lacking in broad use.
Thanks
There are a plethora of uses for recursion in programming - a classic example being navigating a tree structure, where you'd call the navigation function with each child element discovered, etc.
Here are some fields which would be almost impossible without recursion:
XML, HTML, or any other tree-like document structure
Compilation and parsing
Natural language processing
Divide-and-conquer algorithms (see the sketch at the end of this answer)
Many mathematical concepts, e.g. factorials
Recursion can lead to brilliantly elegant solutions to otherwise complex problems. If you're at all interested in programming as an art, you really should delve deeper.
Oh and if you're not sure, here's a solid definition of recursion:
Recursion (noun): See "Recursion"
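As a concrete instance of the divide-and-conquer item above, a small Java sketch of recursive binary search, which discards half of the remaining interval on every call:

public class BinarySearchDemo {
    // Returns the index of `key` in the sorted array `a`, or -1 if absent.
    static int search(int[] a, int key, int lo, int hi) {
        if (lo > hi) return -1;                   // empty interval: not found
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return mid;
        return key < a[mid]
                ? search(a, key, lo, mid - 1)     // recurse into the left half
                : search(a, key, mid + 1, hi);    // recurse into the right half
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 4, 7, 9};
        System.out.println(search(sorted, 7, 0, sorted.length - 1)); // 3
    }
}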
It depends on what you're going to be doing I suppose. I probably write less than one recursive function a year as a C#/ASP.NET developer doing corporate web work. When I'm screwing around with my hobby code (mostly stat research) I find a lot more opportunities to apply recursion. Part of this is subject matter, part of it is that I'm much more reliant on 3rd party libraries that the client has already decided on when doing corporate work (where the algorithms needing recursion are implemented).
It's not something you use every day. But many algorithms about searching and sorting data can make use of it. In general, most recursive algorithms can also be written using iteration; oftentimes the recursive version is simpler.
If you check the questions which are listed as "Related" to this question, you will find a "plethora" of stuff about recursion that will help you to understand it better.
Recursion isn't something new, and it is not just a toy concept. Recursive algorithms have been around since before there were computers.
The classic definition of "factorial" is a prime example:
fact(x) =
if x < 0 then fact(x) is undefined
if x = 0 then fact(0) = 1
if x > 0 then fact(x) = x * fact(x-1)
This isn't something that was created by computer geeks who thought that recursion was a cool toy. This is the standard mathematical definition.
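That definition transcribes almost literally into code; a minimal Java sketch, with the x < 0 "undefined" case mapped to an exception:

public class Factorial {
    static long fact(long x) {
        if (x < 0) throw new IllegalArgumentException("undefined for x < 0");
        if (x == 0) return 1;          // base case: fact(0) = 1
        return x * fact(x - 1);        // recursive case
    }

    public static void main(String[] args) {
        System.out.println(fact(5));   // 120
    }
}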
Call recursion, as a program construct, is something that should almost never be used except in extremely high-level languages where you expect the compiler to optimize it to a different construct. Use of call recursion, except when you can establish small bounds on the depth, leads to stack overflow, and not the good kind of Stack Overflow that answers your questions for you. :-)
Recursion as an algorithmic concept, on the other hand, is very useful. It's key to working with any recursively-defined data formats (like HTML or XML, or a hierarchical filesystem) as well as for implementing important algorithms in searching, sorting, and (everyone's favorite) graphics rendering, among countless other fields.
There are several languages that don't support loops (i.e., for and while), so when you need repeating behavior you must use recursion (I believe J does not have loops). In many cases, recursion also requires much less code. For example, I wrote an isPrime method in only two lines of real code.
public static boolean isPrime(int n)
{
    return n > 1 && isPrime(n, 2);   // reject n < 2, then test divisors
}

public static boolean isPrime(int n, int c)
{
    // Prime if we reach n without finding a divisor; otherwise try the next one.
    return c == n || (n % c != 0 && isPrime(n, c + 1));
}
The iterative solution would take much more code:
public static boolean isPrime(int n)
{
    if (n < 2) return false;
    int c = 2;
    while (c != n)
    {
        if (n % c == 0) return false;
        c++;                         // try the next candidate divisor
    }
    return true;
}
Another good example is when you are working with ListNodes, for example if you would like to check if all the elements in a ListNode are the same, a recursive solution would be much easier.
public static <E> boolean allSame(ListNode<E> list)
{
    // True once we reach the last node; otherwise the head must match its
    // successor and the rest of the list must be all-same as well.
    return list.getNext() == null
        || (list.getValue().equals(list.getNext().getValue()) && allSame(list.getNext()));
}
The iterative solution would look something like this:
public static <E> boolean allSame(ListNode<E> list)
{
    while (list.getNext() != null)
    {
        if (!list.getValue().equals(list.getNext().getValue())) return false;
        list = list.getNext();
    }
    return true;
}
As you can see, in cases like these the recursive solutions are shorter than their iterative counterparts.

Why do functions in object-oriented languages return only one value? (General)

I am curious to know about this.
Whenever I write a function that has to return multiple values, I either have to use pass-by-reference or create an array, store the values in it, and return that.
Why don't functions in object-oriented languages allow returning multiple values the way we pass multiple values in as input? Is there something built into the structure of these languages that prevents this?
Don't you think it would be fun and easy if we were allowed to do so?
It's not true that all Object-Oriented languages follow this paradigm.
e.g. in Python (from here):
def quadcube(x):
    return x**2, x**3

a, b = quadcube(3)
a will be 9 and b will be 27.
The difference between the traditional
OutTypeA SomeFunction(out OutTypeB, TypeC someOtherInputParam)
and your
{ OutTypeA, OutTypeB } SomeFunction(TypeC someOtherInputParam)
is just syntactic sugar. Also, the tradition of returning a single value allows the easily readable, natural style of result = SomeFunction(...). It's just convenience and ease of use.
And yes, as others said, you have tuples in some languages.
This is likely because of the way processors have been designed, which carried over to modern languages such as Java or C#. The processor can load multiple things (pointers) into parameter registers but typically has only one return-value register to hold a pointer.
I do agree that not all OOP languages support returning only one value, but for the ones that "apparently" do, I think this is the reason why.
Also, when returning a tuple, pair, or struct in C/C++, the compiler is essentially returning a pointer to that object.
First answer: they don't. Many OOP languages allow you to return a tuple. This is true, for instance, in Python; in C++ you have pair<>, and in C++0x a fully fledged tuple<> is in TR1.
Second answer: because that's the way it should be. A method should be short and do only one thing, and thus, it can be argued, only needs to return one thing.
In PHP, it is like that because the only way you can receive a value is by assigning the function's result to a variable (or using the call in place of a variable). Although I know array_map allows you to do return something & something;
To return multiple parameters, you return an single object that contains both of those parameters.
public MyResult GetResult(double x)
{
    return new MyResult { Squared = Math.Pow(x, 2), Cubed = Math.Pow(x, 3) };
}
For some languages you can create anonymous types on the fly. For others you have to specify a return object as a concrete class. One observation with OO is you do end up with a lot of little classes.
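For illustration, a hedged Java sketch (assuming Java 16+ record types; the names are hypothetical, not from any answer above) of how cheap such a little result class can be:

public class MultiReturnDemo {
    // A record is a one-line, immutable carrier for multiple return values.
    record QuadCube(double squared, double cubed) {}

    static QuadCube quadCube(double x) {
        return new QuadCube(x * x, x * x * x);
    }

    public static void main(String[] args) {
        QuadCube r = quadCube(3);
        System.out.println(r.squared() + " " + r.cubed()); // 9.0 27.0
    }
}

A record gives value semantics, a generated constructor, and accessors for free, which takes most of the sting out of "lots of little classes".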
The syntactic niceties of Python (see the Python answer above) are up to the language designer. The compiler/runtime could create an anonymous class to hold the result for you, even in a strongly typed environment like the .NET CLR.
Yes, it can be easier to read in some circumstances, and yes, it would be nice. However, if you read Eric Lippert's blog, you'll often see him explain that there are many nice features that could be implemented, but a lot of effort goes into every feature, and some things just don't make the cut because in the end they can't be justified.
It's not a restriction; it is just the architecture of the object-oriented and structured programming paradigms. I don't know if it would be more fun if functions returned more than one value, but it would surely be more messy and complicated. I think the designers of these paradigms thought about it and probably had good reasons not to implement that "feature": it is unnecessary, since you can already return multiple values by packing them into some kind of collection. Programming languages are designed to be compact, so unnecessary features are usually not implemented.