Scala style: how far to nest functions? - function

One of the advantages of Scala is that it gives you great control over scope. You can nest
functions like this:
def fn1 = {
def fn11 = {
...
}
def fn12 = {
...
def fn121 = {
...
}
}
...
def fn13 = {
...
}
}
The problem here is that fn1 may start to look a bit intimidating. Coming from a Java background, we are advised to keep functions small enough to be viewed on a single "page" in the IDE.
What would you think about taking fn12 out of fn1 based on the reasoning: "It's only used in fn1 right now, but it might come in useful somewhere else in the class later on..."
Also, would you have a preference as to where to place the nested functions - before or after the code that calls them?

In general I don't see that much nesting of functions in real code. It runs against the ethos of keeping methods simple and concise. Such nesting is mainly useful for closures where you'll be using some of the parameters from the outer scope (e.g. the inner loop of a recursive function), so it's cleaner than declaring it outside and having to re-pass those arguments explicitly.
You have to place the nested functions before the code that calls them or it's a forward reference and won't compile. (In objects / classes you can place them after, but not in methods.)

There are a few patterns that take advantage of one layer of nesting.
Recursion, where it is used to hide implementation details (and is cleaner than separating into two separate top-level methods):
def callsRecursive(p: Param): Result = {
def recursive(p: Param, o: OtherParam, pr: PartialResult = default): Result = {
...
}
}
Scope-safe don't-repeat-yourself:
def doesGraphics(p: Point) {
def up(p: Point): Point = // ...
def dn(p: Point): Point = // ...
def lf(p: Point): Point = // ...
def rt(p: Point): Point = // ...
/* Lots of turtle-style drawing */
}
And more esoteric tricks like shadowing implicit conversions for a local block.
If you need both of these, I could envision nesting twice. More than that is likely overkill, mostly because you are probably making one method do too much. You should think about how to subdivide the problem with clean interfaces that can then become their own methods, rather than having a messy hodge-podge of closures around all sorts of variables defined within the method. Big methods are like global variables: everything becomes too dependent on the details of implementation and too hard to keep track of. If you're ready to do the appropriate amount of thinking to make something have a decent interface, even if you only need it once, then consider taking it out to the top level. If you aren't going to think that hard, my inclination is to leave it inside to avoid polluting the interface.
In any case, don't be afraid to create a method anywhere you need it. For example, suppose you find yourself deep within some method with two collections each of which have to have the same operation performed on them at specific points in the logic. Don't worry if you're one or two or three methods deep! Just create the method where it's needed, and call it instead of repeating yourself. (Just keep in mind that creating a list and mapping is an alternative if you simply need to process several things at the same place.)

If you have a top level function like the one you describe it is probably doing to much.
TDD helps as well in the decision if this is the case: Is still everything easily testable.
If I come to the conclusion that this is actually the case I refactor to get the inner functions out as dependencies, with their own tests.
In the end result I make very limited use of functions defined in functions defined ... I also put a much stricter limit on method size: about 10-15 lines in java, even less in scala, since it less verbose.
I put internal functions mostly at the top of the outer method, but it hardly matters since its so short anyway.

I consider it as a best practice to always use the lowest visibility. If a nested function is needed for a different function, it could be moved anyway.

That looks pretty scary indeed! If you want to fine-control the scope of private methods, you can declare them as
private[scope] def fn12 = { ... }
where scope is a package. You can read more in The busy Java developer's guide to Scala.
I personally avoid nesting named methods (def), whereas I don't mind nesting anonymous functions (e.g., closures in continuation-passing style programming).

Nested functions are useful (helpers in recursion for example). But if they get too numerous then there is nothing stopping you extracting them into a new type and delegating to that.

Related

Programming style: should you return early if a guard condition is not satisfied?

One thing I've sometimes wondered is which is the better style out of the two shown below (if any)? Is it better to return immediately if a guard condition hasn't been satisfied, or should you only do the other stuff if the guard condition is satisfied?
For the sake of argument, please assume that the guard condition is a simple test that returns a boolean, such as checking to see if an element is in a collection, rather than something that might affect the control flow by throwing an exception. Also assume that methods/functions are short enough not to require editor scrolling.
// Style 1
public SomeType aMethod() {
SomeType result = null;
if (!guardCondition()) {
return result;
}
doStuffToResult(result);
doMoreStuffToResult(result);
return result;
}
// Style 2
public SomeType aMethod() {
SomeType result = null;
if (guardCondition()) {
doStuffToResult(result);
doMoreStuffToResult(result);
}
return result;
}
I prefer the first style, except that I wouldn't create a variable when there is no need for it. I'd do this:
// Style 3
public SomeType aMethod() {
if (!guardCondition()) {
return null;
}
SomeType result = new SomeType();
doStuffToResult(result);
doMoreStuffToResult(result);
return result;
}
Having been trained in Jackson Structured Programming in the late '80s, my ingrained philosophy was always "a function should have a single entry-point and a single exit-point"; this meant I wrote code according to Style 2.
In the last few years I have come to realise that code written in this style is often overcomplex and hard to read/maintain, and I have switched to Style 1.
Who says old dogs can't learn new tricks? ;)
Style 1 is what the Linux kernel indirectly recommends.
From https://www.kernel.org/doc/Documentation/process/coding-style.rst, chapter 1:
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
Style 2 adds levels of indentation, ergo, it is discouraged.
Personally, I like style 1 as well. Style 2 makes it harder to match up closing braces in functions that have several guard tests.
I don't know if guard is the right word here. Normally an unsatisfied guard results in an exception or assertion.
But beside this I'd go with style 1, because it keeps the code cleaner in my opinion. You have a simple example with only one condition. But what happens with many conditions and style 2? It leads to a lot of nested ifs or huge if-conditions (with || , &&). I think it is better to return from a method as soon as you know that you can.
But this is certainly very subjective ^^
Martin Fowler refers to this refactoring as :
"Replace Nested Conditional with Guard Clauses"
If/else statements also brings cyclomatic complexity. Hence harder to test cases. In order to test all the if/else blocks you might need to input lots of options.
Where as if there are any guard clauses, you can test them first, and deal with the real logic inside the if/else clauses in a clearer fashion.
If you dig through the .net-Framework using .net-Reflector you will see the .net programmers use style 1 (or maybe style 3 already mentioned by unbeli).
The reasons are already mentioned by the answers above. and maybe one other reason is to make the code better readable, concise and clear.
the most thing this style is used is when checking the input parameters, you always have to do this if you program a kind of frawework/library/dll.
first check all input parameters than work with them.
It sometimes depends on the language and what kinds of "resources" that you are using (e.g. open file handles).
In C, Style 2 is definitely safer and more convenient because a function has to close and/or release any resources that it obtained during execution. This includes allocated memory blocks, file handles, handles to operating system resources such as threads or drawing contexts, locks on mutexes, and any number of other things. Delaying the return until the very end or otherwise restricting the number of exits from a function allows the programmer to more easily ensure that s/he properly cleans up, helping to prevent memory leaks, handle leaks, deadlock, and other problems.
In C++ using RAII-style programming, both styles are equally safe, so you can pick one that is more convenient. Personally I use Style 1 with RAII-style C++. C++ without RAII is like C, so, again, Style 2 is probably better in that case.
In languages like Java with garbage collection, the runtime helps smooth over the differences between the two styles because it cleans up after itself. However, there can be subtle issues with these languages, too, if you don't explicitly "close" some types of objects. For example, if you construct a new java.io.FileOutputStream and do not close it before returning, then the associated operating system handle will remain open until the runtime garbage collects the FileOutputStream instance that has fallen out of scope. This could mean that another process or thread that needs to open the file for writing may be unable to until the FileOutputStream instance is collected.
Although it goes against best practices that I have been taught I find it much better to reduce the nesting of if statements when I have a condition such as this. I think it is much easier to read and although it exits in more than one place it is still very easy to debug.
I would say that Style1 became more used because is the best practice if you combine it with small methods.
Style2 look a better solution when you have big methods. When you have them ... you have some common code that you want to execute no matter how you exit. But the proper solution is not to force a single exit point but to make the methods smaller.
For example if you want to extract a sequence of code from a big method, and this method has two exit points you start to have problems, is hard to do it automatically. When i have a big method written in style1 i usually transform it in style2, then i extract methods then in each of them i should have Style1 code.
So Style1 is best but is compatible with small methods.
Style2 is not so good but is recommended if you have big methods that you don't want, have time to split.
I prefer to use method #1 myself, it is logically easier to read and also logically more similar to what we are trying to do. (if something bad happens, exit function NOW, do not pass go, do not collect $200)
Furthermore, most of the time you would want to return a value that is not a logically possible result (ie -1) to indicate to the user who called the function that the function failed to execute properly and to take appropriate action. This lends itself better to method #1 as well.
I would say "It depends on..."
In situations where I have to perform a cleanup sequence with more than 2 or 3 lines before leaving a function/method I would prefer style 2 because the cleanup sequence has to be written and modified only once. That means maintainability is easier.
In all other cases I would prefer style 1.
Number 1 is typically the easy, lazy and sloppy way. Number 2 expresses the logic cleanly. What others have pointed out is that yes it can become cumbersome. This tendency though has an important benefit. Style #1 can hide that your function is probably doing too much. It doesn't visually demonstrate the complexity of what's going on very well. I.e. it prevents the code from saying to you "hey this is getting a bit too complex for this one function". It also makes it a bit easier for other developers that don't know your code to miss those returns sprinkled here and there, at first glance anyway.
So let the code speak. When you see long conditions appearing or nested if statements it is saying that maybe it would be better to break this stuff up into multiple functions or that it needs to be rewritten more elegantly.

Successive success checks

Most of you have probably bumped into a situation, where multiple things must be in check and in certain order before the application can proceed, for example in a very simple case of creating a listening socket (socket, bind, listen, accept etc.). There are at least two obvious ways (don't take this 100% verbatim):
if (1st_ok)
{
if (2nd_ok)
{
...
or
if (!1st_ok)
{
return;
}
if (!2nd_ok)
{
return;
}
...
Have you ever though of anything smarter, do you prefer one over the other of the above, or do you (if the language provides for it) use exceptions?
I prefer the second technique. The main problem with the first one is that it increases the nesting depth of the code, which is a significant issue when you've got a substantial number of preconditions/resource-allocs to check since the business part of the function ends up deeply buried behind a wall of conditions (and frequently loops too). In the second case, you can simplify the conceptual logic to "we've got here and everything's OK", which is much easier to work with. Keeping the normal case as straight-line as possible is just easier to grok, especially when doing maintenance coding.
It depends on the language - e.g. in C++ you might well use exceptions, while in C you might use one of several strategies:
if/else blocks
goto (one of the few cases where a single goto label for "exception" handling might be justified
use break within a do { ... } while (0) loop
Personally I don't like multiple return statements in a function - I prefer to have a common clean up block at the end of the function followed by a single return statement.
This tends to be a matter of style. Some people only like returning at the end of a procedure, others prefer to do it wherever needed.
I'm a fan of the second method, as it allows for clean and concise code as well as ease of adding documentation on what it's doing.
// Checking for llama integration
if (!1st_ok)
{
return;
}
// Llama found, loading spitting capacity
if (!2nd_ok)
{
return;
}
// Etc.
I prefer the second version.
In the normal case, all code between the checks executes sequentially, so I like to see them at the same level. Normally none of the if branches are executed, so I want them to be as unobtrusive as possible.
I use 2nd because I think It reads better and easier to follow the logic. Also they say exceptions should not be used for flow control, but for the exceptional and unexpected cases. Id like to see what pros say about this.
What about
if (1st_ok && 2nd_ok) { }
or if some work must be done, like in your example with sockets
if (1st_ok() && 2nd_ok()) { }
I avoid the first solution because of nesting.
I avoid the second solution because of corporate coding rules which forbid multiple return in a function body.
Of course coding rules also forbid goto.
My workaround is to use a local variable:
bool isFailed = false; // or whatever is available for bool/true/false
if (!check1) {
log_error();
try_recovery_action();
isFailed = true;
}
if (!isfailed) {
if (!check2) {
log_error();
try_recovery_action();
isFailed = true;
}
}
...
This is not as beautiful as I would like but it is the best I've found to conform to my constraints and to write a readable code.
For what it is worth, here are some of my thoughts and experiences on this question.
Personally, I tend to prefer the second case you outlined. I find it easier to follow (and debug) the code. That is, as the code progresses, it becomes "more correct". In my own experience, this has seemed to be the preferred method.
I don't know how common it is in the field, but I've also seen condition testing written as ...
error = foo1 ();
if ((error == OK) && test1)) {
error = foo2 ();
}
if ((error == OK) && (test2)) {
error = foo3 ();
}
...
return (error);
Although readable (always a plus in my books) and avoiding deep nesting, it always struck me as using a lot of unnecessary testing to achieve those ends.
The first method, I see used less frequently than the second. Of those times, the vast majority of the time was because there was no nice way around it. For the remaining few instances, it was justified on the basis of extracting a little more performance on the success case. The argument was that the processor would predict a forward branch as not taken (corresponding to the else clause). This depended upon several factors including, the architecture, compiler, language, need, .... Obviously most projects (and most aspects of the project) did not meet those requirements.
Hope this helps.

Why all the functions from object oriented language allows to return only one value (General)

I am curious to know about this.
whenever I write a function which have to return multiple values, either I have to use pass by reference or create an array store values in it and pass them.
Why all the Object Orinented languages functions are not allowed to return multiple parameters as we pass them as input. Like is there anything inbuilt structure of the language which is restricting from doing this.
Dont you think it will be fun and easy if we are allowed to do so.
It's not true that all Object-Oriented languages follow this paradigm.
e.g. in Python (from here):
def quadcube (x):
return x**2, x**3
a, b = quadcube(3)
a will be 9 and b will be 27.
The difference between the traditional
OutTypeA SomeFunction(out OutTypeB, TypeC someOtherInputParam)
and your
{ OutTypeA, OutTypeB } SomeFunction(TypeC someOtherInputParam)
is just syntactic sugar. Also, the tradition of returning one single parameter type allows writing in the easy readable natural language of result = SomeFunction(...). It's just convenience and ease of use.
And yes, as others said, you have tuples in some languages.
This is likely because of the way processors have been designed and hence carried over to modern languages such as Java or C#. The processor can load multiple things (pointers) into parameter registers but only has one return value register that holds a pointer.
I do agree that not all OOP languages only support returning one value, but for the ones that "apparently" do, this I think is the reason why.
Also for returning a tuple, pair or struct for that matter in C/C++, essentially, the compiler is returning a pointer to that object.
First answer: They don't. many OOP languages allow you to return a tuple. This is true for instance in python, in C++ you have pair<> and in C++0x a fully fledged tuple<> is in TR1.
Second answer: Because that's the way it should be. A method should be short and do only one thing and thus can be argued, only need to return one thing.
In PHP, it is like that because the only way you can receive a value is by assigning the function to a variable (or putting it in place of a variable). Although I know array_map allows you to do return something & something;
To return multiple parameters, you return an single object that contains both of those parameters.
public MyResult GetResult(x)
{
return new MyResult { Squared = Math.Pow(x,2), Cubed = Math.Pow(x,3) };
}
For some languages you can create anonymous types on the fly. For others you have to specify a return object as a concrete class. One observation with OO is you do end up with a lot of little classes.
The syntactic niceties of python (see #Cowan's answer) are up to the language designer. The compiler / runtime could creating an anonymous class to hold the result for you, even in a strongly typed environment like the .net CLR.
Yes it can be easier to read in some circumstances, and yes it would be nice. However, if you read Eric Lippert's blog, you'll often read dialogue's and hear him go on about how there are many nice features that could be implemented, but there's a lot of effort that goes into every feature, and some things just don't make the cut because in the end they can't be justified.
It's not a restriction, it is just the architecture of the Object Oriented and Structured programming paradigms. I don't know if it would be more fun if functions returned more than one value, but it would be sure more messy and complicated. I think the designers of the above programming paradigms thought about it, and they probably had good reasons not to implement that "feature" -it is unnecessary, since you can already return multiple values by packing them in some kind of collection. Programming languages are designed to be compact, so usually unnecessary features are not implemented.

"Necessary" Uses of Recursion in Imperative Languages

I've recently seen in a couple of different places comments along the lines of, "I learned about recursion in school, but have never used it or felt the need for it since then." (Recursion seems to be a popular example of "book learning" amongst a certain group of programmers.)
Well, it's true that in imperative languages such as Java and Ruby[1], we generally use iteration and avoid recursion, in part because of the risk of stack overflows, and in part because it's the style most programmers in those languages are used to.
Now I know that, strictly speaking, there are no "necessary" uses of recursion in such languages: one can always somehow replace recursion with iteration, no matter how complex things get. By "necessary" here, I'm talking about the following:
Can you think of any particular examples of code in such languages where recursion was so much better than iteration (for reasons of clarity, efficiency, or otherwise) that you used recursion anyway, and converting to iteration would have been a big loss?
Recursively walking trees has been mentioned several times in the answers: what was it exactly about your particular use of it that made recursion better than using a library-defined iterator, had it been available?
[1]: Yes, I know that these are also object-oriented languages. That's not directly relevant to this question, however.
There are no "necessary" uses of recursion. All recursive algorithms can be converted to iterative ones. I seem to recall a stack being necessary, but I can't recall the exact construction off the top of my head.
Practically speaking, if you're not using recursion for the following (even in imperative languages) you're a little mad:
Tree traversal
Graphs
Lexing/Parsing
Sorting
When you are walking any kind of tree structure, for example
parsing a grammar using a recursive-descent parser
walking a DOM tree (e.g. parsed HTML or XML)
also, every toString() method that calls the toString() of the object members can be considered recursive, too. All object serializing algorithms are recursive.
In my work recursion is very rarely used for anything algorithmic. Things like factorials etc are solved much more readably (and efficiently) using simple loops. When it does show up it is usually because you are processing some data that is recursive in nature. For example, the nodes on a tree structure could be processed recursively.
If you were to write a program to walk the nodes of a binary tree for example, you could write a function that processed one node, and called itself to process each of it's children. This would be more effective than trying to maintain all the different states for each child node as you looped through them.
The most well-known example is probably the quicksort algorithm developed by by C.A.R. Hoare.
Another example is traversing a directory tree for finding a file.
In my opinion, recursive algorithms are a natural fit when the data structure is also recursive.
def traverse(node, function):
function(this)
for each childnode in children:
traverse(childnode, function)
I can't see why I'd want to write that iteratively.
It's all about the data you are processing.
I wrote a simple parser to convert a string into a data structure, it's probably the only example in 5 years' work in Java, but I think it was the right way to do it.
The string looked like this:
"{ index = 1, ID = ['A', 'B', 'C'], data = {" +
"count = 112, flags = FLAG_1 | FLAG_2 }}"
The best abstraction for this was a tree, where all leaf nodes are primitive data types, and branches could be arrays or objects. This is the typical recursive problem, a non-recursive solution is possible but much more complex.
Recursion can always be rewritten as iteration with an external stack. However if you're sure that you don't risk very deep recursion that would lead to stackoverflow, recursion is a very convenient thing.
One good example is traversing a directory structure on a known operating system. You usually know how deep it can be (maximum path length is limited) and therefore will not have a stackoverflow. Doing the same via iteration with an external stack is not so convenient.
It was said "anything tree". I may be too cautious, and I know that stacks are big nowadays, but I still won't use recursion on a typical tree. I would, however, do it on a balanced tree.
I have a List of reports. I am using indexers on my class that contains this list. The reports are retrieved by their screen names using the indexers. In the indexer, if the report for that screen name doesn't exist it loads the report and recursively calls itself.
public class ReportDictionary
{
private static List<Report> _reportList = null;
public ReportColumnList this[string screenName]
{
get
{
Report rc = _reportList.Find(delegate(Report obj) { return obj.ReportName == screenName; });
if (rc == null)
{
this.Load(screenName);
return this[screenName]; // Recursive call
}
else
return rc.ReportColumnList.Copy();
}
private set
{
this.Add(screenName, value);
}
}
}
This can be done without recursion using some additional lines of code.

What is the best practice/coding standard with regard to the "this" scope is AS3?

What is the best practice/coding standard with regard to the "this" scope is AS3? Is there one? I feel it really helps with standardization and my readability, but sometimes it seems like "too much".
For instance, is the use of "this" in the following really necessary (I know it works without "this")?:
private var _item:Object;
private var selectedItem:Object;
public function set item(value:Object):void
{
this._item = value;
if (this._item["label"] == "doodad")
this.selectedItem = value;
}
public function set item(value:Object):void
{
return this._item;
}
"this" is not required unless you want to prevent naming conflicts between locally scoped variables (method params for instance) and instance variables.
In your example you are already using an underscore to mark a private variable, so it's an extra reason not to use "this" since you are really saying twice the same thing.
It certainly isn't necessary, but I agree that it can help with readability. Since I work more in more dynamic languages (e.g. Perl and Python), such conventions can be vital for quickly determining where variables and functions are scoped/located. If this convention works for you, I don't think it's a bad thing, per se.
Thus said, I've spent hours reformatting code which contained awkward conventions which impeded readability.
For example: one person I worked with wrote all assignments like this:
var foo:String= "bar";
This was irritating (I prefer " = " so I can clearly see the operator), and I spent a lot of time cleaning up thousands of lines of code I had to maintain. His convention (which, though we argued about several times, he refused to compromise on) tended to impede my work.
Strive for unity w/others working with you. If they need to support your code and find this aggravating, it's likely not worth it to leave it in. If you don't expect anyone to work directly with the source, use conventions which help you understand your code and document (somewhere) what they mean.
If you're working in a team, stick with the coding conventions of the team.
But personally, I find explicit use of "this", when not required for disambiguation, overkill that negatively affects readability in a statically typed language like AS3 (dynamic languages are another story!).
A class should only really have one responsibility so generally there shouldn't be too many properties on it. Inside a method you generally deal with three types of variables: temporary local variables, method parameters, and properties. Methods shouldn't be too long, so it should be easy to spot the difference between the three types - if it's not defined locally and hasn't been passed as a parameter, then it's a property. If the whole method doesn't fit on your screen then it's probably too long!
I only use "this" when needed to disambiguate between a property and a parameter with the same name.
I prefer not to use "this" too much, but sometimes do in Eclipse, just to get autocompletion (probably the worst reason to do it!)
Would make more sense if your example was:
public function set item(_item:Object):void
{
this._item = _item;
if (this._item["label"] == "doodad")
this.selectedItem = this._item;
}