A tool to detect unnecessary recursive calls in a program? - language-agnostic

A very common beginner mistake when writing recursive functions is to accidentally fire off completely redundant recursive calls from the same function. For example, consider this recursive function that finds the maximum value in a binary tree (not a binary search tree):
int BinaryTreeMax(Tree* root) {
    if (root == nullptr) return INT_MIN; // INT_MIN is declared in <climits>
    int maxValue = root->value;
    if (maxValue < BinaryTreeMax(root->left))
        maxValue = BinaryTreeMax(root->left); // (1)
    if (maxValue < BinaryTreeMax(root->right))
        maxValue = BinaryTreeMax(root->right); // (2)
    return maxValue;
}
Notice that this program potentially makes two completely redundant recursive calls to BinaryTreeMax in lines (1) and (2). We could rewrite this code so that there's no need for these extra calls by simply caching the value from before:
int BinaryTreeMax(Tree* root) {
    if (root == nullptr) return INT_MIN; // INT_MIN is declared in <climits>
    int maxValue = root->value;
    // Evaluate each subtree once and reuse the results.
    int leftValue = BinaryTreeMax(root->left);
    int rightValue = BinaryTreeMax(root->right);
    if (maxValue < leftValue)
        maxValue = leftValue;
    if (maxValue < rightValue)
        maxValue = rightValue;
    return maxValue;
}
Now, we always make exactly two recursive calls.
My question is whether there is a tool that does either a static or dynamic analysis of a program (in whatever language you'd like; I'm not too picky!) that can detect whether a program is making completely unnecessary recursive calls. By "completely unnecessary" I mean that
The recursive call has been made before,
by the same invocation of the recursive function (or one of its descendants), and
the call itself has no observable side-effects.
This is something that can usually be determined by hand, but I think it would be great if there were some tool that could flag things like this automatically as a way of helping students gain feedback about how to avoid making simple but expensive mistakes in their programs that could contribute to huge inefficiencies.
Does anyone know of such a tool?

First, your definition of 'completely unnecessary' is insufficient. It is possible that some code between the two function calls affects the result of the second function call.
Second, this has nothing to do with recursion; the same question applies to any function call. A call is redundant if the function has been called before with the exact same parameters, has no side effects, and no code between the two calls changed any data the function accesses.
Now, I'm pretty sure a perfect solution is impossible, as it would solve The Halting Problem, but that doesn't mean there isn't a way to detect enough of these cases and optimize away some of them.
Some compilers know how to do that (GCC has a specific flag that warns you when it does so). Here's a 2003 article I found about the issue: http://www.cs.cmu.edu/~jsstylos/15745/final.pdf .
I couldn't find a tool for this, but it's probably something Eric Lippert knows about, if he happens to bump into your question.
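As a rough illustration of the dynamic side of such a tool, here is a minimal Python sketch. The decorator name flag_repeated_calls and its duplicates attribute are hypothetical, and the sketch assumes the function is pure and its arguments are hashable. It records, per active invocation, which argument tuples that invocation has already passed to a direct recursive call, and reports any repeat. It does not check for intervening state changes, so it only covers part of the criteria discussed above.

```python
import functools


def flag_repeated_calls(func):
    """Record recursive calls that one invocation repeats with identical arguments."""
    stack = []       # one (caller_args, seen_child_args) entry per active invocation
    duplicates = []  # (caller_args, repeated_child_args) pairs

    @functools.wraps(func)
    def wrapper(*args):
        if stack:
            caller_args, seen = stack[-1]
            if args in seen:
                duplicates.append((caller_args, args))
            seen.add(args)
        stack.append((args, set()))
        try:
            return func(*args)
        finally:
            stack.pop()

    wrapper.duplicates = duplicates
    return wrapper


class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


@flag_repeated_calls
def tree_max(node):
    # Mirrors the wasteful pattern from the question (non-empty trees only).
    best = node.value
    if node.left is not None and best < tree_max(node.left):
        best = tree_max(node.left)    # (1) repeats the call made in the test
    if node.right is not None and best < tree_max(node.right):
        best = tree_max(node.right)   # (2) repeats the call made in the test
    return best


root = Node(1, Node(5), Node(3))
result = tree_max(root)
```

After the run, tree_max.duplicates holds one entry: the invocation on root called tree_max on its left child twice. The right child is visited only once because its value never exceeds the running maximum.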

Some compilers (such as GCC) let you mark deterministic functions explicitly. To be more precise, __attribute__((const)) (see GCC function attributes) places restrictions on the function body so that its result depends only on its arguments, with no dependency on the shared state of the program or on other non-deterministic functions. The compiler can then eliminate duplicate calls to costly functions. Some other high-level language implementations (maybe Haskell) perform these checks automatically.
Really, I don't know of any tools for such analysis (but if I find one I will be happy). And if there is one that correctly detects unnecessary recursion or, more generally, unnecessary function evaluation (in a language-agnostic environment), it would be a kind of determinacy prover.
BTW, it's not so difficult to write such a program when you already have access to the semantic tree of the code :)
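To make the "access to the semantic tree" remark concrete, here is a small sketch using Python's standard ast module (ast.unparse needs Python 3.9 or later). The function name repeated_self_calls is hypothetical. The check is purely syntactic: it flags self-calls that are spelled identically more than once inside one function, and it does not prove that nothing changed between the calls, so a human still has to confirm each finding.

```python
import ast

SOURCE = '''
def tree_max(node):
    if node is None:
        return float("-inf")
    best = node.value
    if best < tree_max(node.left):
        best = tree_max(node.left)
    if best < tree_max(node.right):
        best = tree_max(node.right)
    return best
'''


def repeated_self_calls(source):
    """Map each function name to the self-calls that appear more than once in its body."""
    report = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        seen, repeated = set(), set()
        for node in ast.walk(func):
            is_self_call = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func.name
            )
            if not is_self_call:
                continue
            text = ast.unparse(node)  # normalized source text of the call
            if text in seen:
                repeated.add(text)
            seen.add(text)
        if repeated:
            report[func.name] = sorted(repeated)
    return report


findings = repeated_self_calls(SOURCE)
```

Here findings maps tree_max to the two call expressions tree_max(node.left) and tree_max(node.right), each of which occurs twice in the body.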

Related

Why does 'return' end a function

I'm just curious about why return ends the function.
Why do we not write
function Foo (){
BAR = calculate();
give back BAR;
//do sth later
log(BAR);
end;
}
Why do we need to do this?
function Foo (){
BAR = calculate();
log(BAR);
return BAR;
}
Is this to prevent multiple usage of a give back/return value in a function?
The idea of a function stems from mathematics, e.g. x = f(y). Once you have computed f(y) for a specific value of y, you can simply substitute that value in that equation for the same result, e.g. x = 42. So the notion of a function having one result or one return value is quite strong. Further, such mathematical functions are pure, meaning they have no side effects. In the above formula it doesn’t make a difference whether you write f(y) or its computed result 42, the function doesn’t do anything else and hence won’t change the result. Being able to make these assumptions makes it much easier to reason about formulas and programs.
return in programming also has practical implementation implications, as most languages typically pop the stack upon returning, based on the assumption/restriction that it’s not needed any further.
Many languages do allow a function to “spit out” a value yet continue, which is usually implemented as generators and the yield keyword. However, the generator won’t typically simply continue running in the background, it needs to be explicitly invoked again to yield its next value. A transfer of control is necessary; either the generator runs, or its caller does, they can’t both run simultaneously.
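A minimal Python sketch of that behaviour, using a hypothetical calculate_then_log generator: the value is handed back at the yield, but the code after it only runs once the caller explicitly resumes the generator.

```python
def calculate_then_log(log):
    bar = 6 * 7
    yield bar                      # hand the value to the caller; the frame stays alive
    log.append(f"logged {bar}")    # runs only when the caller resumes the generator


log = []
gen = calculate_then_log(log)
value = next(gen)                  # runs the body up to the yield
log_before_resume = list(log)      # still empty: nothing ran "in the background"
next(gen, None)                    # resume; the default avoids StopIteration at the end
```

Between the two next calls control is entirely with the caller, which is the transfer of control described above.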
If you did want to run two pieces of code simultaneously, be that a generator or a function’s “after return block”, you need to decide on a mode of multitasking like threading, or cooperative multitasking (async execution) or something else, which brings with it all the fun difficulties of managing shared resource access and the like. While it’s not unthinkable to write a language which would handle that implicitly and elegantly, elegant implicit multitasking which manages all these difficulties automagically simply does not fit into most C-like languages. Which is likely one of many reasons leading to a simple stack-popping, function-terminating return statement.
Using return gives you a lot of flexibility regarding where, when and how you return the value of a function as well as an easy to read statement of 'I am now returning this value'.
If we followed your idea, you could have a situation where the function has evaluated to some value and you have to figure out whether that assignment got changed somewhere later in the flow.

Nesting Asynchronous Promises in ActionScript

I have a situation where I need to perform dependent asynchronous operations. For example, check the database for data, if there is data, perform a database write (insert/update), if not continue without doing anything. I have written myself a promise based database API using promise-as3. Any database operation returns a promise that is resolved with the data of a read query, or with the Result object(s) of a write query. I do the following to nest promises and create one point of resolution or rejection for the entire 'initialize' operation.
public function initializeTable():Promise
{
    var dfd:Deferred = new Deferred();
    select("SELECT * FROM table").then(tableDefaults).then(resolveDeferred(dfd)).otherwise(errorHandler(dfd));
    return dfd.promise;
}
public function tableDefaults(data:Array):Promise
{
    if(!data || !data.length)
    {
        //defaultParams is an Object of table default fields/values.
        return insert("table", defaultParams);
    } else
    {
        var resolved:Deferred = new Deferred();
        resolved.resolve(null);
        return resolved.promise;
    }
}
public function resolveDeferred(deferred:Deferred):Function
{
    return function resolver(value:*=null):void
    {
        deferred.resolve(value);
    }
}
public function rejectDeferred(deferred:Deferred):Function
{
    return function rejector(reason:*=null):void
    {
        deferred.reject(reason);
    }
}
My main questions:
Are there any performance issues that will arise from this? Memory leaks etc.? I've read that function variables perform poorly, but I don't see another way to nest operations so logically.
Would it be better to have say a global resolved instance that is created and resolved only once, but returned whenever we need an 'empty' promise?
EDIT:
I'm removing question 3 (Is there a better way to do this??), as it seems to be leading to opinions on the nature of promises in asynchronous programming. I meant better in the scope of promises, not asynchronicity in general. Assume you have to use this promise based API for the sake of the question.
I usually don't write this kind of opinion-based answer, but here it's pretty important. Promises in AS3 = THE ROOT OF ALL EVIL :) And I'll explain why.
First, as BotMaster said, it's weakly typed. What this means is that you aren't using AS3 properly. And the only reason this is possible is backwards compatibility. The truth here is that Adobe has spent a huge amount of time turning AS3 into a strongly typed OOP language. Don't stray away from that.
The second point is that Promises were created, in the first place, so that poor developers could actually start getting some work done in JavaScript. This is not a new design pattern or anything. Actually, it has no real benefits if you know how to structure your code properly. What Promises help with the most is avoiding the so-called Wall of Hell. But there are other ways to fix this in a natural manner (the very basic thing is not to write functions within functions, but on the same level, and simply check the passed result).
The most important thing here is the nature of Promises. Very few people know what they actually do behind the scenes. Because of the nature of JavaScript (and ECMAScript in general), there is no real way to tell whether a function completed properly or not. If you return false / null / undefined, they are all regular return values. The only way a function can actually say "this operation failed" is by throwing an error. So every promisified method can potentially throw an error. And each error must be handled, or otherwise your code can stop working properly. What this means is that every single action inside a Promise is within a try-catch block! Every time you do absolutely basic stuff, you wrap it in try-catch. Even this block of yours:
else
{
    var resolved:Deferred = new Deferred();
    resolved.resolve(null);
    return resolved.promise;
}
In a "regular" way, you would simply use else { return null }. But now you create tons of objects, resolvers, rejectors, and finally you try-catch this block.
I cannot stress this enough, but I think you are getting the point. Try-catch is extremely slow! I understand that this is not a big problem in a simple case like the one I just mentioned, but imagine you are doing it more often and in heavier methods. You are just doing extremely slow operations, and for what? So that you can write lame code and just enjoy it.
The last thing to say: there are plenty of ways to run asynchronous operations and make them work one after another. Just by googling as3 function queue I found a few. Not to mention that the event-based system is very flexible, and there are even alternatives to it (using callbacks). You've got it all in your hands, and you turn to something that was created because of the lack of proper ways to do it otherwise.
So my sincere advice, as a person who has worked with Flash for a decade, doing casino games in big teams, would be: don't ever try using promises in AS3. Good luck!
var dfd:Deferred = new Deferred();
select("SELECT * FROM table").then(tableDefaults).then(resolveDeferred(dfd)).otherwise(errorHandler(dfd));
return dfd.promise;
This is The Forgotten Promise antipattern. It can instead be written as:
return select("SELECT * FROM table").then(tableDefaults);
This removes the need for the resolveDeferred and rejectDeferred functions.
var resolved:Deferred = new Deferred();
resolved.resolve(null);
return resolved.promise;
I would either extract this to another function, or use Promise.when(null). A global instance wouldn't work because it would mean that the result handlers from one call could be called for a different one.
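The same idea can be sketched outside ActionScript. The following Python asyncio example is only an analogy, not promise-as3 code: select and insert are hypothetical stand-ins for the database API, and the default row is invented for illustration. It shows the chain being returned directly, with the "nothing to do" branch returning a plain value instead of a manually resolved deferred.

```python
import asyncio


async def select(query):
    # Hypothetical stand-in for the asynchronous read; pretends the table is empty.
    await asyncio.sleep(0)
    return []


async def insert(table, params):
    # Hypothetical stand-in for the asynchronous write.
    await asyncio.sleep(0)
    return {"table": table, "inserted": params}


async def table_defaults(rows):
    if not rows:
        return await insert("table", {"name": "default"})
    return None  # the empty result needs no hand-made deferred


async def initialize_table():
    # Return the result of the chain itself; failures propagate to the caller.
    return await table_defaults(await select("SELECT * FROM table"))


result = asyncio.run(initialize_table())
```

Any exception raised by select or insert surfaces from initialize_table without a separate rejection helper, which corresponds to dropping resolveDeferred and rejectDeferred.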

Are there disadvantages to return type inference? If yes, what are they?

A lot of statically typed languages, like C++ and C#, have local variable type inference (with the keywords auto and var respectively, I think).
However, I haven't seen many C-derived languages (apart from those mentioned in the comments) implementing compile-time return type inference. I'll describe what I mean by "return type inference" before I ask the question. (I definitely don't mean overloading by return type.)
Consider this code in a hypothetical C#-like language:
private auto SomeMethod(int x)
{
return 3 * x;
}
It's more than obvious (to humans and to the compiler) that the return type is int (and the compilers can verify it).
The same goes for multiple paths:
private auto SomeOtherMethod(int x)
{
if(x == 0) return 1;
else return 3 * x;
}
It's still not ambiguous at all, because there is already an algorithm in said languages to resolve whether two expressions have compatible types:
private auto YetAnotherMethod(int x)
{
var r = (x == 0) ? 1 : 3 * x;
return r;
}
Since the algorithm exists and it is already implemented in some form, it's probably not a technical problem in this regard. But still, I haven't seen it anywhere in statically typed languages, which got me thinking about whether there's something bad about it.
My question:
Does return type inference, as a concept, have any disadvantage or subtle pitfall that I'm not seeing? (Apart from readability - I already understand that.)
Is there some corner case where it would introduce problems or ambiguity to a statically typed language? (By "introduce", I'm referring to issues that local variable type inference doesn't already have.)
Yes, there are disadvantages. One you already mentioned: readability. Second, the type has to be calculated, so it takes time (in Turing-complete type systems it may never terminate). But there is also something else: the theory of type systems becomes much more complicated.
Let's write a function that takes a list and returns its head. What's its type? Or a function that takes a function and a parameter, applies the function to the parameter, and returns the result. In many languages you can't declare it. To support this kind of stuff, Java introduced generics and it failed miserably. Currently it's one of the most hated features of the language because of consistency problems.
Another thing: the returned type may depend not only on the body of the function but also on the context of the invocation. Let's look at Haskell (which has the best type system I've ever seen): http://learnyouahaskell.com/types-and-typeclasses
There is a function called read that takes a string, parses it and returns... whatever you need: an int, an array.
So each time a type system is designed, the designer has to choose at which level she wants to stop. Dynamic languages decided not to infer types at all, Scala decided to do some local inference but not, for example, for overloaded or recursive functions, and C++ decided not to infer the result.

What is a simple do-once technique?

What is a simple technique to perform some action just once, no matter how many times the function is executed? Do any programming languages have specific ways built-in to handle this somewhat common problem?
Example: initialize() shall set global_variable to true on ONLY its first execution.
A c++ example (looking for alternatives to this out of curiosity - not necessity):
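init.h:
extern int global_variable; /* declared here, defined in init.c */
void initialize(void);
init.c:
#include <stdbool.h>
#include "init.h"

int global_variable;
static bool already_initialized = false;

void initialize(void)
{
    if (!already_initialized)
    {
        already_initialized = true;
        global_variable = true;
    }
}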
Apart from the global variable technique that's available in any language there are several other ways to do this.
In languages that have static variables using a static variable instead of global is preferable in order to prevent variable name collisions in the global scope.
In some languages you can redefine/redeclare functions at runtime so you can do something like this:
function initialize (void) {
// do some stuff...
function initialize(void) {}; // redefine function here to do nothing
}
In some languages you can't quite redeclare functions within functions due to scope issues (inner functions) but you can still reassign other functions to an existing function. So you can do something like this:
function initialize (void) {
// do some stuff ...
initialize = function (void) {}; // assign no-op anonymous function
// to this function
}
Some languages (especially declarative languages) actually have a "latch" functionality built in that executes just once. Sometimes there is even a reset functionality. So you can actually do something like this:
function do_once initialize (void) {
// do some stuff
}
If the language allows it you can reset the do_once directive if you really want to re-execute the function:
reset initialize;
initialize();
Note: The C-like syntax above is obviously pseudocode and doesn't represent any real language, but the features described do exist in real languages. Also, programmers rarely encounter declarative languages apart from HTML, XML and CSS, but Turing-complete declarative languages do exist and are typically used for hardware design, where the "do_once" feature typically compiles down to a D flip-flop or latch.
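For a language without a built-in latch, the do_once and reset behaviour can be approximated in a few lines. This is a minimal single-threaded Python sketch; the once decorator, its reset helper, and the state dictionary are all hypothetical names chosen for illustration.

```python
import functools


def once(func):
    """Run func on the first call only; later calls return the stored result."""
    done = False
    result = None

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal done, result
        if not done:
            result = func(*args, **kwargs)
            done = True
        return result

    def reset():
        # Re-arm the latch so the next call runs the body again.
        nonlocal done, result
        done, result = False, None

    wrapper.reset = reset
    return wrapper


state = {"initialized": False, "runs": 0}


@once
def initialize():
    state["initialized"] = True
    state["runs"] += 1


initialize()
initialize()
runs_after_two_calls = state["runs"]  # the body ran once

initialize.reset()
initialize()                          # runs the body a second time
```

If the wrapped function raises, the latch stays open and the next call tries again; a stricter variant could remember the exception instead.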
Eiffel has a built-in notion of once routines. A once routine is executed only the first time it is called, on the next call it is not executed. If the routine is a function, i.e. returns a result, the result of the first execution is returned for all subsequent calls. If the first call terminates with an exception, the same exception is raised for all subsequent calls.
The declaration of the once function foo looks like
foo: RETURN_TYPE
once
... -- Some code to initialize Result.
end
In a multithreaded environment it might be desirable to distinguish objects used by different threads. This is accomplished by adding the key "THREAD" to the declaration (it is actually the default):
foo: RETURN_TYPE
once ("THREAD")
...
end
If the same object has to be shared by all the threads, the key "PROCESS" is used instead:
foo: RETURN_TYPE
once ("PROCESS")
...
end
The same syntax though without return type is used for procedures.
Process-wide once routines are guaranteed to be executed just once for the whole process. Because race conditions are possible, Eiffel run-time makes sure at most one thread may trigger evaluation of the given once routine at a time. Other threads become suspended until the primary execution completes so that they can use the single result or be sure the action is performed only once.
In other respects once routines are no different from the regular routines in a sense that they follow the same rules of object-oriented programming like inheritance and redeclaration (overriding). Because this is a normal routine it can call other routines, directly or indirectly involving itself. When such a recursive call occurs, the once routine is not executed again, but the last known value of the result is returned instead.
Yes, some languages (Scala) do support it (using lazy), but usually this functionality is provided by frameworks because there are some trade-offs. Sometimes you need thread-level, blocking synchronization. Sometimes spin-offs are enough. Sometimes you don't need synchronization because a simple single-threaded cache is enough. Sometimes you need to remember many calculated values and you are willing to forget the least recently used ones. And so on. That's probably why languages generally don't support this pattern: it's the frameworks' job.
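One point on that trade-off spectrum, blocking synchronization across threads, might look like the following Python sketch. The names once_per_process, load_settings and counter are hypothetical. A plain threading.Lock is used, so a once function that calls itself recursively would deadlock here; that is a limitation of this sketch.

```python
import threading


def once_per_process(func):
    """Run func at most once, even when several threads call it at the same time."""
    lock = threading.Lock()
    done = False
    result = None

    def wrapper():
        nonlocal done, result
        with lock:                # other callers block until the first run finishes
            if not done:
                result = func()
                done = True
        return result

    return wrapper


counter = {"runs": 0}


@once_per_process
def load_settings():
    counter["runs"] += 1
    return {"mode": "default"}


threads = [threading.Thread(target=load_settings) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread pays for the lock on every call, which is exactly the kind of cost a single-threaded cache avoids.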

Are hard-coded STRINGS ever acceptable?

Similar to Is hard-coding literals ever acceptable?, but I'm specifically thinking of "magic strings" here.
On a large project, we have a table of configuration options like these:
Name Value
---- -----
FOO_ENABLED Y
BAR_ENABLED N
...
(Hundreds of them).
The common practice is to call a generic function to test an option like this:
if (config_options.value('FOO_ENABLED') == 'Y') ...
(Of course, this same option may need to be checked in many places in the system code.)
When adding a new option, I was considering adding a function to hide the "magic string" like this:
if (config_options.foo_enabled()) ...
However, colleagues thought I'd gone overboard and objected to doing this, preferring the hard-coding because:
That's what we normally do
It makes it easier to see what's going on when debugging the code
The trouble is, I can see their point! Realistically, we are never going to rename the options for any reason, so about the only advantage I can think of for my function is that the compiler would catch any typo like fo_enabled(), but not 'FO_ENABLED'.
What do you think? Have I missed any other advantages/disadvantages?
If I use a string once in the code, I don't generally worry about making it a constant somewhere.
If I use a string twice in the code, I'll consider making it a constant.
If I use a string three times in the code, I'll almost certainly make it a constant.
if (config_options.isTrue('FOO_ENABLED')) {...
}
Restrict your hard coded Y check to one place, even if it means writing a wrapper class for your Map.
if (config_options.isFooEnabled()) {...
}
Might seem okay until you have 100 configuration options and 100 methods (so here you can make a judgement about future application growth and needs before deciding on your implementation). Otherwise it is better to have a class of static strings for parameter names.
if (config_options.isTrue(ConfigKeys.FOO_ENABLED)) {...
}
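Put together, the wrapper plus key constants could look like this minimal Python sketch. ConfigKeys, ConfigOptions and is_true are illustrative names, and the option values are made up. The 'Y' comparison lives in one method, and a mistyped ConfigKeys attribute fails loudly (in Python at run time, in a compiled language at compile time) instead of silently reading a missing option.

```python
class ConfigKeys:
    """Single home for option names; a misspelled attribute raises AttributeError."""
    FOO_ENABLED = "FOO_ENABLED"
    BAR_ENABLED = "BAR_ENABLED"


class ConfigOptions:
    def __init__(self, values):
        self._values = dict(values)

    def is_true(self, key):
        # The only place that knows 'Y' means enabled.
        return self._values.get(key) == "Y"


config_options = ConfigOptions({"FOO_ENABLED": "Y", "BAR_ENABLED": "N"})

foo_on = config_options.is_true(ConfigKeys.FOO_ENABLED)
bar_on = config_options.is_true(ConfigKeys.BAR_ENABLED)
```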
I realise the question is old, but it came up on my margin.
AFAIC, the issue here has not been identified accurately, either in the question or in the answers. Forget about "hardcoding strings" or not, for a moment.
The database has a Reference table, containing config_options. The PK is a string.
There are two types of PKs:
Meaningful Identifiers, that the users (and developers) see and use. These PKs are supposed to be stable, they can be relied upon.
Meaningless Id columns which the users should never see, that the developers have to be aware of, and code around. These cannot be relied upon.
It is ordinary, normal, to write code using the absolute value of a meaningful PK: IF CustomerCode = "IBM" ... or IF CountryCode = "AUS" etc.
Referencing the absolute value of a meaningless PK is not acceptable (due to auto-increment; gaps being changed; values being replaced wholesale).
Your reference table uses meaningful PKs. Referencing those literal strings in code is unavoidable. Hiding the value will make maintenance more difficult; the code is no longer literal; your colleagues are right. Plus there is the additional redundant function that chews cycles. If there is a typo in the literal, you will soon find that out during Dev testing, long before UAT.
Hundreds of functions for hundreds of literals is absurd. If you do implement a function, then Normalise your code, and provide a single function that can be used for any of the hundreds of literals. In which case, we are back to a naked literal, and the function can be dispensed with.
The point is, the attempt to hide the literal has no value.
It cannot be construed as "hardcoding"; that is something quite different. I think that is where your issue is: identifying these constructs as "hardcoded". It is just referencing a meaningful PK literally.
Now from the perspective of any code segment only, if you use the same value a few times, you can improve the code by capturing the literal string in a variable, and then using the variable in the rest of the code block. Certainly not a function. But that is an efficiency and good practice issue. Even that does not change the effect IF CountryCode = #cc_aus
You really should use constants and not hard-coded literals.
You can say they won't be changed, but you can never know. And it is best to make it a habit to use symbolic constants.
In my experience, this kind of issue is masking a deeper problem: failure to do actual OOP and to follow the DRY principle.
In a nutshell, capture the decision at startup time by an appropriate definition for each action inside the if statements, and then throw away both the config_options and the run-time tests.
Details below.
The sample usage was:
if (config_options.value('FOO_ENABLED') == 'Y') ...
which raises the obvious question, "What's going on in the ellipsis?", especially given the following statement:
(Of course, this same option may need to be checked in many places in the system code.)
Let's assume that each of these config_option values really does correspond to a single problem domain (or implementation strategy) concept.
Instead of doing this (repeatedly, in various places throughout the code):
Take a string (tag),
Find its corresponding other string (value),
Test that value as a boolean-equivalent,
Based on that test, decide whether to perform some action.
I suggest encapsulating the concept of a "configurable action".
Let's take as an example (obviously just as hypothetical as FOO_ENABLED ... ;-) that your code has to work in either English units or metric units. If METRIC_ENABLED is "true", convert user-entered data from metric to English for internal computation, and convert back prior to displaying results.
Define an interface:
public interface MetricConverter {
double toInches(double length);
double toCentimeters(double length);
double toPounds(double weight);
double toKilograms(double weight);
}
which identifies in one place all the behavior associated with the concept of METRIC_ENABLED.
Then write concrete implementations of all the ways those behaviors are to be carried out:
public class NullConv implements MetricConverter {
    // Interface methods are implicitly public, so the implementations must be too.
    public double toInches(double length) {return length;}
    public double toCentimeters(double length) {return length;}
    public double toPounds(double weight) {return weight;}
    public double toKilograms(double weight) {return weight;}
}
and
// lame implementation, just for illustration!!!!
public class MetricConv implements MetricConverter {
    public static final double LBS_PER_KG = 2.2D;
    public static final double CM_PER_IN = 2.54D;
    // Centimeters in, inches out, and back again.
    public double toInches(double length) {return length / CM_PER_IN;}
    public double toCentimeters(double length) {return length * CM_PER_IN;}
    public double toPounds(double weight) {return weight * LBS_PER_KG;}
    public double toKilograms(double weight) {return weight / LBS_PER_KG;}
}
At startup time, instead of loading a bunch of config_options values, initialize a set of configurable actions, as in:
MetricConverter converter = (metricOption()) ? new MetricConv() : new NullConv();
(where the expression metricOption() above is a stand-in for whatever one-time-only check you need to make, including looking at the value of METRIC_ENABLED ;-)
Then, wherever the code would have said:
double length = getLengthFromGui();
if (config_options.value('METRIC_ENABLED') == 'Y') {
length = length / 2.54D;
}
// do some computation to produce result
// ...
if (config_options.value('METRIC_ENABLED') == 'Y') {
result = result * 2.54D;
}
displayResultingLengthOnGui(result);
rewrite it as:
double length = converter.toInches(getLengthFromGui());
// do some computation to produce result
// ...
displayResultingLengthOnGui(converter.toCentimeters(result));
Because all of the implementation details related to that one concept are now packaged cleanly, all future maintenance related to METRIC_ENABLED can be done in one place. In addition, the run-time trade-off is a win; the "overhead" of invoking a method is trivial compared with the overhead of fetching a String value from a Map and performing String#equals.
I believe that the two reasons you have mentioned, a possible misspelling in the string that cannot be detected until run time and the possibility (although slim) of a name change, would justify your idea.
On top of that, you can get typed functions. Right now it seems you only store booleans, but what if you need to store an int, a string, etc.? I would rather use get_foo() with a type than get_string("FOO") or get_int("FOO").
I think there are two different issues here:
In the current project, the convention of using hard-coded strings is already well established, so all the developers working on the project are familiar with it. It might be a sub-optimal convention for all the reasons that have been listed, but everybody familiar with the code can look at it and instinctively knows what the code is supposed to do. Changing the code so that in certain parts, it uses the "new" functionality will make the code slightly harder to read (because people will have to think and remember what the new convention does) and thus a little harder to maintain. But I would guess that changing over the whole project to the new convention would potentially be prohibitively expensive unless you can quickly script the conversion.
On a new project, symbolic constants are the way to go IMO, for all the reasons listed, especially because anything that makes the compiler catch errors at compile time that would otherwise be caught by a human at run time is a very useful convention to establish.
Another thing to consider is intent. If you are on a project that requires localization hard coded strings can be ambiguous. Consider the following:
const string HELLO_WORLD = "Hello world!";
print(HELLO_WORLD);
The programmer's intent is clear. Using a constant implies that this string does not need to be localized. Now look at this example:
print("Hello world!");
Here we aren't so sure. Did the programmer really not want this string to be localized or did the programmer forget about localization while he was writing this code?
I too prefer a strongly-typed configuration class if it is used throughout the code. With properly named methods you don't lose any readability. If you need to do conversions from strings to another data type (decimal/float/int), you don't need to repeat the code that does the conversion in multiple places, and you can cache the result so the conversion only takes place once. You've already got the basis of this in place, so I don't think it would take much to get used to the new way of doing things.
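A minimal Python sketch of such a class, assuming Python 3.8 or later for functools.cached_property. TypedConfig, the MAX_RETRIES option, and the conversions counter are all hypothetical and exist only to show that each string-to-type conversion happens once.

```python
from functools import cached_property


class TypedConfig:
    """Typed accessors over raw string options; each conversion is cached."""

    def __init__(self, raw):
        self._raw = dict(raw)
        self.conversions = 0  # illustration only: counts actual conversions

    @cached_property
    def foo_enabled(self) -> bool:
        self.conversions += 1
        return self._raw["FOO_ENABLED"] == "Y"

    @cached_property
    def max_retries(self) -> int:
        self.conversions += 1
        return int(self._raw["MAX_RETRIES"])


config = TypedConfig({"FOO_ENABLED": "Y", "MAX_RETRIES": "3"})
total = config.max_retries + config.max_retries  # converted once, read twice
```

Callers read config.max_retries as an int everywhere, and the parsing code exists in exactly one place.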