I'm just curious about why return ends the function.
Why do we not write
function Foo() {
    BAR = calculate();
    give back BAR;   // hand the value back to the caller, but keep running
    // do something later
    log(BAR);
    end;
}
Why do we need to do this?
function Foo() {
    BAR = calculate();
    log(BAR);
    return BAR;
}
Is this to prevent a function from giving back/returning a value multiple times?
The idea of a function stems from mathematics, e.g. x = f(y). Once you have computed f(y) for a specific value of y, you can simply substitute that value in that equation for the same result, e.g. x = 42. So the notion of a function having one result or one return value is quite strong. Further, such mathematical functions are pure, meaning they have no side effects. In the above formula it doesn’t make a difference whether you write f(y) or its computed result 42, the function doesn’t do anything else and hence won’t change the result. Being able to make these assumptions makes it much easier to reason about formulas and programs.
return in programming also has practical implementation implications, as most languages typically pop the stack upon returning, based on the assumption/restriction that it’s not needed any further.
Many languages do allow a function to “spit out” a value yet continue, which is usually implemented as generators and the yield keyword. However, the generator won’t typically simply continue running in the background, it needs to be explicitly invoked again to yield its next value. A transfer of control is necessary; either the generator runs, or its caller does, they can’t both run simultaneously.
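For instance, Python's generators make this hand-off explicit; a minimal sketch:

import math  # not needed here, shown only if you extend the example

def first_n_squares(n):
    for i in range(n):
        yield i * i  # hand a value to the caller, then pause right here

gen = first_n_squares(3)
print(next(gen))  # 0 -- the generator runs until the first yield
print(next(gen))  # 1 -- it must be explicitly invoked again for the next value
print(next(gen))  # 4 -- either the generator runs or its caller does, never both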
If you did want to run two pieces of code simultaneously, be that a generator or a function’s “after return block”, you need to decide on a mode of multitasking like threading, or cooperative multitasking (async execution) or something else, which brings with it all the fun difficulties of managing shared resource access and the like. While it’s not unthinkable to write a language which would handle that implicitly and elegantly, elegant implicit multitasking which manages all these difficulties automagically simply does not fit into most C-like languages. Which is likely one of many reasons leading to a simple stack-popping, function-terminating return statement.
Using return gives you a lot of flexibility regarding where, when and how you return the value of a function, as well as an easy-to-read statement of "I am now returning this value".
Following your idea, you could have a situation where the function evaluated to some value, and you would then have to figure out whether that value got changed somewhere later in the flow.
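A minimal Python sketch of that flexibility (the function and its thresholds are invented for illustration): each return is a final, unambiguous exit, so no result variable has to survive the rest of the function.

def classify(temperature):
    # every return below reads as "I am now returning this value"
    if temperature < 0:
        return "freezing"
    if temperature < 20:
        return "mild"
    return "warm"

print(classify(-5))  # freezing
print(classify(25))  # warm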
Let X be a square matrix. We want to force it to be Hermitian, that is: self-conjugate-transpose. X = X^H = conj(X^T). To do this in Python with numpy is easy:
X = 0.5*(X + np.conj(X.T))
I haven't found a single NumPy function that does it in a single expression f(X).
The question is should I define a new function to do it? E.g.
import numpy as np

def make_hermitian(X):
    return 0.5 * (X + np.conj(X.T))
(One can come up with a shorter name, e.g. "make_h" or "herm" or "selfconj".)
Pros: more readable code, one operation in a shorter form. A shorter name saves typing when the operation is repeated many times, and it makes changing the operation far easier (you only need to change it in one place).
Cons: it replaces a very short, straightforward expression which is self-evident.
What is the more appropriate way of programming: defining a new function, or just writing the explicit expression repeatedly?
I would say it depends on how many times you need to reuse that function.
If it's more than twice, then definitely make a function. If it's only once or twice, I would say it's up to you. If you choose not to make a function, add a short comment specifying what that piece of code is supposed to do.
My preference in any case would be defining a function with a meaningful name, because if anyone else is going to (or supposed to) read the code, they may not know or remember how to make a matrix Hermitian, and the math alone isn't going to be sufficient.
A meaningful function name, on the other hand, will tell them clearly what is going on, and they can then google what a Hermitian matrix is.
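A small sketch of that version (the docstring and the sanity check are my additions):

import numpy as np

def make_hermitian(X):
    """Force the square matrix X to be Hermitian (self-conjugate-transpose)."""
    return 0.5 * (X + np.conj(X.T))

X = np.random.rand(3, 3) + 1j * np.random.rand(3, 3)
H = make_hermitian(X)
assert np.allclose(H, np.conj(H.T))  # H now equals its conjugate transpose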
I've been reading Concepts of Programming Languages by Robert W. Sebesta, and in chapter 9 there is a brief section on passing a subprogram to a function as a parameter. The section on this is extremely brief, about 1.5 pages, and the only explanation of its application is:
When a subprogram must sample some mathematical function, such as a subprogram that does numerical integration by estimating the area under the graph of a function by sampling the function at a number of different points. Such a subprogram should be usable everywhere.
This is completely off from anything I have ever learned. If I were to approach this problem in my own way I would create a function object and create a function that accomplishes the above and accepts function objects.
I have no clue why this is a design issue for languages because I have no idea where I would ever use this. A quick search hasn't made this any clearer for me.
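To make the book's example concrete, here is a minimal Python sketch (the names are invented for illustration) of an integrator that samples whatever function it is handed, which is exactly why it is "usable everywhere":

import math

def integrate(f, a, b, samples=1000):
    # estimate the area under f between a and b by sampling f
    # at the midpoint of each of `samples` equal-width strips
    width = (b - a) / samples
    return sum(f(a + (i + 0.5) * width) for i in range(samples)) * width

print(integrate(lambda x: x * x, 0.0, 1.0))  # ~1/3
print(integrate(math.sin, 0.0, math.pi))     # ~2.0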
Apparently you can accomplish this in C and C++ by using function pointers. Languages that allow nested subprograms, such as JavaScript, allow you to do this in three separate ways:
function sub1() {
    var x;

    function sub2() {
        alert(x); // creates a dialog box with the value of x
    }

    function sub3() {
        var x;
        x = 3;
        sub4(sub2); // *shallow binding*: the environment of the call
                    // statement that enacts the passed subprogram
    }

    function sub4(subx) {
        var x;
        x = 4;
        subx();
    }

    x = 1;
    sub3();
}
I'd appreciate any insight offered.
Being able to pass "methods" is very useful for a variety of reasons. Among them:
Code which is performing a complicated operation might wish to provide a means of notifying a user of its progress or allowing the user to cancel it. Having the code for the complicated operation do those actions itself will both add complexity to it and cause ugliness when it's invoked from code which uses a different style of progress bar or "Cancel" button. By contrast, having the caller supply an UpdateStatusAndCheckCancel() method means that the caller can supply a method which will update whatever style of progress bar and cancellation mechanism the caller wants to use.
Being able to store methods within a table can greatly simplify code that needs to export objects to a file and later import them again. Rather than needing to have code say
if (ObjectType == "Square")
    AddObject(new Square(ObjectParams));
else if (ObjectType == "Circle")
    AddObject(new Circle(ObjectParams));

etc. for every kind of object,
code can say something like
if (ObjectCreators.TryGetValue(ObjectType, out factory))
    AddObject(factory(ObjectParams));
to handle all kinds of object whose creation methods have been added to ObjectCreators.
Sometimes it's desirable to be able to handle events that may occur at some unknown time in the future; the author of code which knows when those events occur might have no clue about what things are supposed to happen then. Allowing the person who wants the action to happen to give a method to the code which will know when it happens allows for that code to perform the action at the right time without having to know what it should do.
The first situation represents a special case of callback where the function which is given the method is expected to only use it before it returns. The second situation is an example of what's sometimes referred to as a "factory pattern" or "dependency injection" [though those terms are useful in some broader contexts as well]. The third case is commonly handled using constructs which frameworks refer to as events, or else with an "observer" pattern [the observer asks the observable object to notify it when something happens].
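A minimal Python sketch of that second situation (the class names and the table mirror the pseudocode above; the details are invented):

class Square:
    def __init__(self, params):
        self.params = params

class Circle:
    def __init__(self, params):
        self.params = params

# table of creation methods, keyed by the type name read from the file
object_creators = {"Square": Square, "Circle": Circle}

def create_object(object_type, object_params):
    factory = object_creators.get(object_type)
    if factory is None:
        raise ValueError("unknown object type: " + object_type)
    return factory(object_params)

shape = create_object("Circle", {"radius": 2})  # no if/else chain needed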
A very common beginner mistake when writing recursive functions is to accidentally fire off completely redundant recursive calls from the same function. For example, consider this recursive function that finds the maximum value in a binary tree (not a binary search tree):
int BinaryTreeMax(Tree* root) {
    if (root == nullptr) return INT_MIN;
    int maxValue = root->value;
    if (maxValue < BinaryTreeMax(root->left))
        maxValue = BinaryTreeMax(root->left);   // (1)
    if (maxValue < BinaryTreeMax(root->right))
        maxValue = BinaryTreeMax(root->right);  // (2)
    return maxValue;
}
Notice that this program potentially makes two completely redundant recursive calls to BinaryTreeMax in lines (1) and (2). We could rewrite this code so that there's no need for these extra calls by simply caching the value from before:
int BinaryTreeMax(Tree* root) {
    if (root == nullptr) return INT_MIN;
    int maxValue = root->value;
    int leftValue = BinaryTreeMax(root->left);
    int rightValue = BinaryTreeMax(root->right);
    if (maxValue < leftValue)
        maxValue = leftValue;
    if (maxValue < rightValue)
        maxValue = rightValue;
    return maxValue;
}
Now, we always make exactly two recursive calls.
My question is whether there is a tool that does either a static or dynamic analysis of a program (in whatever language you'd like; I'm not too picky!) that can detect whether a program is making completely unnecessary recursive calls. By "completely unnecessary" I mean that
The recursive call has been made before,
by the same invocation of the recursive function (or one of its descendants), and
the call itself has no observable side-effects.
This is something that can usually be determined by hand, but I think it would be great if there were some tool that could flag things like this automatically as a way of helping students gain feedback about how to avoid making simple but expensive mistakes in their programs that could contribute to huge inefficiencies.
Does anyone know of such a tool?
First, your definition of 'completely unnecessary' is insufficient. It is possible that some code between the two function calls affects the result of the second function call.
Second, this has nothing to do with recursion; the same question can apply to any function call: has it been called before with the exact same parameters, does it have no side effects, and did no code between the two calls change any data the function accesses?
Now, I'm pretty sure a perfect solution is impossible, as it would amount to solving the Halting Problem, but that doesn't mean there isn't a way to detect enough of these cases and optimize some of them away.
Some compilers know how to do that (GCC has a specific flag that warns you when it does so). Here's a 2003 article I found about the issue: http://www.cs.cmu.edu/~jsstylos/15745/final.pdf .
I couldn't find a tool for this, though, but that's probably something Eric Lippert knows, if he happens to bump into your question.
Some compilers (such as GCC) do have ways to mark deterministic functions explicitly. To be more precise, __attribute__((const)) (see GCC function attributes) places restrictions on the function body so that its result depends only on its arguments, with no dependency on the shared state of the program or on other non-deterministic functions. The compiler can then eliminate duplicate calls to costly functions. Some other high-level language implementations (maybe Haskell) do such tests automatically.
Really, I don't know of tools for such analysis (but if I find one I will be happy). And if there is one that correctly detects unnecessary recursion or, more generally, unnecessary function evaluation (in a language-agnostic environment), it would be a kind of determinacy prover.
BTW, it's not so difficult to write such a program when you already have access to the semantic tree of the code :)
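In Python, for example, the ast module hands you that tree directly. A minimal sketch (my own illustration, not an existing tool) that flags syntactically identical calls inside one function -- it ignores side effects and intervening mutations, so it's only a starting point:

import ast

def find_repeated_calls(source):
    tree = ast.parse(source)
    warnings = []
    for func in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        seen = {}
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                key = ast.dump(node)  # textual form of the whole call expression
                if key in seen:
                    warnings.append("%s: line %d repeats the call on line %d"
                                    % (func.name, node.lineno, seen[key]))
                else:
                    seen[key] = node.lineno
    return warnings

src = '''
def tree_max(root):
    if root is None:
        return -float("inf")
    best = root.value
    if best < tree_max(root.left):
        best = tree_max(root.left)
    return best
'''
print(find_repeated_calls(src))  # flags the second tree_max(root.left)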
Occasionally I'll have a situation where I've written some code and, based on its logic, a certain path is impossible. For example:
activeGames = [10, 20, 30]
limit = 4

def getBestActiveGameStat():
    if not activeGames:
        return None
    return max(activeGames)

def bah():
    if limit == 0:
        return "Limit is 0"
    if len(activeGames) >= limit:
        somestat = getBestActiveGameStat()
        if somestat is None:
            print("The universe has exploded")
        # etc...
What would go in the universe exploding line? If limit is 0, then the function returns. If len(activeGames) >= limit, then there must be at least one active game, so getBestActiveGameStat() can't return None. So, should I even check for it?
The same also happens with something like a while loop which always returns from inside the loop:

def hmph():
    while condition:
        if foo:
            return "yep"
        doStuffToMakeFooTrue()
    raise SingularityFlippedMyBitsError()
Since I "know" it's impossible, should anything even be there?
If len(activeGames) >= limit, then there must be at least one active game, so getBestActiveGameStat() can't return None. So, should I even check for it?
Sometimes we make mistakes. You could have a program error now -- or someone could create one later.
Those errors might result in exceptions or failed unit tests. But debugging is expensive; it's useful to have multiple ways to detect errors.
A quickly written assert statement can express an expected invariant to human readers. And when debugging, a failed assertion can pinpoint an error quickly.
Sutter and Alexandrescu address this issue in "C++ Coding Standards." Despite the title, their arguments and guidelines are language-agnostic.
Assert liberally to document internal assumptions and invariants
... Use assert or an equivalent liberally to document assumptions internal to a module ... that must always be true and otherwise represent programming errors.
For example, if the default case in a switch statement cannot occur, add the case with assert(false).
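Applied to the question's code (reusing its names; the assertion message is my own), the "impossible" branch can document the invariant instead of silently handling it:

def bah():
    if limit == 0:
        return "Limit is 0"
    if len(activeGames) >= limit:
        somestat = getBestActiveGameStat()
        # the two checks above guarantee at least one active game,
        # so a None here can only be a programming error
        assert somestat is not None, "no best stat despite active games"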
IMHO, the first example is really more a question of how catastrophic failures are presented to the user. In the event that someone does something really silly and sets activeGames to None, most languages will throw a NullPointer/InvalidReference type of exception. If you have a good system for catching these kinds of errors and handling them elegantly, then I would argue that you should leave these guards out entirely.
If you have a decent set of unit tests, they will ensure with a huge amount of certainty that this kind of problem does not escape the developer's machine.
As for the second one, what you're really guarding against is a race condition. What if the "doStuffToMakeFooTrue()" method never makes foo true? This code will eventually run itself into the ground. Rather than risk that, I'll usually put code like this on a timer. If your language has closures or function pointers (honestly not sure about Python...), you can hide the implementation of the timing logic in a nice helper method, and call it this way:
withTiming(hmph, 30) // run for 30 seconds, then fail
If you don't have closures or function pointers, you'll have to do it the long way everywhere:
stopwatch = new Stopwatch(30)
stopwatch.start()
while stopwatch.elapsedTimeInSeconds() < 30
    hmph()
raise OperationTimedOutError()
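Python does in fact have closures and first-class functions, so the helper could look something like this sketch (with_timing and the retry semantics are my own guesses at the intent):

import time

def with_timing(operation, seconds):
    # retry the operation until it produces a result or the time budget runs out
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        result = operation()
        if result is not None:
            return result
    raise TimeoutError("operation did not finish in %s seconds" % seconds)

# with_timing(hmph, 30)  # run for up to 30 seconds, then fail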
I'm writing a toy compiler thingy which can optimise function calls if the result depends only on the values of the arguments. Functions like xor and concatenate depend only on their inputs: calling them with the same input always gives the same output. But functions like time and rand depend on "hidden" program state, and calling them with the same input may give different output. I'm trying to figure out the adjective that distinguishes these two types of function, like "isomorphic" or "re-entrant" or something. Can someone tell me the word I'm looking for?
The term you are looking for is Pure
http://en.wikipedia.org/wiki/Pure_function
I think it's called Pure Function:
In computer programming, a function may be described as pure if both these statements about the function hold:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
The result value need not depend on all (or any) of the argument values. However, it must depend on nothing other than the argument values.
I guess you could say the adjective is "pure" if you go by "pure function".
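In the question's own terms, a tiny Python illustration (the function names are mine):

import random
import time

def concat(a, b):        # pure: the result depends only on the arguments
    return a + b

def timestamp(message):  # impure: the result depends on the hidden clock state
    return "%f: %s" % (time.time(), message)

def jitter(x):           # impure: the result depends on the RNG's hidden state
    return x + random.random()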
I always learnt that a function whose output is always the same for the same arguments is called "deterministic". Personally, I feel that is a more descriptive term. A "pure function" is by definition deterministic, and a pure function is also required to have no side effects; I assume that need not be the case for all deterministic functions (as long as the return value is always the same for the same arguments).
Wikipedia link: http://en.wikipedia.org/wiki/Deterministic_algorithm
Quote:
Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states.