Elegant way to handle "impossible" code paths - language-agnostic

Occasionally I'll have a situation where I've written some code and, based on its logic, a certain path is impossible. For example:
activeGames = [10, 20, 30]
limit = 4
def getBestActiveGameStat():
if not activeGames: return None
return max(activeGames)
def bah():
if limit == 0: return "Limit is 0"
if len(activeGames) >= limit:
somestat = getBestActiveGameStat()
if somestat is None:
print "The universe has exploded"
#etc...
What would go in the universe exploding line? If limit is 0, then the function returns. If len(activeGames) >= limit, then there must be at least one active game, so getBestActiveGameStat() can't return None. So, should I even check for it?
The same also happens with something like a while loop which always returns in the loop:
def hmph():
while condition:
if foo: return "yep"
doStuffToMakeFooTrue()
raise SingularityFlippedMyBitsError()
Since I "know" it's impossible, should anything even be there?

If len(activeGames) >= limit, then
there must be at least one active
game, so getBestActiveGameStat() can't
return None. So, should I even check
for it?
Sometimes we make mistakes. You could have a program error now -- or someone could create one later.
Those errors might result in exceptions or failed unit tests. But debugging is expensive; it's useful to have multiple ways to detect errors.
A quickly written assert statement can express an expected invariant to human readers. And when debugging, a failed assertion can pinpoint an error quickly.
Sutter and Alexandrescu address this issue in "C++ Coding Standards." Despite the title, their arguments and guidelines are are language agnostic.
Assert liberally to document internal assumptions and invariants
... Use assert or an equivalent liberally to document assumptions internal to a module ... that must always be true and otherwise represent programming errors.
For example, if the default case in a switch statement cannot occur, add the case with assert(false).

IMHO, the first example is really more a question of how catastrophic failures are presented to the user. In the event that someone does something really silly and sets activeGames to none, most languages will throw a NullPointer/InvalidReference type of exception. If you have a good system for catching these kinds of errors and handling them elegantly, then I would argue that you leave these guards out entirely.
If you have a decent set of unit tests, they will ensure with huge amounts of certainty that this kind of problem does not escape the developers machine.
As for the second one, what you're really guarding against is a race condition. What if the "doStuffToMakeFooTrue()" method never makes foo true? This code will eventually run itself into the ground. Rather than risk that, I'll usually put code like this on a timer. If your language has closures or function pointers (honestly not sure about Python...), you can hide the implementation of the timing logic in a nice helper method, and call it this way:
withTiming(hmph, 30) // run for 30 seconds, then fail
If you don't have closures or function pointers, you'll have to do it the long way everywhere:
stopwatch = new Stopwatch(30)
stopwatch.start()
while stopwatch.elapsedTimeInSeconds() < 30
hmph()
raise OperationTimedOutError()

Related

Why does 'return' end a function

I'm just curious about why return ends the function.
Why do we not write
function Foo (){
BAR = calculate();
give back BAR;
//do sth later
log(BAR);
end;
}
Why do we need to do this?
function Foo (){
BAR = calculate();
log(BAR);
return BAR;
}
Is this to prevent multiple usage of a give back/return value in a function?
The idea of a function stems from mathematics, e.g. x = f(y). Once you have computed f(y) for a specific value of y, you can simply substitute that value in that equation for the same result, e.g. x = 42. So the notion of a function having one result or one return value is quite strong. Further, such mathematical functions are pure, meaning they have no side effects. In the above formula it doesn’t make a difference whether you write f(y) or its computed result 42, the function doesn’t do anything else and hence won’t change the result. Being able to make these assumptions makes it much easier to reason about formulas and programs.
return in programming also has practical implementation implications, as most languages typically pop the stack upon returning, based on the assumption/restriction that it’s not needed any further.
Many languages do allow a function to “spit out” a value yet continue, which is usually implemented as generators and the yield keyword. However, the generator won’t typically simply continue running in the background, it needs to be explicitly invoked again to yield its next value. A transfer of control is necessary; either the generator runs, or its caller does, they can’t both run simultaneously.
If you did want to run two pieces of code simultaneously, be that a generator or a function’s “after return block”, you need to decide on a mode of multitasking like threading, or cooperative multitasking (async execution) or something else, which brings with it all the fun difficulties of managing shared resource access and the like. While it’s not unthinkable to write a language which would handle that implicitly and elegantly, elegant implicit multitasking which manages all these difficulties automagically simply does not fit into most C-like languages. Which is likely one of many reasons leading to a simple stack-popping, function-terminating return statement.
Using return gives you a lot of flexibility regarding where, when and how you return the value of a function as well as an easy to read statement of 'I am now returning this value'.
If following your idea, you could have a situation where the function got evaulated to some value and you have to figure out if that assignment got changed somewhere later in the flow.

Python practices: Is there a better way to check constructor parameters?

I find myself trying to convert constructor parameters to their right types very often in my Python programs. So far I've been using code similar to this, so I don't have to repeat the exception arguments:
class ClassWithThreads(object):
def __init__(self, num_threads):
try:
self.num_threads= int(num_threads)
if self.num_threads <= 0:
raise ValueError()
except ValueError:
raise ValueError("invalid thread count")
Is this a good practice? Should I just don't bother catching any exceptions on conversion and let them propagate to the caller, with the possible disadvantage of having less meaningful and consistent error messages?
When I have a question like this, I go hunting in the standard library for code that I can model my code after. multiprocessing/pool.py has a class somewhat close to yours:
class Pool(object):
def __init__(self, processes=None, initializer=None, initargs=(),
maxtasksperchild=None):
...
if processes is None:
try:
processes = cpu_count()
except NotImplementedError:
processes = 1
if processes < 1:
raise ValueError("Number of processes must be at least 1")
if initializer is not None and not hasattr(initializer, '__call__'):
raise TypeError('initializer must be a callable')
Notice that it does not say
processes = int(processes)
It just assumes you sent it an integer, not a float or a string, or whatever.
It should be pretty obvious, but if you feel it is not, I think it suffices to just document it.
It does raise ValueError if processes < 1, and it does check that initializer, when given, is callable.
So, if we take multiprocessing.Pool as a model, your class should look like this:
class ClassWithThreads(object):
def __init__(self, num_threads):
self.num_threads = num_threads
if self.num_threads < 1:
raise ValueError('Number of threads must be at least 1')
Wouldn't this approach possibly fail very unpredictably for some
conditions?
I think preemptive type checking generally goes against the grain of Python's
(dynamic-, duck-typing) design philosophy.
Duck typing gives Python programmers opportunities for great expressive power,
and rapid code development but (some might say) is dangerous because it makes no
attempt to catch type errors.
Some argue that logical errors are far more serious and frequent than type
errors. You need unit tests to catch those more serious errors. So even if you
do do preemptive type checking, it does not add much protection.
This debate lies in the realm of opinions, not facts, so it is not a resolvable argument. On which side of the fence
you sit may depend on your experience, your judgment on the likelihood of type
errors. It may be biased by what languages you already know. It may depend on
your problem domain.
You just have to decide for yourself.
PS. In a statically typed language, the type checks can be done at compile-time, thus not impeding the speed of the program. In Python, the type checks have to occur at run-time. This will slow the program down a bit, and maybe a lot if the checking occurs in a loop. As the program grows, so will the number of type checks. And unfortunately, many of those checks may be redundant. So if you really believe you need type checking, you probably should be using a statically-typed language.
PPS. There are decorators for type checking for (Python 2) and (Python 3). This would separate the type checking code from the rest of the function, and allow you to more easily turn off type checking in the future if you so choose.
You could use a type checking decorator like this activestate recipe or this other one for python 3. They allow you to write code something like this:
#require("x", int, float)
#require("y", float)
def foo(x, y):
return x+y
that will raise an exception if the arguments are not of the required type. You could easily extend the decorators to check that the arguments have valid values aswell.
This is subjective, but here's a counter-argument:
>>> obj = ClassWithThreads("potato")
ValueError: invalid thread count
Wait, what? That should be a TypeError. I would do this:
if not isinstance(num_threads, int):
raise TypeError("num_threads must be an integer")
if num_threads <= 0:
raise ValueError("num_threads must be positive")
Okay, so this violates "duck typing" principles. But I wouldn't use duck typing for primitive objects like int.

assert() vs enforce(): Which to choose?

I'm having a hard time choosing whether I should "enforce" a condition or "assert" a condition in D. (This is language-neutral, though.)
Theoretically, I know that you use assertions to find bugs, and you enforce other conditions in order to check for atypical conditions. E.g. you might say assert(count >= 0) for an argument to your method, because that indicates that there's a bug with the caller, and that you would say enforce(isNetworkConnected), because that's not a bug, it's just something that you're assuming that could very well not be true in a legitimate situation beyond your control.
Furthermore, assertions can be removed from code as an optimization, with no side effects, but enforcements cannot be removed because they must always execute their condition code. Hence if I'm implementing a lazy-filled container that fills itself on the first access to any of its methods, I say enforce(!empty()) instead of assert(!empty()), because the check for empty() must always occur, since it lazily executes code inside.
So I think I know that they're supposed to mean. But theory is easier than practice, and I'm having a hard time actually applying the concepts.
Consider the following:
I'm making a range (similar to an iterator) that iterates over two other ranges, and adds the results. (For functional programmers: I'm aware that I can use map!("a + b") instead, but I'm ignoring that for now, since it doesn't illustrate the question.) So I have code that looks like this in pseudocode:
void add(Range range1, Range range2)
{
Range result;
while (!range1.empty)
{
assert(!range2.empty); //Should this be an assertion or enforcement?
result += range1.front + range2.front;
range1.popFront();
range2.popFront();
}
}
Should that be an assertion or an enforcement? (Is it the caller's fault that the ranges don't empty at the same time? It might not have control of where the range came from -- it could've come from a user -- but then again, it still looks like a bug, doesn't it?)
Or here's another pseudocode example:
uint getFileSize(string path)
{
HANDLE hFile = CreateFile(path, ...);
assert(hFile != INVALID_HANDLE_VALUE); //Assertion or enforcement?
return GetFileSize(hFile); //and close the handle, obviously
}
...
Should this be an assertion or an enforcement? The path might come from a user -- so it might not be a bug -- but it's still a precondition of this method that the path should be valid. Do I assert or enforce?
Thanks!
I'm not sure it is entirely language-neutral. No language that I use has enforce(), and if I encountered one that did then I would want to use assert and enforce in the ways they were intended, which might be idiomatic to that language.
For instance assert in C or C++ stops the program when it fails, it doesn't throw an exception, so its usage may not be the same as what you're talking about. You don't use assert in C++ unless you think that either the caller has already made an error so grave that they can't be relied on to clean up (e.g. passing in a negative count), or else some other code elsewhere has made an error so grave that the program should be considered to be in an undefined state (e.g. your data structure appears corrupt). C++ does distinguish between runtime errors and logic errors, though, which may roughly correspond but I think are mostly about avoidable vs. unavoidable errors.
In the case of add you'd use a logic error if the author's intent is that a program which provides mismatched lists has bugs and needs fixing, or a runtime exception if it's just one of those things that might happen. For instance if your function were to handle arbitrary generators, that don't necessarily have a means of reporting their length short of destructively evaluating the whole sequence, you'd be more likely consider it an unavoidable error condition.
Calling it a logic error implies that it's the caller's responsibility to check the length before calling add, if they can't ensure it by the exercise of pure reason. So they would not be passing in a list from a user without explicitly checking the length first, and in all honesty should count themselves lucky they even got an exception rather than undefined behavior.
Calling it a runtime error expresses that it's "reasonable" (if abnormal) to pass in lists of different lengths, with the exception indicating that it happened on this occasion. Hence I think an enforcement rather than an assertion.
In the case of filesize: for the existence of a file, you should if possible treat that as a potentially recoverable failure (enforcement), not a bug (assertion). The reason is simply that there is no way for the caller to be certain that a file exists - there's always someone with more privileges who can come along and remove it, or unmount the entire fielsystem, in between a check for existence and a call to filesize. It's therefore not necessarily a logical flaw in the calling code when it doesn't exist (although the end-user might have shot themselves in the foot). Because of that fact it's likely there will be callers who can treat it as just one of those things that happens, an unavoidable error condition. Creating a file handle could also fail for out-of-memory, which is another unavoidable error on most systems, although not necessarily a recoverable one if for example over-committing is enabled.
Another example to consider is operator[] vs. at() for C++'s vector. at() throws out_of_range, a logic error, not because it's inconceivable that a caller might want to recover, or because you have to be some kind of numbskull to make the mistake of accessing an array out of range using at(), but because the error is entirely avoidable if the caller wants it to be - you can always check the size() before access if you have no other way of knowing whether your index is good or not. And so operator[] doesn't guarantee any checks at all, and in the name of efficiency an out of range access has undefined behavior.
assert should be considered a "run-time checked comment" indicating an assumption that the programmer makes at that moment. The assert is part of the function implementation. A failed assert should always be considered a bug at the point where the wrong assumption is made, so at the code location of the assert. To fix the bug, use a proper means to avoid the situation.
The proper means to avoid bad function inputs are contracts, so the example function should have a input contract that checks that range2 is at least as long as range1. The assertion inside the implementation could then still remain in place. Especially in longer more complex implementations, such an assert may inprove understandability.
An enforce is a lazy approach to throwing runtime exceptions. It is nice for quick-and-dirty code because it is better to have a check in there rather then silently ignoring the possibility of a bad condition. For production code, it should be replaced by a proper mechanism that throws a more meaningful exception.
I believe you have partly answered your question yourself. Assertions are bound to break the flow. If your assertion is wrong, you will not agree to continue with anything. If you enforce something you are making a decision to allow something to happen based on the situation. If you find that the conditions are not met, you can enforce that the entry to a particular section is denied.

Should functions that only output return anything?

I'm rewriting a series of PHP functions to a container class. Many of these functions do a bit of processing, but in the end, just echo content to STDOUT.
My question is: should I have a return value within these functions? Is there a "best practice" as far as this is concerned?
In systems that report errors primarily through exceptions, don't return a return value if there isn't a natural one.
In systems that use return values to indicate errors, it's useful to have all functions return the error code. That way, a user can simply assume that every single function returns an error code and develop a pattern to check them that they follow everywhere. Even if the function can never fail right now, return a success code. That way if a future change makes it possible to have an error, users will already be checking errors instead of implicitly silently ignoring them (and getting really confused why the system is behaving oddly).
Can the processing fail? If so, should the caller know about that? If either of these is no, then I don't see value in a return. However, if the processing can fail, and that can make a difference to the caller, then I'd suggest returning a status or error code.
Do not return a value if there is no value to return. If you have some value you need to convey to the caller, then return it but that doesn't sound like the case in this instance.
I will often "return: true;" in these cases, as it provides a way to check that the function worked. Not sure about best practice though.
Note that in C/C++, the output functions (including printf()) return the number of bytes written, or -1 if this fails. It may be worth investigating this further to see why it's been done like this. I confess that
I'm not sure that writing to stdout could practically fail (unless you actively close your STDOUT stream)
I've never seen anyone collect this value, let alone do anything with it.
Note that this is distinct from writing to file streams - I'm not counting stream redirection in the shell.
To do the "correct" thing, if the point of the method is only to print the data, then it shouldn't return anything.
In practice, I often find that having such functions return the text that they've just printed can often be useful (sometimes you also want to send an error message via email or feed it to some other function).
In the end, the choice is yours. I'd say it depends on how much of a "purist" you are about such things.
You should just:
return;
In my opinion the SRP (single responsibility principle) is applicable for methods/functions as well, and not only for objects. One method should do one thing, if it outputs data it shouldn't do any data processing - if it doesn't do processing it shouldn't return data.
There is no need to return anything, or indeed to have a return statement. It's effectively a void function, and it's comprehensible enough that these have no return value. Putting in a 'return;' solely to have a return statement is noise for the sake of pedantry.

Should you check for wrong parameter values in the constructor?

Do you check for data validity in every constructor, or do you just assume the data is correct and throw exceptions in the specific function that has a problem with the parameter?
A constructor is a function too - why differentiate?
Creating an object implies that all the integrity checks have been done. It's perfectly reasonable to check parameters in a constructor and throw an exception once an illegal value has been detected.
Among all this simplifies debugging. When your program throws exception in a constructor you can observe a stack trace and often immediately see the cause. If you delay the check you'll then have to do more investigation to detect what earlier event causes the current error.
It's always better to have a fully-constructed object with all the invariants "satisfied" from the very beginning. Beware, however, of throwing exceptions from constructor in non-managed languages since that may result in a memory leak.
I always coerce values in constructors. If the users can't be bothered to follow the rules, I just silently enforce them without telling them.
So, if they pass a value of 107%, I'll set it to 100%. I just make it clear in the documentation that that's what happens.
Only if there's no obvious logical coercion do I throw an exception back to them. I like to call this the "principal of most astonishment to those too lazy or stupid to read the documentation".
If you throw in the constructor, the stack trace is more likely to show where the wrong values are coming from.
I agree with sharptooth in the general case, but there are sometimes objects that can have states in which some functions are valid and some are not. In these situations it's better to defer checks to the functions with these particular dependencies.
Usually you'll want to validate in a constructor, if only because that means your objects will always be in a valid state. But some kinds of objects need function-specific checks, and that's OK too.
This is a difficult question. I have changed my habit related to this question several times in the last years, so here are some thoughts coming to my mind :
Checking the arguments in the constructor is like checking the arguments for a traditional method... so I would not differentiate here...
Checking the arguments of a method of course does help to ensure parameters passed are correct, but also introduces a lot of more code you have to write, which not always pays off in my opinion... Especially when you do write short methods which delegate quite frequently to other methods, I guess you would end up with 50 % of your code just doing parameter checks...
Furthermore consider that very often when you write a unit test for your class, you don't want to have to pass in valid parameters for all methods... Quite often it does make sense to only pass those parameters important for the current test case, so having checks would slow you down in writing unit tests...
I think this really depends on the situation... what I ended up is to only write parameter checks when I know that the parameters do come from an external system (like user input, databases, web services, or something like that...). If data comes from my own system, I don't write tests most of the time.
I always try to fail as early as possible. So I definitively check the parameters in the constructor.
In theory, the calling code should always ensure that preconditions are met, before calling a function. The same goes for constructors.
In practice, programmers are lazy, and to be sure it's best to still check preconditions. Asserts come in handy there.
Example. Excuse my non-curly brace syntax:
// precondition b<>0
function divide(a,b:double):double;
begin
assert(b<>0); // in case of a programming error.
result := a / b;
end;
// calling code should be:
if foo<>0 then
bar := divide(1,0)
else
// do whatever you need to do when foo equals 0
Alternatively, you can always change the preconditions. In case of a constructor this is not really handy.
// no preconditions.. still need to check the result
function divide(a,b:double; out hasResult:boolean):double;
begin
hasResult := b<>0;
if not hasResult then
Exit;
result := a / b;
end;
// calling code:
bar := divide(1,0,result);
if not result then
// do whatever you need to do when the division failed
Always fail as soon as possible. A good example of this practice is exhibited by the runtime ..
* if you try and use an invalid index for an array you get an exception.
* casting fails immediately when attempted not at some later stage.
Imagine what a disaster code be if a bad cast remained in a reference and everytime you tried to use that you got an exception. The only sensible approach is to fail as soon as possible and try and avoid bad / invalid state asap.