Python practices: Is there a better way to check constructor parameters?

I often find myself converting constructor parameters to the right types in my Python programs. So far I've been using code similar to this, so I don't have to repeat the exception arguments:
class ClassWithThreads(object):
    def __init__(self, num_threads):
        try:
            self.num_threads = int(num_threads)
            if self.num_threads <= 0:
                raise ValueError()
        except ValueError:
            raise ValueError("invalid thread count")
Is this a good practice? Should I just not bother catching any exceptions on conversion and let them propagate to the caller, with the possible disadvantage of less meaningful and less consistent error messages?

When I have a question like this, I go hunting in the standard library for code that I can model my code after. multiprocessing/pool.py has a class somewhat close to yours:
class Pool(object):
    def __init__(self, processes=None, initializer=None, initargs=(),
                 maxtasksperchild=None):
        ...
        if processes is None:
            try:
                processes = cpu_count()
            except NotImplementedError:
                processes = 1
        if processes < 1:
            raise ValueError("Number of processes must be at least 1")

        if initializer is not None and not hasattr(initializer, '__call__'):
            raise TypeError('initializer must be a callable')
Notice that it does not say
processes = int(processes)
It just assumes you sent it an integer, not a float or a string, or whatever.
It should be pretty obvious, but if you feel it is not, I think it suffices to just document it.
It does raise ValueError if processes < 1, and it does check that initializer, when given, is callable.
So, if we take multiprocessing.Pool as a model, your class should look like this:
class ClassWithThreads(object):
    def __init__(self, num_threads):
        self.num_threads = num_threads
        if self.num_threads < 1:
            raise ValueError('Number of threads must be at least 1')
"Wouldn't this approach possibly fail very unpredictably for some conditions?"
I think preemptive type checking generally goes against the grain of Python's dynamic, duck-typing design philosophy.
Duck typing gives Python programmers opportunities for great expressive power and rapid code development, but (some might say) is dangerous because it makes no attempt to catch type errors.
Some argue that logical errors are far more serious and frequent than type errors. You need unit tests to catch those more serious errors, so even if you do preemptive type checking, it does not add much protection.
This debate lies in the realm of opinions, not facts, so it is not a resolvable argument. Which side of the fence you sit on may depend on your experience and your judgment of the likelihood of type errors. It may be biased by the languages you already know, and it may depend on your problem domain.
You just have to decide for yourself.
PS. In a statically typed language, the type checks can be done at compile-time, thus not impeding the speed of the program. In Python, the type checks have to occur at run-time. This will slow the program down a bit, and maybe a lot if the checking occurs in a loop. As the program grows, so will the number of type checks. And unfortunately, many of those checks may be redundant. So if you really believe you need type checking, you probably should be using a statically-typed language.
PPS. There are type-checking decorators for Python 2 and for Python 3. Using one would separate the type-checking code from the rest of the function, and allow you to more easily turn off type checking in the future if you so choose.

You could use a type-checking decorator like this ActiveState recipe, or this other one for Python 3. They allow you to write code something like this:
#require("x", int, float)
#require("y", float)
def foo(x, y):
return x+y
that will raise an exception if the arguments are not of the required type. You could easily extend the decorators to check that the arguments have valid values as well.
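For a sense of how such a decorator works, here is a minimal sketch (not the actual recipe code; the recipes are more complete, and inspect.getfullargspec makes this version Python 3 only):

import functools
import inspect

def require(arg_name, *allowed_types):
    # Minimal sketch of a type-checking decorator: raise TypeError when
    # the named argument is not an instance of one of the allowed types.
    def decorator(func):
        position = inspect.getfullargspec(func).args.index(arg_name)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if arg_name in kwargs:
                value = kwargs[arg_name]
            elif position < len(args):
                value = args[position]
            else:
                return func(*args, **kwargs)  # argument omitted; defer to func
            if not isinstance(value, allowed_types):
                raise TypeError("%s must be one of %s, got %s"
                                % (arg_name, allowed_types, type(value).__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator

Stacking one decorator per argument, as shown above, keeps each check independent and easy to remove later.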

This is subjective, but here's a counter-argument:
>>> obj = ClassWithThreads("potato")
ValueError: invalid thread count
Wait, what? That should be a TypeError. I would do this:
if not isinstance(num_threads, int):
    raise TypeError("num_threads must be an integer")
if num_threads <= 0:
    raise ValueError("num_threads must be positive")
Okay, so this violates "duck typing" principles. But I wouldn't use duck typing for primitive objects like int.

Why is my %h is List = 1,2; a valid assignment?

While finalizing my upcoming Raku Advent Calendar post on sigils, I decided to double-check my understanding of the type constraints that sigils create. The docs describe sigil type constraints with the table below:

Sigil    Type constraint            Default type
$        Mu (no type constraint)    Any
&        Callable                   Callable
@        Positional                 Array
%        Associative                Hash
Based on this table (and my general understanding of how sigils and containers work), I strongly expected this code
my %percent-sigil is List = 1,2;
my @at-sigil is Map = :k<v>;
to throw an error.
Specifically, I expected that is List would attempt to bind the %-sigiled variable to a List, and that this would throw an X::TypeCheck::Binding error – the same error that my %h := 1,2 throws.
But it didn't error. The first line created a List that seemed perfectly ordinary in every way, other than the sigil on its variable. And the second created a seemingly normal Map. Neither of them secretly had Scalar intermediaries, at least as far as I could tell with VAR and similar introspection.
I took a very quick look at the World.nqp source code, and it seems at least plausible that discarding the % type constraint with is List is intended behavior.
So, is this behavior correct/intended? If so, why? And how does that fit in with the type constraints and other guarantees that sigils typically provide?
(I have to admit, seeing an %-sigiled variable that doesn't support Associative indexing kind of shocked me…)
I think this is a grey area, somewhere between DIHWIDT (Doctor, It Hurts When I Do This) and an oversight in implementation.
Thing is, you can create your own class and use that in the is trait. Basically, that overrides the type with which the object will be created from the default Hash (for % sigils) and Array (for @ sigils). As long as you provide the interface methods, it (currently) works. For example:
class Foo {
    method AT-KEY($) { 42 }
}
my %h is Foo;
say %h<a>;  # 42
However, if you want to pass such an object as an argument to a sub with a % sigil in the signature, it will fail because the class did not consume the Associative role:
sub bar(%) { 666 }
say bar(%h);

===SORRY!=== Error while compiling -e
Calling bar(Foo) will never work with declared signature (%)
I'm not sure why the test for Associative (for the % sigil) and Positional (for @) is not enforced at compile time with the is trait. I would assume it was an oversight, maybe something to be fixed in 6.e.
Quoting the Parameters and arguments section of the S06 specification/speculation document about the related issue of binding arguments to routine parameters:
Array and hash parameters are simply bound "as is". (Conjectural: future versions ... may do static analysis and forbid assignments to array and hash parameters that can be caught by it. This will, however, only happen with the appropriate use declaration to opt in to that language version.)
Sure enough the Rakudo compiler implemented some rudimentary static analysis (in its AOT compilation optimization pass) that normally (but see footnote 3 in this SO answer) insists on binding @ routine parameters to values that do the Positional role and % ones to Associatives.
I think this was the case from the first official Raku supporting release of Rakudo, in 2016, but regardless, I'm pretty sure the "appropriate use declaration" is any language version declaration, including none. If your/our druthers are static typing for the win for @ and % sigils, and I think they are, then that's presumably very appropriate!
Another source is the IRC logs. A quick search got me nothing.
Hmm. Let's check the blame for the above verbiage so I can find when it was last updated and maybe spot contemporaneous IRC discussion. Oooh.
That is an extraordinary read.
"oversight" isn't the right word.
I don't have time tonight to search the IRC logs to see what led up to that commit, but I daresay it's interesting. The previous text was talking about a PL design I really liked the sound of in terms of immutability, such that code could become increasingly immutable by simply swapping out one kind of scalar container for another. Very nice! But reality is important, and Jonathan switched the verbiage to the implementation reality. The switch toward static typing certainty is welcome, but has it seriously harmed the performance and immutability options? I don't know. Time for me to go to sleep and head off for seasonal family visits. Happy holidays...

create_mutable/2 in SICStus Prolog

The SICStus Prolog manual page on mutable terms states that:
[...] the effect of unifying two mutables is undefined.
Then, why does create_mutable(data,x) fail?
Shouldn't that rather raise an uninstantiation_error?
I cannot think of a situation where the above case is not an unintentional programming error (X vs x)... please help!
The short answer to "Why does create_mutable/2 not throw an exception when output unification fails?" is just: Because this was how it was done when the feature was added to SICStus Prolog, and no one has made a strong case for changing this.
One important "difference between the stream created by open/4 and the mutable term created by create_mutable/2" is that open/4 has side-effects that are not undone if the output-unification of the call to open/4 fails.
In this sense, create_mutable/2 is somewhat more like is/2 which also just quietly fails if the output argument is some non-numeric non-variable term, e.g. in x is 3+4. This seems to be the common, and traditional, way of handling output arguments in Prolog.
I agree that a non-variable as second argument is most likely a programming error. The next version of the SICStus IDE, SPIDER, will warn for this (as it already does for is/2).
None of this, nor the example in the question, seems directly related to the cited documentation "[...] the effect of unifying two mutables [...]".

assert() vs enforce(): Which to choose?

I'm having a hard time choosing whether I should "enforce" a condition or "assert" a condition in D. (This is language-neutral, though.)
Theoretically, I know that you use assertions to find bugs, and you enforce conditions in order to check for atypical situations. E.g. you might say assert(count >= 0) for an argument to your method, because that indicates a bug in the caller, and you would say enforce(isNetworkConnected), because that's not a bug; it's just something you're assuming that could very well not be true in a legitimate situation beyond your control.
Furthermore, assertions can be removed from code as an optimization, with no side effects, but enforcements cannot be removed because they must always execute their condition code. Hence if I'm implementing a lazy-filled container that fills itself on the first access to any of its methods, I say enforce(!empty()) instead of assert(!empty()), because the check for empty() must always occur, since it lazily executes code inside.
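For comparison, the same split can be illustrated in Python, where assert statements are stripped when the interpreter runs with -O while an explicit raise always executes. A rough sketch with made-up names:

def fill_rate(filled, capacity):
    # Internal sanity check: a failure here is a bug in our own code.
    # Like D's assert, this line disappears under "python -O".
    assert filled >= 0, "filled should never be negative here"

    # External condition: must always run, like D's enforce(), because
    # a zero-capacity container is a legitimate input beyond our control.
    if capacity == 0:
        raise ValueError("container has zero capacity")
    return filled / capacity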
So I think I know what they're supposed to mean. But theory is easier than practice, and I'm having a hard time actually applying the concepts.
Consider the following:
I'm making a range (similar to an iterator) that iterates over two other ranges, and adds the results. (For functional programmers: I'm aware that I can use map!("a + b") instead, but I'm ignoring that for now, since it doesn't illustrate the question.) So I have code that looks like this in pseudocode:
void add(Range range1, Range range2)
{
    Range result;
    while (!range1.empty)
    {
        assert(!range2.empty); // Should this be an assertion or enforcement?
        result += range1.front + range2.front;
        range1.popFront();
        range2.popFront();
    }
}
Should that be an assertion or an enforcement? (Is it the caller's fault that the ranges don't empty at the same time? It might not have control of where the range came from -- it could've come from a user -- but then again, it still looks like a bug, doesn't it?)
Or here's another pseudocode example:
uint getFileSize(string path)
{
    HANDLE hFile = CreateFile(path, ...);
    assert(hFile != INVALID_HANDLE_VALUE); // Assertion or enforcement?
    return GetFileSize(hFile); // and close the handle, obviously
}
...
Should this be an assertion or an enforcement? The path might come from a user -- so it might not be a bug -- but it's still a precondition of this method that the path should be valid. Do I assert or enforce?
Thanks!
I'm not sure it is entirely language-neutral. No language that I use has enforce(), and if I encountered one that did then I would want to use assert and enforce in the ways they were intended, which might be idiomatic to that language.
For instance assert in C or C++ stops the program when it fails, it doesn't throw an exception, so its usage may not be the same as what you're talking about. You don't use assert in C++ unless you think that either the caller has already made an error so grave that they can't be relied on to clean up (e.g. passing in a negative count), or else some other code elsewhere has made an error so grave that the program should be considered to be in an undefined state (e.g. your data structure appears corrupt). C++ does distinguish between runtime errors and logic errors, though, which may roughly correspond but I think are mostly about avoidable vs. unavoidable errors.
In the case of add you'd use a logic error if the author's intent is that a program which provides mismatched lists has bugs and needs fixing, or a runtime exception if it's just one of those things that might happen. For instance, if your function were to handle arbitrary generators, which don't necessarily have a means of reporting their length short of destructively evaluating the whole sequence, you'd be more likely to consider it an unavoidable error condition.
Calling it a logic error implies that it's the caller's responsibility to check the length before calling add, if they can't ensure it by the exercise of pure reason. So they would not be passing in a list from a user without explicitly checking the length first, and in all honesty should count themselves lucky they even got an exception rather than undefined behavior.
Calling it a runtime error expresses that it's "reasonable" (if abnormal) to pass in lists of different lengths, with the exception indicating that it happened on this occasion. Hence I think an enforcement rather than an assertion.
In the case of filesize: for the existence of a file, you should if possible treat that as a potentially recoverable failure (enforcement), not a bug (assertion). The reason is simply that there is no way for the caller to be certain that a file exists - there's always someone with more privileges who can come along and remove it, or unmount the entire filesystem, in between a check for existence and a call to filesize. It's therefore not necessarily a logical flaw in the calling code when it doesn't exist (although the end-user might have shot themselves in the foot). Because of that fact it's likely there will be callers who can treat it as just one of those things that happens, an unavoidable error condition. Creating a file handle could also fail for out-of-memory, which is another unavoidable error on most systems, although not necessarily a recoverable one if for example over-committing is enabled.
Another example to consider is operator[] vs. at() for C++'s vector. at() throws out_of_range, a logic error, not because it's inconceivable that a caller might want to recover, or because you have to be some kind of numbskull to make the mistake of accessing an array out of range using at(), but because the error is entirely avoidable if the caller wants it to be - you can always check the size() before access if you have no other way of knowing whether your index is good or not. And so operator[] doesn't guarantee any checks at all, and in the name of efficiency an out of range access has undefined behavior.
assert should be considered a "run-time checked comment" indicating an assumption that the programmer makes at that moment. The assert is part of the function implementation. A failed assert should always be considered a bug at the point where the wrong assumption is made, so at the code location of the assert. To fix the bug, use a proper means to avoid the situation.
The proper means to avoid bad function inputs are contracts, so the example function should have an input contract that checks that range2 is at least as long as range1. The assertion inside the implementation could then still remain in place. Especially in longer, more complex implementations, such an assert may improve understandability.
An enforce is a lazy approach to throwing runtime exceptions. It is nice for quick-and-dirty code, because it is better to have a check in there than to silently ignore the possibility of a bad condition. For production code, it should be replaced by a proper mechanism that throws a more meaningful exception.
I believe you have partly answered your question yourself. Assertions are meant to break the flow: if an assertion fails, you do not agree to continue with anything. If you enforce something, you are making a decision to allow something to happen based on the situation; if you find that the conditions are not met, you can enforce that entry to a particular section is denied.

Elegant way to handle "impossible" code paths

Occasionally I'll have a situation where I've written some code and, based on its logic, a certain path is impossible. For example:
activeGames = [10, 20, 30]
limit = 4

def getBestActiveGameStat():
    if not activeGames: return None
    return max(activeGames)

def bah():
    if limit == 0: return "Limit is 0"
    if len(activeGames) >= limit:
        somestat = getBestActiveGameStat()
        if somestat is None:
            print "The universe has exploded"
        #etc...
What would go in the universe exploding line? If limit is 0, then the function returns. If len(activeGames) >= limit, then there must be at least one active game, so getBestActiveGameStat() can't return None. So, should I even check for it?
The same also happens with something like a while loop which always returns in the loop:
def hmph():
    while condition:
        if foo: return "yep"
        doStuffToMakeFooTrue()
    raise SingularityFlippedMyBitsError()
Since I "know" it's impossible, should anything even be there?
"If len(activeGames) >= limit, then there must be at least one active game, so getBestActiveGameStat() can't return None. So, should I even check for it?"
Sometimes we make mistakes. You could have a program error now -- or someone could create one later.
Those errors might result in exceptions or failed unit tests. But debugging is expensive; it's useful to have multiple ways to detect errors.
A quickly written assert statement can express an expected invariant to human readers. And when debugging, a failed assertion can pinpoint an error quickly.
Sutter and Alexandrescu address this issue in "C++ Coding Standards." Despite the title, their arguments and guidelines are language-agnostic.
Assert liberally to document internal assumptions and invariants
... Use assert or an equivalent liberally to document assumptions internal to a module ... that must always be true and otherwise represent programming errors.
For example, if the default case in a switch statement cannot occur, add the case with assert(false).
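In Python terms, the same guideline for an "impossible" branch might look like this sketch (the state values are made up):

def advance(state):
    if state == "running":
        return "paused"
    elif state == "paused":
        return "running"
    else:
        # The "impossible" path: document the assumption and fail loudly
        # while debugging instead of silently continuing.
        assert False, "unreachable: unexpected state %r" % (state,)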
IMHO, the first example is really more a question of how catastrophic failures are presented to the user. In the event that someone does something really silly and sets activeGames to None, most languages will throw a NullPointer/InvalidReference type of exception. If you have a good system for catching these kinds of errors and handling them elegantly, then I would argue that you leave these guards out entirely.
If you have a decent set of unit tests, they will ensure with huge amounts of certainty that this kind of problem does not escape the developers machine.
As for the second one, what you're really guarding against is a race condition. What if the "doStuffToMakeFooTrue()" method never makes foo true? This code will eventually run itself into the ground. Rather than risk that, I'll usually put code like this on a timer. If your language has closures or function pointers (honestly not sure about Python...), you can hide the implementation of the timing logic in a nice helper method, and call it this way:
withTiming(hmph, 30) // run for 30 seconds, then fail
If you don't have closures or function pointers, you'll have to do it the long way everywhere:
stopwatch = new Stopwatch(30)
stopwatch.start()
while stopwatch.elapsedTimeInSeconds() < 30:
    hmph()
raise OperationTimedOutError()
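Python, for what it's worth, does have closures and first-class functions, so the short form works there too. A hypothetical with_timing helper along those lines might look like:

import time

def with_timing(operation, seconds):
    # Hypothetical helper: keep retrying the operation until it returns a
    # result or the time budget runs out, then fail loudly.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        result = operation()
        if result is not None:
            return result
        time.sleep(0.1)  # avoid a busy loop between attempts
    raise RuntimeError("operation timed out after %s seconds" % seconds)

# with_timing(hmph, 30)  # run for up to 30 seconds, then fail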

What are the cons of returning an Exception instance instead of raising it in Python?

I have been doing some work with python-couchdb and desktopcouch. In one of the patches I submitted I wrapped the db.update function from couchdb. For anyone that is not familiar with python-couchdb the function is the following:
def update(self, documents, **options):
    """Perform a bulk update or insertion of the given documents using a
    single HTTP request.

    >>> server = Server('http://localhost:5984/')
    >>> db = server.create('python-tests')
    >>> for doc in db.update([
    ...     Document(type='Person', name='John Doe'),
    ...     Document(type='Person', name='Mary Jane'),
    ...     Document(type='City', name='Gotham City')
    ... ]):
    ...     print repr(doc) #doctest: +ELLIPSIS
    (True, '...', '...')
    (True, '...', '...')
    (True, '...', '...')

    >>> del server['python-tests']

    The return value of this method is a list containing a tuple for every
    element in the `documents` sequence. Each tuple is of the form
    ``(success, docid, rev_or_exc)``, where ``success`` is a boolean
    indicating whether the update succeeded, ``docid`` is the ID of the
    document, and ``rev_or_exc`` is either the new document revision, or
    an exception instance (e.g. `ResourceConflict`) if the update failed.

    If an object in the documents list is not a dictionary, this method
    looks for an ``items()`` method that can be used to convert the object
    to a dictionary. Effectively this means you can also use this method
    with `schema.Document` objects.

    :param documents: a sequence of dictionaries or `Document` objects, or
                      objects providing a ``items()`` method that can be
                      used to convert them to a dictionary
    :return: an iterable over the resulting documents
    :rtype: ``list``
    :since: version 0.2
    """
As you can see, this function does not raise the exceptions raised by the couchdb server; rather, it returns them in a tuple along with the id of the document that we wanted to update.
One of the reviewers went to #python on IRC to ask about the matter. In #python they recommended using sentinel values rather than exceptions. As you can imagine, such an approach is not practical, since there are lots of possible exceptions that can be received. My question is: what are the cons of using exceptions over sentinel values, besides the fact that using exceptions is uglier?
I think it is ok to return the exceptions in this case, because some parts of the update function may succeed and some may fail. When you raise the exception, the API user has no control over what succeeded already.
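For example, a caller can act on each document's outcome individually, with something like this hypothetical loop:

for success, docid, rev_or_exc in db.update(documents):
    if success:
        print("updated %s at revision %s" % (docid, rev_or_exc))
    else:
        # rev_or_exc is an exception instance such as ResourceConflict;
        # the caller can retry, log, or skip just this one document.
        print("failed to update %s: %r" % (docid, rev_or_exc))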
Raising an Exception is a notification that something that was expected to work did not work. It breaks the program flow, and should only be done if whatever is going on now is flawed in a way that the program doesn't know how to handle.
But sometimes you want to raise a little error flag without breaking program flow. You can do this by returning special values, and these values can very well be exceptions.
Python does this internally in one case. When you compare two values like foo < bar, the actual call is foo.__lt__(bar). If this method raises an exception, program flow will be broken, as expected. But if it returns NotImplemented, Python will then try bar.__gt__(foo) instead. So in this case returning the special value rather than raising is used to flag that it didn't work, but in an expected way.
It's really the difference between an expected error and an unexpected one, IMO.
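A small sketch of that protocol, with a made-up class:

class Meters(object):
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if not isinstance(other, Meters):
            # Returned, not raised: this tells Python "I can't compare
            # these", so it can try the reflected operation on `other`.
            return NotImplemented
        return self.value < other.value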
Exceptions are intended to be raised. That helps with debugging and with handling the causes of errors, and it is the clear, well-established practice of other developers.
Looking at the interface of the program, it's not clear what I am supposed to do with a returned exception. Raise it? From outside of the chain that actually caused it? It seems a bit convoluted.
I'd suggest returning a (docid, new_rev_doc) tuple on success and propagating/raising the exception as it is. Your approach also duplicates information: success is already implied by the type of the third returned value.
Exceptions cause the normal program flow to break; then exceptions go up the call stack until they're intercepted, or they reach the top if they aren't. Hence they're employed to mark a really special condition that should be handled by the caller. Raising an exception is useful since the program won't continue if a necessary condition has not been met.
In languages that don't support exceptions (like C) you're often forced to check return values of functions to verify everything went on correctly; otherwise the program may misbehave.
By the way, update() is a bit different:
- it takes multiple arguments; some may fail and some may succeed, hence it needs a way to communicate results for each argument.
- a previous failure has no relation to the operations coming next; e.g. it is not a permanent error.
In that situation raising an exception would NOT be useful in an API. On the other hand, if the connection to the db drops while executing the query, then an exception is the way to go (since it's a permanent error and would impact all operations coming next).
By the way if your business logic requires all operations to complete successfully and you don't know what to do when an update fails (i.e. your design says it should never happen), feel free to raise an exception in your own code.
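For instance, a thin hypothetical wrapper can restore fail-fast behaviour where that is what the design calls for:

def update_all_or_raise(db, documents, **options):
    # Hypothetical wrapper: if the design says partial failure should
    # never happen, turn the first returned exception back into a raise.
    results = db.update(documents, **options)
    for success, docid, rev_or_exc in results:
        if not success:
            raise rev_or_exc
    return results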