Is an assert in a private function redundant if the check has already been made by the calling public function?

Effective Java recommends using assertions to check parameters in private methods:
"For an unexported method, you as the package author control the circumstances under which the method is called, so you can and should ensure that only valid parameter values are ever passed in. Therefore, nonpublic methods should generally check their parameters using assertions, as shown below:"
// Private helper function for a recursive sort
private static void sort(long a[]) {
    assert a != null;
    // Do the computation
}
My question: are the asserts still required if the public function that calls sort already has a null check?
Example:
public void computeTwoNumbersThatSumToInputValue(long[] a, long x) {
    if (a == null) {
        throw new NullPointerException();
    }
    sort(a);
    // code to do the required computation
}
In other words, are asserts in the private function 'redundant' or mandatory in this case?
Thanks,

It's redundant if you're sure that you've got the check in all the calling code. In some cases, that's very obvious; in other cases it can be less so. If you're calling sort from 20 places in the class, are you sure you've checked it in every case?
It's a matter of taste and balance, with no "one size fits all" answer. The balance is in terms of code clarity (both ways!), performance (in extreme cases) and of course safety. It depends on the exact context, and I wouldn't personally like to even guarantee that I'm entirely consistent. (In other words, "level of caffeine at the time of coding" may turn out to be an influence too.)
Note that your assert is only going to execute when assertions are turned on anyway - I personally prefer to validate parameters consistently however you're running the code. I generally use the Preconditions class from Guava to make preconditions unobtrusive.
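A minimal sketch of the difference, assuming plain JDK code instead of Guava: `Objects.requireNonNull` (in `java.util.Objects` since Java 7) plays roughly the role of Guava's `Preconditions.checkNotNull` and runs no matter how the JVM is started, while the `assert` in the private helper only runs under `-ea`. The class `TwoSum` and the method `hasPairWithSum` are hypothetical names for the question's example.

```java
import java.util.Arrays;
import java.util.Objects;

class TwoSum {
    // Public entry point: always validates, regardless of -ea/-da.
    static boolean hasPairWithSum(long[] a, long x) {
        Objects.requireNonNull(a, "a must not be null");
        long[] copy = a.clone();   // don't reorder the caller's array
        sort(copy);
        // Two-pointer scan over the sorted copy.
        int lo = 0, hi = copy.length - 1;
        while (lo < hi) {
            long sum = copy[lo] + copy[hi];
            if (sum == x) return true;
            if (sum < x) lo++; else hi--;
        }
        return false;
    }

    // Private helper: the assert documents the assumption and is
    // checked only when assertions are enabled.
    private static void sort(long[] a) {
        assert a != null;
        Arrays.sort(a);
    }
}
```

With this split the public method is the single place that guarantees the contract, and the helper's assert acts as documentation plus a debug-time safety net.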

Assertions will make the helper function sort more robust to use.
Checking parameters before passing them to any method gives you more control over exceptions that would otherwise occur unintentionally at runtime.
My suggestion is to use both approaches in your code, as there is no guarantee that all callers of sort will do such checks. If the assertions in helper methods are algorithmically expensive or seem redundant, they can be disabled (especially for production use) with -disableassertions or -da on the command line. Note that assertions are disabled by default and only run when the JVM is started with -enableassertions or -ea.
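A small sketch showing that an `assert` statement is skipped entirely unless assertions are enabled; the class name `AssertionStatus` is illustrative. The assignment inside the `assert` only executes under `-ea`, which is a common idiom for detecting the current setting.

```java
class AssertionStatus {
    // Returns true only if assertions are enabled for this class:
    // the assignment inside the assert runs only when assertions are on.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running `java -ea AssertionStatus` should print `assertions enabled: true`; plain `java AssertionStatus` prints `false`, since assertions are off by default.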

You could do that. I will quote from the Oracle docs.
An assertion is a statement in the Java™ programming language that enables you to test your assumptions about your program. For example, if you write a method that calculates the speed of a particle, you might assert that the calculated speed is less than the speed of light.
I do not personally use assertions, but from what I gathered reading the Oracle docs on them, they enable you to test your assumptions about what you expect your code to do. Try/catch blocks are more for failing gracefully when failures are bound to happen (networking, hardware problems). Basically, in a perfect world your code would always run successfully because there's nothing wrong with it code-wise. But this isn't a perfect world. Also note:
Experience has shown that writing assertions while programming is one of the quickest and most effective ways to detect and correct bugs. As an added benefit, assertions serve to document the inner workings of your program, enhancing maintainability.
I would say it's a matter of preference. To answer your question, I would mainly use them to test code, as the docs say, checking the assumptions you have about your code. As the second quote mentions, they have the added benefit of telling other developers (or future you) what you expect to receive as parameters. As a personal preference, I leave control flow to try/catch blocks, as that is what they were designed for.
But keep in mind that assertions can be turned off.

Related

Is it bad practice to have a long initialization method?

Many people have argued about function size. They say that functions in general should be pretty short. Opinions vary from something like 15 lines to "about one screen", which today is probably about 40-80 lines.
Also, functions should always fulfill one task only.
However, there is one kind of function that frequently fails in both criteria in my code: Initialization functions.
For example, in an audio application, the audio hardware/API has to be set up, audio data has to be converted to a suitable format and the object state has to be properly initialized. These are clearly three different tasks and, depending on the API, this can easily span more than 50 lines.
The thing with init-functions is that they are generally only called once, so there is no need to re-use any of the components. Would you still break them up into several smaller functions, or would you consider big initialization functions to be OK?
I would still break the function up by task, and then call each of the lower level functions from within my public-facing initialize function:
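void _init_hardware() { /* set up the audio hardware/API */ }
void _convert_format() { /* convert audio data to a suitable format */ }
void _setup_state() { /* initialize the object state */ }

void initialize_audio() {
    _init_hardware();
    _convert_format();
    _setup_state();
}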
Writing succinct functions is as much about isolating fault and change as keeping things readable. If you know the failure is in _convert_format(), you can track down the ~40 lines responsible for a bug quite a bit faster. The same thing applies if you commit changes that only touch one function.
A final point, I make use of assert() quite frequently so I can "fail often and fail early", and the beginning of a function is the best place for a couple of sanity-checking asserts. Keeping the function short allows you to test the function more thoroughly based on its more narrow set of duties. It's very hard to unit-test a 400 line function that does 10 different things.
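A Java rendering of the same decomposition, as a sketch: `AudioSetup`, its fields and the 16-bit-to-float conversion are hypothetical stand-ins for a real audio API, and the three private methods mirror `_init_hardware`, `_convert_format` and `_setup_state`. Each step starts with a cheap sanity check, so a bad value fails in the step that received it.

```java
import java.util.Objects;

class AudioSetup {
    private int sampleRate;   // hypothetical hardware setting
    private float[] buffer;   // hypothetical converted audio data
    private boolean ready;

    // Public entry point: reads like a table of contents of the set-up.
    void initializeAudio(int requestedRate, short[] rawSamples) {
        initHardware(requestedRate);
        convertFormat(rawSamples);
        setupState();
    }

    private void initHardware(int requestedRate) {
        if (requestedRate <= 0) {
            throw new IllegalArgumentException("sample rate must be positive");
        }
        sampleRate = requestedRate;   // a real version would open the device here
    }

    private void convertFormat(short[] rawSamples) {
        Objects.requireNonNull(rawSamples, "rawSamples");
        assert sampleRate > 0 : "initHardware must run first";
        buffer = new float[rawSamples.length];
        for (int i = 0; i < rawSamples.length; i++) {
            buffer[i] = rawSamples[i] / 32768f;   // 16-bit PCM to [-1, 1)
        }
    }

    private void setupState() {
        assert buffer != null : "convertFormat must run first";
        ready = true;
    }

    boolean isReady() { return ready; }
    float sampleAt(int i) { return buffer[i]; }
}
```

Each step can now be read, changed and (if made package-private) tested on its own.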
If breaking into smaller parts makes code better structured and/or more readable, do it no matter what the function does. It's not about the number of lines; it's about code quality.
I would still try to break up the functions into logical units. They should be as long or as short as makes sense. For example:
SetupAudioHardware();
ConvertAudioData();
SetupState();
Assigning them clear names makes everything more intuitive and readable. Also, breaking them apart makes it easier for future changes and/or other programs to reuse them.
In a situation like this I think it comes down to a matter of personal preference. I prefer to have functions do only one thing so I would split the initialization into separate functions, even if they are only called once. However, if someone wanted to do it all in a single function I wouldn't worry about it too much (as long as the code was clear). There are more important things to argue about (like whether curly braces belong on their own separate line).
If you have a lot of components that need to be plugged into each other, it can certainly be reasonably natural to have a large method, even if the creation of each component is refactored into a separate method where feasible.
One alternative to this is to use a Dependency Injection framework (e.g. Spring, Castle Windsor, Guice etc). That has definite pros and cons... while working your way through one big method can be quite painful, you at least have a good idea of where everything is initialized, and there's no need to worry about what "magic" might be going on. Then again, the initialization can't be changed after deployment (as it can with an XML file for Spring, for example).
I think it makes sense to design the main body of your code so that it can be injected - but whether that injection is via a framework or just a hard-coded (and potentially long) list of initialization calls is a choice which may well change for different projects. In both cases the results are hard to test other than by just running the application.
First, a factory should be used instead of an initialization function. That is, rather than have initialize_audio(), you have a new AudioObjectFactory (you can think of a better name here). This maintains separation of concerns.
However, be careful also not to abstract too early. Clearly you do have two concerns already: 1) audio initialization and 2) using that audio. Until, for example, you abstract the audio device to be initialized, or the way a given device may be configured during initialization, your factory method (audioObjectFactory.Create() or whatever) should really be kept to just one big method. Early abstraction serves only to obfuscate design.
Note that audioObjectFactory.Create() is not something that can be unit-tested. Testing it is an integration test, and until there are parts of it that can be abstracted, it will remain an integration test. Later on, you may find that you have multiple different factories for different configurations; at that point, it might be beneficial to abstract the hardware calls into an interface, so that you can create unit tests to ensure the various factories configure the hardware in a proper way.
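A sketch of that later stage, with hypothetical names throughout (`AudioHardware`, `AudioObject`, `AudioObjectFactory.create`, `RecordingHardware`): once the hardware calls sit behind an interface, a unit test can hand the factory a recording fake and check what it configured, without touching real hardware.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam around the real hardware API.
interface AudioHardware {
    void open(String deviceName);
    void setSampleRate(int hz);
}

// Hypothetical product of the factory.
final class AudioObject {
    final int sampleRate;
    AudioObject(int sampleRate) { this.sampleRate = sampleRate; }
}

final class AudioObjectFactory {
    private final AudioHardware hardware;
    AudioObjectFactory(AudioHardware hardware) { this.hardware = hardware; }

    // Still one straightforward method: configure the device, then build the object.
    AudioObject create(String deviceName, int sampleRate) {
        hardware.open(deviceName);
        hardware.setSampleRate(sampleRate);
        return new AudioObject(sampleRate);
    }
}

// Test double that records the calls instead of touching a device.
final class RecordingHardware implements AudioHardware {
    final List<String> calls = new ArrayList<>();
    public void open(String deviceName) { calls.add("open:" + deviceName); }
    public void setSampleRate(int hz) { calls.add("rate:" + hz); }
}
```

Note that `create` itself stays one method; only the hardware boundary is abstracted, in line with the warning about abstracting too early.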
I think it's the wrong approach to try and count the number of lines and determine functions based on that. For something like initialization code I often have a separate function for it, but mostly so that the Load or Init or New functions aren't cluttered and confusing. If you can separate it into a few tasks like others have suggested, then you can name it something useful and help organize. Even if you are calling it just once, it's not a bad habit, and often you find that there are other times when you may want to re-init things and can use that function again.
Just thought I'd throw this out there, since it hasn't been mentioned yet - the Facade Pattern is sometimes cited as an interface to a complex subsystem. I haven't done much with it myself, but the metaphors are usually something like turning on a computer (requires several steps), or turning on a home theater system (turn on TV, turn on receiver, turn down lights, etc...)
Depending on the code structure, it might be worth considering as a way to abstract away your large initialization functions. I still agree with meagar's point, though, that breaking down functions into _init_X(), _init_Y(), etc. is a good way to go. Even if you aren't going to reuse this code, on your next project, when you say to yourself, "How did I initialize that X-component?", it'll be much easier to go back and pick it out of the smaller _init_X() function than it would be to pick it out of a larger function, especially if the X-initialization is scattered throughout it.
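A minimal Facade sketch for the home-theater metaphor; every class and method name here is hypothetical. The facade owns the multi-step start-up, so callers make one call while the individual steps stay small and named.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical subsystems; each just logs what it did.
class Tv       { void on(List<String> log)  { log.add("tv on"); } }
class Receiver { void on(List<String> log)  { log.add("receiver on"); } }
class Lights   { void dim(List<String> log) { log.add("lights dimmed"); } }

// The facade hides the ordering of the subsystem calls behind one method.
class HomeTheaterFacade {
    private final Tv tv = new Tv();
    private final Receiver receiver = new Receiver();
    private final Lights lights = new Lights();
    final List<String> log = new ArrayList<>();

    void watchMovie() {
        tv.on(log);
        receiver.on(log);
        lights.dim(log);
    }
}
```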
Function length is, as you tagged, a very subjective matter. However, a standard best-practice is to isolate code that is often repeated and/or can function as its own entity. For instance, if your initialization function is loading library files or objects that will be used by a specific library, that block of code should be modularized.
With that said, it's not bad to have an initialization method that's long, as long as it's not long because of lots of repeated code or other snippets that can be abstracted away.
Hope that helps,
Carlos Nunez

Strategy for handling parameter validation in class library

I have a rather big class library that contains a lot of code.
I am looking at how to optimize the performance of some of the code, and for some rather simple utility methods I've found that the parameter validation occupies a rather large portion of the runtime for some core methods.
Let me give a typical example:
A.MethodA1 runs a loop, iterating over a collection, calling B.MethodB1 for each element
B.MethodB1 processes the element and returns the result, it's a rather basic calculation, but since it is used many places, it has been put into its own method instead of being copied and pasted where needed
A.MethodA1 calls C.MethodC1 with the results of B.MethodB1, and puts the result into a list that is returned at the end of the loop
In the case I've found now, B.MethodB1 does rudimentary parameter validation. Since the method calls other internal methods, I'd like to avoid having NullReferenceExceptions several layers deep into the code, and rather fail early, hence B.MethodB1 validates the parameters, like checking for null and some basic range checks on another parameter.
However, in this particular call scenario, it is impossible (due to other program logic) for these parameters to ever have the wrong values. If they had, from the program standpoint, B.MethodB1 would never be called at all for those values, A.MethodA1 would fail before the call to B.MethodB1.
So I was considering removing the parameter validation in B.MethodB1, since it occupies roughly 65% of the method runtime (and this is part of some heavily used code.)
However, B.MethodB1 is a public method, and can thus be called from the program, in which case I want the parameter validation.
So how would you solve this dilemma?
1. Keep the parameter validation, and take the performance hit
2. Remove the parameter validation, and have potentially fail-late problems in the method
3. Split the method into two, one internal that doesn't have parameter validation, called by the "safe" path, and one public that has the parameter validation + a call to the internal version.
The latter one would give me the benefits of having no parameter validation, while still exposing a public entrypoint which does have parameter validation, but for some reason it doesn't sit right with me.
Opinions?
I would go with option 3. I tend to use assertions for private and internal methods and do all the validation in public methods.
By the way, is the performance hit really that big?
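A sketch of option 3 in Java, where package-private access stands in for C#'s `internal`: `A.methodA1`, `B.methodB1` and `B.methodB1Unchecked` are hypothetical versions of the question's methods, the doubling is a placeholder for the "rather basic calculation", and the unchecked path keeps assertions as described in the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class B {
    // Public entry point: full validation for arbitrary callers.
    public static double methodB1(double[] values, int index) {
        Objects.requireNonNull(values, "values");
        if (index < 0 || index >= values.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return methodB1Unchecked(values, index);
    }

    // Package-private fast path for callers that already guarantee valid input;
    // the assert documents that contract and is skipped unless -ea is set.
    static double methodB1Unchecked(double[] values, int index) {
        assert values != null && index >= 0 && index < values.length;
        return values[index] * 2.0;   // placeholder calculation
    }
}

class A {
    // Validates once up front, then uses the unchecked path inside the loop.
    static List<Double> methodA1(double[] values) {
        Objects.requireNonNull(values, "values");
        List<Double> results = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            results.add(B.methodB1Unchecked(values, i));
        }
        return results;
    }
}
```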
That's an interesting question.
Hmmm, makes me think ... "code contracts". It seems technically possible for certain code contracts to be proven to hold statically (at compile time). If that were the case, and you had such a compile-time validation option, you could state these contracts without ever having to check the conditions at runtime.
It would require that the client code itself be validated against the code contracts.
And, of course it would inevitably be highly dependent on the type of conditions you'd want to write, and it would probably only be feasible to prove these contracts to a certain point (how far up the possible call graph would you go?). Beyond this point the validator might have to beg off, and insist that you place a runtime check (or maybe a validation warning suppression?).
All just idle speculation. Does make me wonder a bit more about C# 4.0 code contracts. I wonder if these have support for static analysis. Have you checked them out? I've been meaning to, but learning F# is having to take priority at the moment!
Update:
Having read up a little on it, it appears that C# 4.0 does indeed have a 'static checker' as well as a binary rewriter (which takes care of altering the output binary so that pre and post condition checks are in the appropriate location)
What's not clear from my extremely quick read, is whether you can opt out of the binary rewriting - what I'm thinking here is that what you'd really be looking for is to use the code contracts, have the metadata (or code) for the contracts maintained within the various assemblies but use only the static checker for at least a selected subset of contracts, so that you in theory get proven safety without any runtime hit.
Here's a link to an article on the code contracts

Understanding complex post-conditions in DbC

I have been reading over design-by-contract posts and examples, and there is something that I cannot seem to wrap my head around. In all of the examples I have seen, DbC is used on a trivial class testing its own state in the post-conditions (e.g. lots of Bank Accounts).
It seems to me that most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies. I understand how to check for this in a Unit-Test with specific scenarios using dependency inversion and mock objects that focus on the external behavior of the method, but how does this work with DbC and post-conditions?
My second question has to deal with understanding complex post-conditions. It seems to me that to write out a post-condition for many functions, that you basically have to re-write the body of the function for your post-condition to know what the new state is going to be. What is the point of that?
I really do like the notion of DbC and I think that it has great promise, particularly if I can figure out how to reproduce some failure state once I find a validated contract. Over the past couple of hours I have been reading some neat stuff wrt. automatic test generation in Eiffel. I am currently trying to improve my processes in C++ development, but I am open to learning something new if I can figure out how to not lose all of the ground I have made in my current projects. Thanks.
"but how does this work with DbC and post-conditions?"
Every function is basically one of these:
A sequence of statements
A conditional statement
A loop
The idea is that you should check any postconditions about the results of the function that go beyond the union of the postconditions of all the functions called.
"that you basically have to re-write the body of the function for your post-condition to know what the new state is going to be"
Think about it the other way round. What made you write the function in the first place? What were you pursuing? Can that be expressed in a postcondition which is more simple than the function body itself? A postcondition will typically use queries (what in C++ are const functions), while the body usually combines commands and queries (methods that modify the object and methods which only get information from it).
In some cases, yes, you will find out that you can really add little value with postconditions. In these cases, writing a bunch of tests will typically be enough.
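To make the point about queries concrete, a hypothetical Java example (the names are illustrative, not from any library): the body of an insertion sort is a nested loop of commands, but its postcondition needs only two queries, "the result is sorted" and "the result holds the same elements as the input", and neither re-implements the algorithm. The `old` snapshot plays the role of Eiffel's `old`; in this sketch it is taken unconditionally, so it costs time even with assertions disabled.

```java
import java.util.HashMap;
import java.util.Map;

class ContractSort {
    static void insertionSort(int[] a) {
        assert a != null : "precondition: a is not null";
        int[] old = a.clone();   // snapshot for the postcondition
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        assert isSorted(a) && sameElements(old, a) : "postcondition violated";
    }

    // Query: each element is <= its successor.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    // Query: both arrays contain the same values with the same multiplicities.
    static boolean sameElements(int[] x, int[] y) {
        if (x.length != y.length) return false;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : x) counts.merge(v, 1, Integer::sum);
        for (int v : y) {
            Integer c = counts.get(v);
            if (c == null || c == 0) return false;
            counts.put(v, c - 1);
        }
        return true;
    }
}
```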
See also:
Bertrand Meyer, Contract Driven Development
Related questions 1, 2
Delegation at the contract level
"most of the time when you call a method of a class, it does much more work delegating method calls to its external dependencies"
As for this first question: the implementation of a function/method may call many other function/methods, but if the designer of the code had a clear mind, this does not imply that the specification of the caller is the concatenation of the specifications of the callees. For a method that calls many others, the size of the specification can remain contained if the method accomplishes a precise and well-defined task. Which it should if the whole system was well designed.
You are clearly asking your question from the point of view of run-time assertion checking. In this context, the above would perhaps be expressed as "you don't need to re-check in the post-condition of the caller that all the callees have respected their respective contracts. These checks will already be made on each call. In the post-condition of the caller, only check the functionally visible result of the caller."
Understanding complex post-conditions
You may find this "ACSL by example" document interesting (although probably different from what you're used to). It contains many examples of formal contracts for C functions. The language of the contracts is intended for static verification instead of run-time checking, with all the advantages and the drawbacks that it entails. They are a little more sophisticated than the "Bank Accounts" that you mention — these functions implement real algorithms, although simple ones. The document keeps the contracts short and readable by introducing well-thought-out auxiliary predicates (which would be called queries in Eiffel, as Daniel points out in his answer).

Doesn't Passing in Parameters that Should Be Known Implicitly Violate Encapsulation?

I often hear around here from test-driven development people that having a function get large amounts of information implicitly is a bad thing. I can see where this would be bad from a testing perspective, but isn't it sometimes necessary from an encapsulation perspective? The following question comes to mind:
Is using Random and OrderBy a good shuffle algorithm?
Basically, someone wanted to create a function in C# to randomly shuffle an array. Several people told him that the random number generator should be passed in as a parameter. This seems like an egregious violation of encapsulation to me, even if it does make testing easier. Isn't the fact that an array shuffling algorithm requires any state at all other than the array it's shuffling an implementation detail that the caller should not have to care about? Wouldn't the correct place to get this information be implicitly, possibly from a thread-local singleton?
I don't think it breaks encapsulation. The only state in the array is the data itself - and "a source of randomness" is essentially a service. Why should an array naturally have an associated source of randomness? Why should that have to be a singleton? What about different situations which have different requirements - e.g. speed vs cryptographically secure randomness? There's a reason why java.util.Random has a SecureRandom subclass :) Perhaps it doesn't matter whether the shuffle's results are predictable with a lot of effort and observation - or perhaps it does. That will depend on the context, and that's information that the shuffle algorithm shouldn't care about.
Once you start thinking of it as a service, it makes sense that it's passed in as a dependency.
Yes, you could get it from a thread-local singleton (and indeed I'm going to blog about exactly that in the next few days) but I would generally code it so that the caller gets to make that decision.
One benefit of the "randomness as a service" concept is that it makes for repeatability - if you've got a test which fails, you can pass in a Random with a specific seed and know you'll always get the same results, which makes debugging easier.
Of course, there's always the option of making the Random optional - use a thread-local singleton as a default if the caller doesn't provide their own.
Yes, that does break encapsulation. As with most software design decisions, this is a trade-off between two opposing forces. If you encapsulate the RNG then you make it difficult to change for a unit test. If you make it a parameter then you make it easy for a user to change the RNG (and potentially get it wrong).
My personal preference is to make it easy to test, then provide a default implementation (a default constructor that creates its own RNG, in this particular case) and good documentation for the end user. Adding a method with the signature
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
that creates a Random using the current system time as its seed would take care of most normal use cases of this method. The original method
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
could be used for testing (pass in a Random object with a known seed) and also in those rare cases where a user decides they need a cryptographically secure RNG. The one-parameter implementation should call this method.
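The same two-overload design in Java, as a sketch: `Shuffler.shuffle` is a hypothetical name, the algorithm is a Fisher-Yates shuffle rather than the OrderBy approach from the linked question, and the convenience overload uses `ThreadLocalRandom.current()` as the thread-local default mentioned in the previous answer. The JDK's own `Collections.shuffle(List, Random)` has the same shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

final class Shuffler {
    // Convenience overload: the caller doesn't care where randomness comes from.
    static <T> List<T> shuffle(List<T> source) {
        return shuffle(source, ThreadLocalRandom.current());
    }

    // Randomness as an injected service: pass a seeded Random for repeatable
    // tests, or a SecureRandom when predictability matters.
    static <T> List<T> shuffle(List<T> source, Random rng) {
        List<T> result = new ArrayList<>(source);   // leave the input untouched
        for (int i = result.size() - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);              // 0 <= j <= i
            T tmp = result.get(i);
            result.set(i, result.get(j));
            result.set(j, tmp);
        }
        return result;
    }
}
```

Two calls with `new Random(42)` produce the same order, which is the repeatability the "randomness as a service" argument relies on.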
I don't think this violates encapsulation.
Your Example
I would say that being able to provide an RNG is a feature of the class. I would obviously provide a method that doesn't require it, but I can see times where it may be useful to be able to duplicate the randomization.
What if the array shuffler was part of a game that used the RNG for level generation. If a user wanted to save the level and play it again later, it may be more efficient to store the RNG seed.
General Case
Simple classes that have a single task like this typically don't need to worry about divulging their inner workings. What they encapsulate is the logic of the task, not the elements required by that logic.

How to design a class that has only one heavy duty work method and data returning other methods?

I want to design a class that will parse a string into tokens that are meaningful to my application.
How do I design it?
1. Provide a ctor that accepts a string, provide a Parse method and provide methods (let's call them "minor") that return individual tokens, count of tokens etc.
2. Provide a ctor that accepts nothing, provide a Parse method that accepts a string and minor methods as above.
3. Provide a ctor that accepts a string and provide only minor methods but no parse method. The parsing is done by the ctor.
1 and 2 have the disadvantage that the user may call minor methods without calling the Parse method. I'll have to check in every minor method that the Parse method was called.
The problem I see in 3 is that the parse method may potentially do a lot of things. It just doesn't seem right to put it in the ctor.
2 is convenient in that the user may parse any number of strings without instantiating the class again and again.
What's a good approach? What are some of the considerations?
(The language is C#, if anyone cares.)
Thanks
I would have a separate class with a Parse method that takes a string and converts it into a separate new object with a property for each value from the string.
ValueObject values = parsingClass.Parse(theString);
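A Java sketch of that separation, assuming a hypothetical comma-separated format and hypothetical names (`TokenParser`, `ParsedTokens`): the parser carries no per-input state, and the result object is immutable and complete from the moment it exists, so its accessors never need a "was Parse called?" check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Immutable result: fully built before anyone can see it.
final class ParsedTokens {
    private final List<String> tokens;
    ParsedTokens(List<String> tokens) {
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
    }
    int count() { return tokens.size(); }
    String get(int i) { return tokens.get(i); }
}

// Stateless parser: one instance can parse any number of strings.
final class TokenParser {
    ParsedTokens parse(String input) {
        Objects.requireNonNull(input, "input");
        List<String> tokens = new ArrayList<>();
        for (String part : input.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) tokens.add(t);   // skip empty fields
        }
        return new ParsedTokens(tokens);
    }
}
```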
I think this is a really good question...
In general, I'd go with something that resembles option 3 above. Basically, think about your class and what it does; does it have any effective data other than the data to parse and the parsed tokens? If not, then I would generally say that if you don't have those things, then you don't really have an instance of your class; you have an incomplete instance of your class; something which you'd like to avoid.
One of the considerations that you point out is that parsing the tokens may be a relatively computationally complicated process; it may take a while. I agree with you that you may not want to take the hit for doing that in the constructor; in that case, it may make sense to use a Parse() method. The question that comes in, though, is whether there are any sensible operations that can be done on your class before the Parse() method completes. If not, then you're back to the original point: before the Parse() method is complete, you're effectively in an "incomplete instance" state of your class; that is, it's effectively useless. Of course, this all changes if you're willing and able to use some multithreading in your application; if you're willing to offload the computationally complicated operations onto another thread, and maintain some sort of synchronization on your class methods / accessors until you're done, then the whole Parse() thing makes more sense, as you can choose to spawn that in a new thread entirely. You still run into issues of attempting to use your class before it has finished parsing everything, though.
I think an even more broad question that comes into this design, though, is what is the larger scope in which this code will be used? What is this code going to be used for, and by that, I mean, not just now, with the intended use, but is there a possibility that this code may need to grow or change as your application does? In terms of the stability of implementation, can you expect for this to be completely stable, or is it likely that something about the set of data you'll want to parse or the size of the data to parse or the tokens into which you will parse will change in the future? If the implementation has a possibility of changing, consider all the ways in which it may change; in my experience, those considerations can strongly lead to one or another implementation. And considering those things is not trivial; not by a long shot.
Lest you think this is just nitpicking, I would say, at a conservative estimate, about 10-15 percent of the classes that I've written have needed some level of refactoring even before the project was complete; rarely has a design that I've worked on survived implementation looking the same way it did before. So considering the possible permutations of the implementation becomes very useful for determining what your implementation should be. If, say, your implementation will never need to vary the size of the string to tokenize, you can make an assumption about the computational complexity that may lead you one way or another on the overall design.
If the sole purpose of the class is to parse the input string into a group of properties, then I don't see any real downside in option 3. The parse operation may be expensive, but you have to do it at some point if you're going to use it.
You mention that option 2 is convenient because you can parse new values without reinstantiating the object, but if the parse operation is that expensive, I don't think that makes much difference. Compare the following code:
// Using option 3
ParsingClass myClass = new ParsingClass(inputString);
// Parse a new string.
myClass = new ParsingClass(anotherInputString);
// Using option 2
ParsingClass myClass = new ParsingClass();
myClass.Parse(inputString);
// Parse a new string.
myClass.Parse(anotherInputString);
There's not much difference in use, but with option 2, you have to have all your minor methods and properties check whether parsing has occurred before they can proceed. (Option 1 requires you to do everything that option 2 does internally, but also allows you to write option 3-style code when using it.)
Alternatively, you could make the constructor private and the Parse method static, having the Parse method return an instance of the object.
// Option 4
ParsingClass myClass = ParsingClass.Parse(inputString);
// Parse a new string.
myClass = ParsingClass.Parse(anotherInputString);
Options 1 and 2 provide more flexibility, but require more code to implement. Options 3 and 4 are less flexible, but there's also less code to write. Basically, there is no one right answer to the question. It's really a matter of what fits with your existing code best.
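Option 4 translated to Java, as a minimal sketch with a hypothetical whitespace tokenizer: the private constructor means the only way to obtain a `ParsingClass` is through the static `parse` factory, so every instance a caller sees is already parsed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ParsingClass {
    private final List<String> tokens;

    // Private: callers cannot create an unparsed instance.
    private ParsingClass(List<String> tokens) {
        this.tokens = tokens;
    }

    // Static factory does the heavy lifting, then hands back a complete object.
    static ParsingClass parse(String input) {
        String trimmed = input.trim();
        List<String> tokens = trimmed.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(trimmed.split("\\s+"));
        return new ParsingClass(Collections.unmodifiableList(tokens));
    }

    int tokenCount() { return tokens.size(); }
    String token(int i) { return tokens.get(i); }
}
```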
Two important considerations:
1) Can the parsing fail?
If so, and if you put it in the constructor, then it has to throw an exception. The Parse method could return a value indicating success. So check how your colleagues feel about throwing exceptions in situations which aren't show-stopping: default is to assume they won't like it.
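In Java, one common way to report "parsing could fail" without throwing is to return `java.util.Optional` from a static factory; the sketch below assumes a hypothetical `key=value` format and the hypothetical names `ParsedSetting` and `tryParse`. The constructor stays private and only ever receives input that has already parsed successfully.

```java
import java.util.Optional;

final class ParsedSetting {
    final String key;
    final String value;

    private ParsedSetting(String key, String value) {
        this.key = key;
        this.value = value;
    }

    // Returns empty instead of throwing when the input is malformed.
    static Optional<ParsedSetting> tryParse(String input) {
        if (input == null) return Optional.empty();
        int eq = input.indexOf('=');
        if (eq <= 0 || eq == input.length() - 1) return Optional.empty();
        return Optional.of(new ParsedSetting(input.substring(0, eq).trim(),
                                             input.substring(eq + 1).trim()));
    }
}
```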
2) The constructor must get your object into a valid state.
If you don't mind "hasn't parsed anything yet" being a valid state of your objects, then the parse method is probably the way to go, and call the class SomethingParser.
If you don't want that, then parse in the constructor (or factory, as Garry suggests), and call the class ParsedSomething.
The difference is probably whether you are planning to pass these things as parameters into other methods. If so, then having a "not ready yet" state is a pain, because you either have to check for it in every callee and handle it gracefully, or else you have to write documentation like "the parameter must already have parsed a string". And then most likely check in every callee with an assert anyway.
You might be able to work it so that the initial state is the same as the state after parsing an empty string (or some other base value), thus avoiding the "not ready yet" problem.
Anyway, if these things are likely to be parameters, personally I'd say that they have to be "ready to go" as soon as they're constructed. If they're just going to be used locally, then you might give users a bit more flexibility if they can create them without doing the heavy lifting. The cost is requiring two lines of code instead of one, which makes your class slightly harder to use.
You could consider giving the thing two constructors and a Parse method: the string constructor is equivalent to calling the no-arg constructor, then calling Parse.
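A sketch of that suggestion combined with the empty-string idea above; `SomethingParser` and its whitespace tokenizing are hypothetical. The no-arg constructor leaves the object in the same state as parsing `""`, so there is no separate "not ready yet" state, and the string constructor simply delegates. The class is `final` so that calling `parse` from a constructor cannot reach an overridden version.

```java
import java.util.ArrayList;
import java.util.List;

final class SomethingParser {
    private final List<String> tokens = new ArrayList<>();

    // Same state as having parsed the empty string: zero tokens, but valid.
    SomethingParser() {
        parse("");
    }

    // Equivalent to the no-arg constructor followed by parse(input).
    SomethingParser(String input) {
        this();
        parse(input);
    }

    // Replaces any previous result, so one instance can be reused.
    void parse(String input) {
        tokens.clear();
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    int tokenCount() { return tokens.size(); }
}
```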