Whose responsibility is it to check data validity? - language-agnostic

I am confused as to whether it is the caller or the callee's responsibility to check for data legality.
Should the callee check whether passed-in arguments should not be null and meet some other requirements so that the callee method can execute normally and successfully, and to catch any potential exceptions? Or it is the caller's responsibility to do this?

Both consumer side(client) and provider side(API) validation.
Clients should do it because it means a better experience. For example, why do a network round trip just to be told that you've got one bad text field?
Providers should do it because they should never trust clients (e.g. XSS and man in the middle attacks). How do you know the request wasn't intercepted? Validate everything.
There are several levels of valid:
All required fields present, correct formats. This is what the client validates.
# 1 plus valid relationships between fields (e.g. if X is present then Y is required).
# 1 plus # 2 plus business valid: meets all business rules for proper processing.
Only the provider side can do #2 and #3.

For an API the callee should always do proper validation and throw a descriptive exception for invalid data.
For any client with IO overhead client should do basic validation as well...

Validation: Caller vs. Called
The TLDR version is both.
The long version involves who, why, when, how, and what.
Both
Both should be ready to answer the question "can this data be operated on reliably?" Do we know enough about this data to do something meaningful with it? Many will suggest that the reliability of the data should never be trusted, but that only leads to a chicken and egg problem. Chasing it endlessly from both ends will not provide for meaningful value, but to some degree it essential.
Both must validate the shape of the data to ensure base usability. If either one does not recognize or understand the shape of the data, there is no way to know how to further handle it with any reliability. Depending on the environment, the data may need to be a particular 'type', which is often an easy way to validate shape. We often consider types that present evidence of common linage back to a particular ancestor and retain the crucial traits to possess the right shape. Other characteristics might be important if the data is anything other than an in memory structure, for instance if it is a stream or some other resource external the running context.
Many languages include data shape checking as a built-in language feature through type or interface checking. However, when favoring composition over inheritance, providing a good mechanism to verify trait existence is incumbent on the implementer. One strategy to achieve this is through dynamic programming, or particularly via type introspection, inference, or reflection.
Called
The called must validate the domain (the set of inputs) of the given context to which it will operate on. The design of the called always suggests it can handle only so many cases of input. Usually these values are broken up into certain subclasses or categories of input. We verify the domain in the called because the called is intimate with the localized constraints. It knows better than anyone else what is good input, and what is not.
Normal values: These values of the domain map to a range. For every foo there is one and only one bar.
Out of range/out of scope values: These values are part of the general domain, but will not map to a range in the context of the called. No defined behavior exists for these values, and thus no valid output is possible. Frequently out-of-range checking entails range, limit, or allowed characters (or digits, or composite values). A cardinality check (multiplicity) and subsequently a presence check (null or empty), are special forms of a range checking.
Values that lead to Illogical or undefined behavior: These values are special values, or edge cases, that are otherwise normal, but because of the algorithm design and known environment constraints, would produce unexpected results. For instance, a function that operates on numbers should guard against division by zero or accumulators that would overflow, or unintended loss of precision. Sometimes the operating environment or compiler can warn that these situations may happen, but relying on the runtime or compiler is not good practice as it may not always be capable of deducing what is possible and what is not. This stage should be largely verification, through secondary validation, that the caller provided good, usable, meaningful input.
Caller
The caller is special. The caller has two situations in which it should validate data.
The first situation is on assignment or explicit state changes, where a change happens to at least one element of the data by some explicit mechanism, internally, or externally by something in its container. This is somewhat out of scope of the question, but something to keep in mind. The important thing is to consider the context when a state change occurs, and one or more elements that describe the state are affected.
Self/Referential Integrity: Consider using an internal mechanism to validate state if other actors can reference the data. When the data has no consistency checks, it is only safe to assume it is in an indeterminate state. That is not intermediate, but indeterminate. Know thyself. When you do not use a mechanism to validate internal consistency on state change, then the data is not reliable and that leads to problems in the second situation. Make sure the data for the caller is in a known, good state; alternatively, in a known transition/recovery state. Do not make the call until you are ready.
The second situation is when the data calls a function. A caller can expect only so much from the called. The caller must know and respect that the called recognizes only a certain domain. The caller also must be self-interested, as it may continue and persist long after the called completes. This means the caller must help the called be not only successful, but also appropriate for the task: bad data in produces bad data out. On the same token, even good data in and out with respect to the called may not be appropriate for the next thing in terms of the caller. The good data out may actually be bad data in for the caller. The output of the called may invalidate the caller for the caller's current state.
Ok, so enough commentary, what should a caller validate specifically?
Logical and normal: given the data, is the called a good strategy that fits the purpose and intent? If we know it will fail with certain values, there is no point in performing the call without the appropriate guards most times. If we know the called cannot handle zero, do not ask it to as it will never succeed. What is more expensive and harder to manage: a [redundant (do we know?)] guard clause, or an exception [that occurs late in a possibly long running, externally available resource dependent process]? Implementations can change, and change suddenly. Providing the protection in the caller reduces the impact and risk in changing that implementation.
Return values: check for unsuccessful completion. This is something that a caller may or may not need to do. Before using or relying upon the returned data, check for alternative outcomes, if the system design incorporates success and failure values that may accompany the actual return value.
Footnote: In case it wasn't clear. Null is a domain issue. It may or may not be logical and normal, so it depends. If null is a natural input to a function, and the function could be reasonably expected to produce meaningful output, then leave it to the caller to use it. If the domain of the caller is such that null is not logical, then guard against it in both places.
An important question: if you are passing null to the called, and the called is producing something, isn't that a hidden creational pattern, creating something from nothing?

It's all about "contract". That's a callee that decides which parameters are fine or not.
You may put in documentation that a "null" parameter is invalid and then throwing NullPointerException or InvalidArgumentException is fine.
If returning a result for null parameter make sense - state it in the documentation. Ususally such situation is a bad design - create an overriden method with fewer parameters instead of accepting null.
Only remember about throwing descriptive exceptions. By a rule of thumb:
If the caller passed wrong arguments, different than described in documentation (i.e. null, id < 0 etc) - throw an unchecked exception (NullPointerException or InvalidArgumentException)
If the caller passed correct arguments but there may be an expected business case that makes it impossible to process the call - you may want to throw a checked descriptive exception. For example - for getPermissionsForUser(Integer userId) the caller passes userId not knowing if such user exists but it's a non-null Integer. Your method may return a list of permissions or thorw a UserNotFoundException. It may be a checked exception.
If the parameters are correct according to the documentation but they causes processing internal error - you may throw an unchecked exception. This usually means that your method is not well tested ;-)

Depends on whether you program nominally, defensively, or totally.
If you program defensively (my personal favourite for most Java methods), you validate input in the method. You throw an exception (or fail in another way) when validation fails.
If you program nominally, you don't validate input (but expect the client to make sure the input is valid). This method is useful when validation would aversely impact performance, because the validation would take a lot of time (like a time-consuming search).
If you program totally (my personal favourite for most Objective-C methods), you validate input in the method, but you change invalid input into valid input (like by snapping values to the nearest valid value).
In most cases you would program defensively (fail-fast) or totally (fail-safe). Nominal programming is risky IMO and should be avoided when expecting input from an external source.
Of course, don't forget to document everything (especially when programming nominally).

Well... it depends.
If you can be sure how to handle invalid data inside your callee then do it there.
If you are not sure (e.g. because your method is quite general and used in a few different places and ways) then let the caller decide.
For example imagine a DAO Method that has to retrieve a certain entity and you don't find it. Can you decide whether to throw an exception, maybe roll back a transaction or just consider it okay?
In cases like this it is definitely up to the caller to decide how to handle it.

Both. This is a matter of good software development on both sides and independent of environment (C/S, web, internal API) and language.
The callee should be validating all parameters against the well documented parameter list (you did document it, right?). Depending on the environment and architecture, good error messages or exceptions should be implemented to give clear indication of what is wrong with the parameters.
The caller should be ensuring that only appropriate parameter values are passed in the api call. Any invalid values should be caught as soon as possible and somehow reflected to the user.
As often occurs in life, neither side should just assume that the other guy will do the right thing and ignore the potential problem.

I'm going to take a different perspective on the question. Working inside a contained application, both caller and callee are in the same code. Then any validation that is required by the contract of the callee should be done by the callee.
So you've written a function and your contract says, "Does not accept NULL values." you should check that NULL values have not been sent and raise an error. This ensures that your code is correct, and if someone else's code is doing something it shouldn't they'll know about it sooner.
Furthermore, if you assume that other code will call your method correctly, and they don't, it will make tracking the source of potential bugs more difficult.
This is essential for "Fail Early, Fail Often" where the idea is to raise an error condition as soon as a problem is detected.

It is callee responsibility to validate data. This is because only callee knows what is valid. Also this is a good security practice.

It needs to be on both end in client side and server(callee and caller) side too.
Client :
This is most effective one.
Client validation will Reduce one request to server.
To reduce the bandwidth traffic.
Time comsuming (if it has delay responase from server)
Server :
Not to believe on UI data (due to hackers).
Mostly backend code will be reused, so we dont know whether the data will be null,etc,. so we need to validate on both callee and caler methods.
Overall,
1. If data comes from UI, Its always better to validate in UI layer and make an double check in server layer.
2. If data transfer with in server layer itself, we need to validate on callee and for double check, we requre to do on caller side also.
Thanks

In my humble opinion, and in a few more words explaining why, it is the callee's responsibility most of the time, but that doesn't mean the caller is always scot-free.
The reason why is that the callee is in the best position to know what it needs to do its work, because it's the one doing the work. It's thus good encapsulation for the object or method to be self-validating. If the callee can do no work on a null pointer, that's an invalid argument and should be thrown back out as such. If there are arguments out of range, that's easy to guard against as well.
However, "ignorance of the law is no defense". It's not a good pattern for a caller to simply shove everything it's given into its helper function and let the callee sort it out. The caller adds no value when it does this, for one thing, especially if what the caller shoves into the callee is data it was itself given by its own caller, meaning this layer of the call stack is likely redundant. It also makes both the caller's and callee's code very complex, as both sides "defend" against unwanted behavior by the other (the callee trying to salvage something workable and testing everything, and the caller wrapping the call in try-catch statements that attempt to correct the call).
The caller should therefore validate what it can know about the requirements for passed data. This is especially true when there is a time overhead inherent in making the call, such as when invoking a service proxy. If you have to wait a significant portion of a second to find out your parameters are wrong, when it would take a few ticks to do the same client-side, the advantage is obvious. The callee's guard clauses are exactly that; the last line of defense and graceful failure before something ugly gets thrown out of the actual work routine.

There should be something between caller and callee that is called a contract. The callee ensures that it does the right thing if the input data is in specified values. He still should check if the incomming data is right according to those specifications. In Java you could throw an InvalidArgumentException.
The caller should also work within the contract specifications. If he should check the data he hands over depends on the case. Ideally you should program the caller in a way that checking is unescessary because you are sure of the validity your data. If it is e.g. user input you cannot be sure that it is valid. In this case you should check it. If you don't check it you at least have to handle the exceptions and react accordingly.

The callee has the responsibility of checking that the data it receives is valid. Failure to perform this task will almost certainly result in unreliable software and exposes you to potential security issues.
Having said that if you have control of the client (caller) code then you should also perform at least some validation there as well since it will result in a better over all experience.
As a general rule try to catch problems with data as early as possible, it results in far less trouble further down the line.

Related

When is it right time to throw an exception in functional programming

Say I have a web application with UserController. Client sends a HTTP POST request that is about to be handled by the controller. That however first must parse the provided json to UserDTO. For this reason there exist a UserDTOConverter with a method toDTO(json): User.
Given I value functional programming practices for its benefits of referential transparency and pure function the question is. What is the best approach to deal with a possibly inparsable json? First option would be to throw an exception and have it handled in global error handler. Invalid json means that something went terrible wrong (eg hacker) and this error is unrecoverable, hence the exception is on point (even assuming FP). The second option would be to return Maybe<User> instead of User. Then in the controller we can based on the return type return HTTP success response or failure response. Ultimately both approaches results in the same failure/success response, which one is preferable though?
Another example. Say I have a web application that needs to retrieve some data from remote repository UserRepository. From a UserController the repository is called getUser(userId): User. Again, what is the best way to handle the error of possible non existent user under provided id? Instead of returning User I can again return Maybe<User>. Then in controller this result can be handled by eg returning "204 No Content". Or I could throw an exception. The code stays referentially transparent as again I am letting the exception to bubble all the way up to global error handler (no try catch blocks).
Whereas in the first example I would lean more towards throwing an exception in the latter one I would prefer returning a Maybe. Exceptions result in cleaner code as the codebase is not cluttered with ubiquitous Eithers, Maybes, empty collections, etc. However, returning these kinds of data structure ensure explicitness of the calls, and imo results in better discoverability of the error.
Is there a place for exceptions in functional programming? What is the biggest pitfall of using exceptions over returning Maybes or Eithers? Does it make sense to be throwing exceptions in FP based app? If so is there a rule of thumb for that?
TL;DR
If there are Maybes/Eithers all over the codebase, you generally have a problem with I/O being mixed promiscuously with business logic. This doesn't get any better if you replace them with exceptions (or vice versa).
Mark Seemann has already given a good answer, but I'd like to address one specific bit:
Exceptions result in cleaner code as the codebase is not cluttered with ubiquitous Eithers, Maybes, empty collections, etc.
This isn't necessarily true. Either part.
Problem with Exceptions
The problem with exceptions is that they circumvent the normal control flow, which can make the code difficult to reason about. This seems so obvious as to barely be worthy of mention, until you end up with an error thrown 20 calls deep in a call stack where it isn't clear what triggered the error in the first place: even though the stack trace might point you to the exact line in the code you might have a very hard time figuring out the application state that caused the error to happen. The fact that you can be undisciplined about state transitions in an imperative/procedural program is of course the whole thing that FP is trying to fix.
Maybe, Maybe not: It might be Either one
You shouldn't have ubiquitous Maybes/Eithers all over the codebase, and for the same reason that you shouldn't be throwing exceptions willy-nilly all over the codebase: it complicates the code too much. You should have files that are entry points to the system, and those I/O-concerned files will be full of Maybes/Eithers, but they should then delegate to normal functions that either get lifted or dispatched to through some other mechanism depending on language (you don't specify the language). At the very least languages with option types almost always support first-class functions, you can always use a callback.
It's kind of like testability as a proxy for code quality: if your code is hard to test it probably has structural problems. If your codebase is full of Maybes/Eithers in every file it probably has structural problems.
You're asking about a couple of different scenarios, and I'll try to address each one.
Input
The first question pertains to converting a UserDTO (or, in general, any input) into a stronger representation (User). Such a conversion is usually self-contained (has no external dependencies) so can be implemented as a pure function. The best way to view such a function is as a parser.
Usually, parsers will return Either values (AKA Result), such as Either<Error, User>. The Either monad is, however, short-circuiting, meaning that if there's more than one problem with the input, only the first problem will be reported as an error.
When validating input, you often want to collect and return a list of all problems, so that the client can fix all problems and try again. A monad can't do that, but an applicative functor can. In general, I believe that validation is a solved problem.
Thus, you'll need to model validation as a type that isomomorphic to Either, but has different applicative functor behaviour, and no monad interface. The above links already show some examples, but here's a realistic C# example: An applicative reservation validation example in C#.
Data access
Data access is different, because you'd expect the data to already be valid. Reading from a data store can, however, 'go wrong' for two different reasons:
The data is not there
The data store is unreachable
The first issue (querying for missing data) can happen for various reasons, and it's usually appropriate to plan for that. Thus, a database query for a user should return Maybe<User>, indicating to the client that it should be ready to handle both cases: the user is there, or the user is not there.
The other issue is that the data store may sometimes be unreachable. This can be caused by a network partition, or if the database server is experiencing problems. In such cases, there's usually not much client code can do about it, so I usually don't bother explicitly modelling those scenarios. In other words, I'd let the implementation throw an exception, and the client code would typically not catch it (other than to log it).
In short, only throw exceptions that are unlikely to be handled. Use sum types for expected errors.
I have seen code like below, where exceptions are wrapped in a generic error. What i don't like about this approach is that we need to write a handler to deal with this UnexpectedError, inspect it, extract the exception and log it. Not sure if this is the correct way to do it.
override suspend fun update(
reservation: Reservation,
history: ReservationHistory
): Either<ReservationError, Reservation> {
return Either.catch {
mongoClient.startSession().use { clientSession ->
clientSession.startTransaction()
mongoClient.getDatabase(database)
.getCollection<ReservationDocument>()
.updateOneById(reservation.reservationId.value, MapToReservationDocument.invoke(reservation))
mongoClient.getDatabase(database)
.getCollection<ReservationHistoryDocument>()
.insertOne(MapToReservationHistoryDocument.invoke(history))
reservation
}
}.mapLeft {
UnexpectedError(it)
}
}

Good/Bad business object design?

I have inherited a suite of business objects, which are working rather well. It looks as though it's based on the CSLA framework by Rockford Lhotka, but there is one very annoying issue. When the business object does a load, it throws an exception. So, if it tries to load some data that isn't available in the database, you get an exception thrown. Is this good design?
I've been having a debate with a co-worker about this very topic lately.
It's my assertion that a situation where a method you expect to do X does not do X is the very definition of an exception.
What you chose to do with that exception is another story. You can choose to handle it internally in your code or you can choose to defer the handling of that exception to a higher level in your code.
I will agree that it is always better to handle an exception if it at all makes sense to do so when and where it occurs rather than deferring it to a higher level of code.
That said I also believe that in lower levels of code it can make some sense to defer the handling of these exceptions to the higher levels of code. This provides the higher level of code more flexibility in how it wants to handle these situations and it also gives that higher level of code insight into what occured.
If for example your retrieving data from a database and building an object for use in your application you could conceivably have several things happen:
Return an Object as expected.
Be unable to connect to the database.
Not find the data you expect to find.
You could handle the exceptions 2 and 3 by simply returning an empty object or null but then the higher level code doesn't know why the information it requested was not returned. This would require some secondary pattern in which to notify the higher level of what occurred.
Alternatively I assert that you could create a custom exception with a message field in which to pass back the exception event which occurred. Forcing your higher level code to handle those situations as it sees appropriate.
In my opinion the later can be more flexible but does require the higher level code to know it needs to handle the exceptions which should be well documented so that its clear a method can throw certain exceptions.
Note: I am not an expert, I do not claim to be, I am sharing my opinion after encouragement from my peer whom I have been debating this topic with.
IMHO, Exceptions are for exceptional cases -- missing data is usually not exceptional, unless its on a primary key, or other non-null field.
It depends on the impact to the application of the data being missing. If the application cannot reasonably continue without the data, then it is an exceptional case and an exception is warranted. If it is relatively normal for the data to be missing (and especially if the caller is expected to handle the exception and carry on), then that is a use of exceptions that is a bad design.

Do these rules fully define when to throw exceptions?

If a method fails to do what it says it will do (e.g. if SendEmail fails to send an email), it should throw an exception. I blogged about this in-depth (optional read): Do your work, or throw an exception - example. It seems this idea is already a convention, even though I have not found it described elsewhere. "Exceptions are for exceptional cases" is a different guideline requiring an additional guideline "Do not use exceptions for normal flow control", and requiring discussion on what is "exceptional". "Do your work or throw an exception" is as unambiguous as the responsibility of the method (which already should be unambiguous).
All methods, including "Try" methods (e.g. TrySendEmail), should always throw if some unrecoverable error happens--something that adversely affects other features of the app, not just the functionality being attempted, such as RamChipExplodedException. That is, even though "TrySendEmail" technically accomplishes what the name indicates (it tries to send an email), it should still throw an exception if the RAM explodes (if it can even do that... you get my drift).
"Try" methods should be used only in special cases.
Are these always good guidelines (i.e., are there any exceptions (pun!) to the rule)? Can you think of any others not covered by these?
To clarify, I'm not asking for more specific rules of thumb to help follow these guidelines; I'm looking for additional guidelines.
For me, the #1 rule is: avoid exceptions in normal flow. When you debug, make sure all thrown exceptions are signalled. It is the first indication that your code is working the way you designed it.
TryXXX should not throw exceptions unless you violate the design contract. So for instance TrySendMail() may throw if you pass in a null pointer as the text for the email.
You need TryXXX methods if there is no way to get "normal flow" exception free without them. So for parsing data, they're essential, I would not reccomend a TryLogin function.
"Do your work or throw" is a good starting point, but I would not make it a dogma. I am OK with a collection returning -1 if you ask IndexOf an item that isn't there. You can discuss with your colleages where you want to draw the line and put in your coding standards.
I also use exceptions a lot to label control paths that I think are not used. For instance if I am forced to write a GetHashCode function that I assume won't be called, I make it throw.
The fundamental goal of efficient and effective exception handling strategy is for a method to throw exceptions for situations the caller isn't prepared to handle, while letting the caller handle things it is prepared for without the overhead of throwing an exception.
Since functions cannot be clairvoyant with regard to caller expectations, the usual pattern is to have multiple versions of routines where different callers may have different expectations. Someone who decides to call Dictionary.GetValue indicates an expectation that the passed-in key exists in the dictionary. By contrast, someone who calls Dictionary.TryGetValue conveys an expectation that the key may or may not be in a dictionary--either would be reasonably expected. Even in that case, however, the caller probably isn't expecting the CPU to catch fire. If a CpuCaughtFireException occurs in the processing of TryGetValue, it should probably be allowed to propagate up the call stack.
In my opinion, you shouldn't get too hung up about the name "exception" and what exactly is and is not an exceptional situation.
Exceptions are simply another flow control statement, one that allows the flow of control to shortcut up the call stack.
If that's what you need (i.e. you expect immediate calling methods not to be able to handle the condition usefully), use an exception. Otherwise, don't.

Strategy for handling parameter validation in class library

I got a rather big class library that contains a lot of code.
I am looking at how to optimize the performance of some of the code, and for some rather simple utility methods I've found that the parameter validation occupies a rather large portion of the runtime for some core methods.
Let me give a typical example:
A.MethodA1 runs a loop, iterating over a collection, calling B.MethodB1 for each element
B.MethodB1 processes the element and returns the result, it's a rather basic calculation, but since it is used many places, it has been put into its own method instead of being copied and pasted where needed
A.MethodA1 calls C.MethodC1 with the results of B.MethodB1, and puts the result into a list that is returned at the end of the loop
In the case I've found now, B.MethodB1 does rudimentary parameter validation. Since the method calls other internal methods, I'd like to avoid having NullReferenceExceptions several layers deep into the code, and rather fail early, hence B.MethodB1 validates the parameters, like checking for null and some basic range checks on another parameter.
However, in this particular call scenario, it is impossible (due to other program logic) for these parameters to ever have the wrong values. If they had, from the program standpoint, B.MethodB1 would never be called at all for those values, A.MethodA1 would fail before the call to B.MethodB1.
So I was considering removing the parameter validation in B.MethodB1, since it occupies roughly 65% of the method runtime (and this is part of some heavily used code.)
However, B.MethodB1 is a public method, and can thus be called from the program, in which case I want the parameter validation.
So how would you solve this dilemma?
Keep the parameter validation, and take the performance hit
Remove the parameter validation, and have potentially fail-late problems in the method
Split the method into two, one internal that doesn't have parameter validation, called by the "safe" path, and one public that has the parameter validation + a call to the internal version.
The latter one would give me the benefits of having no parameter validation, while still exposing a public entrypoint which does have parameter validation, but for some reason it doesn't sit right with me.
Opinions?
I would go with option 3. I tend to use assertions for private and internal methods and do all the validation in public methods.
By the way, is the performance hit really that big?
That's an interesting question.
Hmmm, makes me think ... "code contracts" .. It would seem like it might be technically possible to statically (at compile time) have certain code contracts be proven to be fulfilled. If this were the case and you had such a compilation validation option you could state these contracts without ever having to validate the conditions at runtime.
It would require that the client code itself be validated against the code contacts.
And, of course it would inevitably be highly dependent on the type of conditions you'd want to write, and it would probably only be feasible to prove these contracts to a certain point (how far up the possible call graph would you go?). Beyond this point the validator might have to beg off, and insist that you place a runtime check (or maybe a validation warning suppression?).
All just idle speculation. Does make me wonder a bit more about C# 4.0 code contracts. I wonder if these have support for static analysis. Have you checked them out? I've been meaning to, but learning F# is having to take priority at the moment!
Update:
Having read up a little on it, it appears that C# 4.0 does indeed have a 'static checker' as well as a binary rewriter (which takes care of altering the output binary so that pre and post condition checks are in the appropriate location)
What's not clear from my extremely quick read, is whether you can opt out of the binary rewriting - what I'm thinking here is that what you'd really be looking for is to use the code contracts, have the metadata (or code) for the contracts maintained within the various assemblies but use only the static checker for at least a selected subset of contracts, so that you in theory get proven safety without any runtime hit.
Here's a link to an article on the code contracts

Why are Exceptions said to be so bad for Input Validation?

I understand that "Exceptions are for exceptional cases" [a], but besides just being repeated over and over again, I've never found an actual reason for this fact.
Being that they halt execution, it makes sense that you wouldn't want them for plain conditional logic, but why not input validation?
Say you were to loop through a group of inputs and catch each exception to group them together for user notification... I continually see that this is somehow "wrong" because users enter incorrect input all the time, but that point seems to be based on semantics.
The input is Not what was expected and hence is exceptional. Throwing an exception allows me to define exactly what was wrong like StringValueTooLong or or IntegerValueTooLow or InvalidDateValue or whatever. Why is this considered wrong?
Alternatives to throwing an exception would be to either return (and eventually collect) an error code or far worse an error string. Then I would either show those error strings directly, or parse the error codes and then show corresponding error messages to the user. Wouldn't a exception be considered a malleable error code? Why create a separate table of error codes and messages, when these could be generalized with the exception functionality already built into my language?
Also, I found this article by Martin Fowler as to how to handle such things - the Notification pattern. I'm not sure how I see this as being anything other than Exceptions that don't halt execution.
a: Everywhere I've read anything about Exceptions.
--- Edit ---
Many great points have been made. I've commented on most and +'d the good points, but I'm not yet completely convinced.
I don't mean to advocate Exceptions as the proper means to resolve Input Validation, but I would like to find good reasons why the practice is considered so evil when it seems most alternate solutions are just Exceptions in disguise.
Reading these answers, I find it very unhelpful to say, "Exceptions should only be used for exceptional conditions". This begs the whole question of what is an "exceptional condition". This is a subjective term, the best definition of which is "any condition that your normal logic flow doesn't deal with". In other words, an exceptional condition is any condition you deal with using exceptions.
I'm fine with that as a definition, I don't know that we'll get any closer than that anyway. But you should know that that's the definition you are using.
If you are going to argue against exceptions in a certain case, you have to explain how to divide the universe of conditions into "exceptional" and "non-exceptional".
In some ways, it's similar to answering the question, "where are the boundaries between procedures?" The answer is, "Wherever you put the begin and end", and then we can talk about rules of thumb and different styles for determining where to put them. There are no hard and fast rules.
A user entering 'bad' input is not an exception: it's to be expected.
Exceptions should not be used for normal control flow.
In the past many authors have said that Exceptions are inherently expensive. Jon Skeet has blogged contrary to this (and mentioned a few time in answers here on SO), saying that they are not as expensive as reported (although I wouldn’t advocate using them in a tight loop!)
The biggest reason to use them is ‘statement of intent’ i.e. if you see an exception handling block you immediately see the exceptional cases which are dealt with outside of normal flow.
There is one important other reason than the ones mentioned already:
If you use exceptions only for exceptional cases you can run in your debugger with the debugger setting "stop when exception is thrown". This is extremely convenient because you drop into the debugger on the exact line that is causing the problem. Using this feature saves you a fair amount of time every day.
In C# this is possible (and I recommend it wholeheartedly), especially after they added the TryParse methods to all the number classes. In general, none of the standard libraries require or use "bad" exception handling. When I approach a C# codebase that has not been written to this standard, I always end up converting it to exception-free-for-regular cases, because the stop-om-throw is so valuable.
In the firebug javascript debugger you can also do this, provided that your libraries don't use exceptions badly.
When I program Java, this is not really possible because so many things uses exceptions for non-exceptional cases, including a lot of the standard java libraries. So this time-saving feature is not really available for use in java. I believe this is due to checked exceptions, but I won't start ranting about how they are evil.
Errors and Exceptions – What, When and Where?
Exceptions are intended to report errors, thereby making code more robust. To understand when to use exceptions, one must first understand what errors are and what is not an error.
A function is a unit of work, and failures should be viewed as errors or otherwise based on their impact on functions. Within a function f, a failure is an error if and only if it prevents f from meeting any of its callee’s preconditions, achieving any of f’s own postconditions, or reestablishing any invariant that f shares responsibility for maintaining.
There are three kinds of errors:
a condition that prevents the function from meeting a precondition (e.g., a parameter restriction) of another function that must be called;
a condition that prevents the function from establishing one of its own postconditions (e.g., producing a valid return value is a postcondition); and
a condition that prevents the function from re-establishing an invariant that it is responsible for maintaining. This is a special kind of postcondition that applies particularly to member functions. An essential postcondition of every non-private member function is that it must re-establish its class’s invariants.
Any other condition is not an error and should not be reported as an error.
Why are Exceptions said to be so bad for Input Validation?
I guess it is because of a somewhat ambiguous understanding of “input” as either meaning input of a function or value of a field, where the latter should’t throw an exception unless it is part of a failing function.
Maintainability - Exceptions create
odd code paths, not unlike GOTOs.
Ease of Use (for other classes) -
Other classes can trust that
exceptions raised from your user
input class are actual errors
Performance - In most languages, an
exception incurs a performance and
memory usage penalty.
Semantics - The meaning of words
does matter. Bad input is not
"exceptional".
I think the difference depends on the contract of the particular class, i.e.
For code that is meant to deal with user input, and program defensively for it (i.e. sanitise it) it would be wrong to throw an exception for invalid input - it is expected.
For code that is meant to deal with already sanitised and validated input, which may have originated with the user, throwing an exception would be valid if you found some input that is meant to be forbidden. The calling code is violating the contract in that case, and it indicates a bug in the sanitising and/or calling code.
When using exceptions, the error handling code is separated from the code causing the error. This is the intent of exception handling - being an exceptional condition, the error can not be handled locally, so an exception is thrown to some higher (and unknown) scope. If not handled, the application will exit before any more hard is done.
If you ever, ever, ever throw exception when you are doing simple logic operations, like verifying user input, you are doing something very, very very, wrong.
The input is Not what was expected and
hence is exceptional.
This statement does not sit well with me at all. Either the UI constrains user input (eg, the use of a slider that bounds min/max values) and you can now assert certain conditions - no error handling required. Or, the user can enter rubbish and you expect this to happen and must handle it. One or the other - there is nothing exception going here whatsoever.
Throwing an exception allows me to
define exactly what was wrong like
StringValueTooLong or or
IntegerValueTooLow or InvalidDateValue
or whatever. Why is this considered
wrong?
I consider this beyond - closer to evil. You can define an abstract ErrorProvider interface, or return a complex object representing the error rather than a simple code. There are many, many options on how you retrieve error reports. Using exceptions because the are convenient is so, so wrong. I feel dirty just writing this paragraph.
Think of throwing an exception as hope. A last chance. A prayer. Validating user input should not lead to any of these conditions.
Is it possible that some of the disagreement is due to a lack of consensus about what 'user input' means? And indeed, at what layer you're coding.
If you're coding a GUI user interface, or a Web form handler, you might well expect invalid input, since it's come direct from the typing fingers of a human being.
If you're coding the model part of an MVC app, you may have engineered things so that the controller has sanitised inputs for you. Invalid input getting as far as the Model would indeed be an exception, and may be treated as such.
If you're coding a server at the protocol level, you might reasonably expect the client to be checking user input. Again, invalid input here would indeed be an exception. This is quite different from trusting the client 100% (that would be very stupid indeed) - but unlike direct user input, you predict that most of the time inputs would be OK. The lines blur here somewhat. The more likely it is that something happens, the less you want to use exceptions to handle it.
This is a linguistic pov( point of view) on the matter.
Why are Exceptions said to be so bad for Input Validation?
conclusion :
Exceptions are not defined clearly enough, so there are different opinions.
Wrong input is seen as a normal thing, not as an exception.
thoughts ?
It probably comes down to the expectations one takes about the code that is created.
the client can not be trusted
validation has to happen at the server's side.
stronger : every validation happens at server's side.
because validation happens at the server's side it is expected to be done there and what is expected is not an exception, since it is expected.
However,
the client's input can not to be trusted
the client's input-validation can be trusted
if validation is trusted it can be expected to produce valid input
now every input is expected to be valid
invalid input is now unexpected, an exception
.
exceptions can be a nice way to exit the code.
A thing mentioned to consider is if your code is left in a proper state.
I would not know what would leave my code in an improper state.
Connections get closed automatically, leftover variables are garbage-collected, what's the problem?
Another vote against exception handling for things that aren't exceptions!
In .NET the JIT compiler won't perform optimizations in certain cases even when exceptions aren't thrown. The following articles explain it well.
http://msmvps.com/blogs/peterritchie/archive/2007/06/22/performance-implications-of-try-catch-finally.aspx
http://msmvps.com/blogs/peterritchie/archive/2007/07/12/performance-implications-of-try-catch-finally-part-two.aspx
When an exception gets thrown it generates a whole bunch of information for the stack trace which may not be needed if you were actually "expecting" the exception as is often the case when converting strings to int's etc...
In general, libraries throw exceptions and clients catch them and do something intelligent with them. For user input I just write validation functions instead of throwing exceptions. Exceptions seem excessive for something like that.
There are performance issues with exceptions, but in GUI code you won't generally have to worry about them. So what if a validation takes an extra 100 ms to run? The user isn't going to notice that.
In some ways it's a tough call - On the one hand, you might not want to have your entire application come crashing down because the user entered an extra digit in a zip code text box and you forgot to handle the exception. On the other, a 'fail early, fail hard' approach ensures that bugs get discovered and fixed quickly and keeps your precious database sane. In general I think most frameworks recommend that you don't use exception handling for UI error checking and some, like .NET Windows Forms, provide nice ways to do this (ErrorProviders and Validation events) without exceptions.
Exceptions should not be used for input validation, because not only should exceptions be used in exceptional circumstances (which as it has been pointed out incorrect user entry is not) but they create exceptional code (not in the brilliant sense).
The problem with exceptions in most languages is they change the rules of program flow, this is fine in a truly exceptional circumstance where it is not necessarily possible to figure our what the valid flow should be and therefore just throw an exception and get out however where you know what the flow should be you should create that flow (in the case listed it would be to raise a message to the user telling them they need to reenter some information).
Exceptions were truly overused in an application I work on daily and even for the case where a user entered an incorrect password when logging in, which by your logic would be an exception result because it is not what the application wants. However when a process has one of two outcomes either correct or incorrect, I dont think we can say that, incorrect, no matter how wrong, is exceptional.
One of the major problems I have found with working with this code is trying to follow the logic of the code without getting deeply involved with the debugger. Although debuggers are great, it should be possible to add logic to what happens when a user enters an incorrect password without having to fire one up.
Keep exceptions for truly exceptional execution not just wrong. In the case I was highlighting getting your password wrong is not exceptional, but not being able to contact the domain server may be!
When I see exceptions being thrown for validation errors I often see that the method throwing the exception is performing lots of validations all at once. e.g.
public bool isValidDate(string date)
{
bool retVal = true;
//check for 4 digit year
throw new FourDigitYearRequiredException();
retVal = false;
//check for leap years
throw new NoFeb29InANonLeapYearException();
retVal = false;
return retVal;
}
This code tends to be pretty fragile and hard to maintain as the rules pile up over the months and years. I usually prefer to break up my validations into smaller methods that return bools. It makes it easier to tweak the rules.
public bool isValidDate(string date)
{
bool retVal = false;
retVal = doesDateContainAFourDigitYear(date);
retVal = isDateInALeapYear(date);
return retVal;
}
public bool isDateInALeapYear(string date){}
public bool doesDateContainAFourDigitYear(string date){}
As has been mentioned already, returning an error struct/object containing information about the error is a great idea. The most obvious advantage being that you can collect them up and display all of the error messages to the user at once instead of making them play Whack-A-Mole with the validation.
i used a combination of both a solution:
for each validation function, i pass a record that i fill with the validation status (an error code).
at the end of the function, if a validation error exists, i throw an exception, this way i do not throw an exception for each field, but only once. i also took advantage that throwing an exception will stop execution because i do not want the execution to continue when data is invalid.
for example
procedure Validate(var R:TValidationRecord);
begin
if Field1 is not valid then
begin
R.Field1ErrorCode=SomeErrorCode;
ErrorFlag := True;
end;
if Field2 is not valid then
begin
R.Field2ErrorCode=SomeErrorCode;
ErrorFlag := True;
end;
if Field3 is not valid then
begin
R.Field3ErrorCode=SomeErrorCode;
ErrorFlag := True;
end;
if ErrorFlag then
ThrowException
end;
if relying on boolean only, the developer using my function should take this into account writing:
if not Validate() then
DoNotContinue();
but he may forgot and only call Validate() (i know that he should not, but maybe he might).
so, in the code above i gained the two advantages:
1-only one exception in the validation function.
2-exception, even uncaught, will stop the execution, and appear at test time.
8 years later, and I'm running into the same dilemma trying to apply the CQS pattern. I'm on the side that input validation can throw an exception, but with an added constraint. If any input fails, you need to throw ONE type of exception: ValidationException, BrokenRuleException, etc. Don't throw a bunch of different types as it'll be impossible to handle them all. This way, you get a list of all the broken rules in one place. You create a single class that is responsible for doing validation (SRP) and throw an exception if at least 1 rule is broken. That way, you handle one situation with one catch and you know you are good. You can handle that scenario no matter what code is called. This leaves all the code downstream much cleaner as you know it is in a valid state or it wouldn't have gotten there.
To me, getting invalid data from a user is not something you would normally expect. (If every user sends invalid data to you the first time, I'd take a second look at your UI.) Any data that prevents you from processing the true intent whether it is user or sourced elsewhere needs to abort processing. How is it any different than throwing an ArgumentNullException from a single piece of data if it was user input vs. it being a field on a class that says This is required.
Sure, you could do validation first and write that same boilerplate code on every single "command", but I think that is a maintenance nightmare than catching invalid user input all in one place at the top that gets handled the same way regardless. (Less code!) The performance hit will only come if the user gives invalid data, which should not happen that often (or you have bad UI). Any and all rules on the client side have to be re-written on the server, anyway, so you could just write them once, do an AJAX call, and the < 500 ms delay will save you a ton of coding time (only 1 place to put all your validation logic).
Also, while you can do some neat validation with ASP.NET out of the box, if you want to re-use your validation logic in other UIs, you can't since it is baked into ASP.NET. You'd be better off creating something below and handling it above regardless of the UI being used. (My 2 cents, at least.)
I agree with Mitch that that "Exceptions should not be used for normal control flow". I just want to add that from what I remember from my computer science classes, catching exceptions is expensive. I've never really tried to do benchmarks, but it would be interesting to compare performance between say, an if/else vs try/catch.
One problem with using exceptions is a tendency to detect only one problem at a time. The user fixes that and resubmits, only to find another problem! An interface that returns a list of issues that need resolving is much friendlier (though it could be wrapped in an exception).