Say I have a web application with a UserController. The client sends an HTTP POST request that is to be handled by the controller. First, however, the provided JSON must be parsed into a UserDTO. For this reason there exists a UserDTOConverter with a method toDTO(json): User.
Given that I value functional programming practices for the benefits of referential transparency and pure functions, the question is: what is the best approach to dealing with possibly unparsable JSON? The first option would be to throw an exception and have it handled in a global error handler. Invalid JSON means that something went terribly wrong (e.g. an attacker), and this error is unrecoverable, hence an exception is on point (even assuming FP). The second option would be to return Maybe<User> instead of User. Then, in the controller, we can return an HTTP success or failure response based on the return type. Ultimately both approaches result in the same failure/success response; which one is preferable, though?
Another example. Say I have a web application that needs to retrieve some data from a remote repository UserRepository. From a UserController the repository is called as getUser(userId): User. Again, what is the best way to handle a possibly non-existent user under the provided id? Instead of returning User I can again return Maybe<User>. Then, in the controller, this result can be handled by e.g. returning "204 No Content". Or I could throw an exception. The code stays referentially transparent, as again I am letting the exception bubble all the way up to the global error handler (no try/catch blocks).
Whereas in the first example I would lean more towards throwing an exception, in the latter I would prefer returning a Maybe. Exceptions result in cleaner code, as the codebase is not cluttered with ubiquitous Eithers, Maybes, empty collections, etc. However, returning these kinds of data structures ensures explicitness at the call site and, in my opinion, results in better discoverability of the error.
Is there a place for exceptions in functional programming? What is the biggest pitfall of using exceptions over returning Maybes or Eithers? Does it make sense to throw exceptions in an FP-based app? If so, is there a rule of thumb for that?
TL;DR
If there are Maybes/Eithers all over the codebase, you generally have a problem with I/O being mixed promiscuously with business logic. This doesn't get any better if you replace them with exceptions (or vice versa).
Mark Seemann has already given a good answer, but I'd like to address one specific bit:
Exceptions result in cleaner code as the codebase is not cluttered with ubiquitous Eithers, Maybes, empty collections, etc.
Neither part of that is necessarily true.
Problem with Exceptions
The problem with exceptions is that they circumvent the normal control flow, which can make the code difficult to reason about. This seems so obvious as to barely be worthy of mention, until you end up with an error thrown 20 calls deep in a call stack where it isn't clear what triggered the error in the first place: even though the stack trace might point you to the exact line in the code you might have a very hard time figuring out the application state that caused the error to happen. The fact that you can be undisciplined about state transitions in an imperative/procedural program is of course the whole thing that FP is trying to fix.
Maybe, Maybe not: It might be Either one
You shouldn't have ubiquitous Maybes/Eithers all over the codebase, and for the same reason that you shouldn't be throwing exceptions willy-nilly all over the codebase: it complicates the code too much. You should have files that are entry points to the system, and those I/O-concerned files will be full of Maybes/Eithers, but they should then delegate to normal functions that either get lifted or dispatched to through some other mechanism, depending on the language (you don't specify which). At the very least, languages with option types almost always support first-class functions, so you can always use a callback.
It's kind of like testability as a proxy for code quality: if your code is hard to test it probably has structural problems. If your codebase is full of Maybes/Eithers in every file it probably has structural problems.
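To make that concrete, here is a minimal Kotlin sketch of the shape I mean; the hand-rolled Either and the names (parseUser, greeting, handlePost) are illustrative, not from the question. Either shows up exactly once, at the entry point; the core logic is a plain function:

    sealed class Either<out L, out R> {
        data class Left<L>(val value: L) : Either<L, Nothing>()
        data class Right<R>(val value: R) : Either<Nothing, R>()
    }

    data class User(val name: String, val email: String)

    // Boundary: the only place that knows the input may be malformed.
    // (A Map stands in for parsed JSON to keep the sketch dependency-free.)
    fun parseUser(json: Map<String, String>): Either<String, User> {
        val name = json["name"] ?: return Either.Left("missing name")
        val email = json["email"] ?: return Either.Left("missing email")
        return Either.Right(User(name, email))
    }

    // Core: a plain function; no Either, no exceptions.
    fun greeting(user: User): String = "Welcome, ${user.name}!"

    // Controller: one pattern match at the entry point, then plain functions.
    fun handlePost(json: Map<String, String>): String =
        when (val parsed = parseUser(json)) {
            is Either.Left -> "400 Bad Request: ${parsed.value}"
            is Either.Right -> "200 OK: ${greeting(parsed.value)}"
        }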
You're asking about a couple of different scenarios, and I'll try to address each one.
Input
The first question pertains to converting a UserDTO (or, in general, any input) into a stronger representation (User). Such a conversion is usually self-contained (has no external dependencies) so can be implemented as a pure function. The best way to view such a function is as a parser.
Usually, parsers will return Either values (AKA Result), such as Either<Error, User>. The Either monad is, however, short-circuiting, meaning that if there's more than one problem with the input, only the first problem will be reported as an error.
When validating input, you often want to collect and return a list of all problems, so that the client can fix all problems and try again. A monad can't do that, but an applicative functor can. In general, I believe that validation is a solved problem.
Thus, you'll need to model validation as a type that is isomorphic to Either but has different applicative functor behaviour, and no monad interface. The links above already show some examples, but here's a realistic C# example: An applicative reservation validation example in C#.
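Since the question doesn't specify a language, here is a hand-rolled Kotlin sketch of the same idea (libraries such as Arrow offer this out of the box; all names here are illustrative). The point is the combining step, which accumulates error lists instead of short-circuiting like Either's monadic bind:

    sealed class Validated<out E, out A> {
        data class Invalid<E>(val errors: List<E>) : Validated<E, Nothing>()
        data class Valid<A>(val value: A) : Validated<Nothing, A>()
    }

    fun validateName(raw: String?): Validated<String, String> =
        if (raw.isNullOrBlank()) Validated.Invalid(listOf("name is required"))
        else Validated.Valid(raw)

    fun validateEmail(raw: String?): Validated<String, String> =
        if (raw != null && "@" in raw) Validated.Valid(raw)
        else Validated.Invalid(listOf("email is malformed"))

    data class User(val name: String, val email: String)

    // The applicative step: combine two results, accumulating all errors.
    fun <E, A, B, C> map2(
        fa: Validated<E, A>,
        fb: Validated<E, B>,
        f: (A, B) -> C
    ): Validated<E, C> {
        val errors = mutableListOf<E>()
        if (fa is Validated.Invalid) errors += fa.errors
        if (fb is Validated.Invalid) errors += fb.errors
        return if (fa is Validated.Valid && fb is Validated.Valid)
            Validated.Valid(f(fa.value, fb.value))
        else Validated.Invalid(errors)
    }

    fun validateUser(name: String?, email: String?): Validated<String, User> =
        map2(validateName(name), validateEmail(email), ::User)

    // validateUser(null, "oops") reports BOTH problems, not just the first.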
Data access
Data access is different, because you'd expect the data to already be valid. Reading from a data store can, however, 'go wrong' for two different reasons:
The data is not there
The data store is unreachable
The first issue (querying for missing data) can happen for various reasons, and it's usually appropriate to plan for that. Thus, a database query for a user should return Maybe<User>, indicating to the client that it should be ready to handle both cases: the user is there, or the user is not there.
The other issue is that the data store may sometimes be unreachable. This can be caused by a network partition, or if the database server is experiencing problems. In such cases, there's usually not much client code can do about it, so I usually don't bother explicitly modelling those scenarios. In other words, I'd let the implementation throw an exception, and the client code would typically not catch it (other than to log it).
In short, only throw exceptions that are unlikely to be handled. Use sum types for expected errors.
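For the question's getUser example, a minimal Kotlin sketch of that split might look like this (the repository and handler names are illustrative; Kotlin's nullable types play the role of Maybe here):

    data class User(val id: Int, val name: String)

    interface UserRepository {
        // Missing data is expected: model it in the return type.
        // An unreachable store is not: let the driver's exception bubble up.
        fun findUser(id: Int): User?
    }

    fun handleGetUser(repo: UserRepository, id: Int): String {
        // No try/catch here: a connectivity exception propagates to the
        // global error handler, which logs it and returns a 5xx response.
        val user = repo.findUser(id) ?: return "204 No Content"
        return "200 OK: ${user.name}"
    }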
I have seen code like the below, where exceptions are wrapped in a generic error. What I don't like about this approach is that we need to write a handler to deal with this UnexpectedError, inspect it, extract the exception and log it. I'm not sure if this is the correct way to do it.
override suspend fun update(
    reservation: Reservation,
    history: ReservationHistory
): Either<ReservationError, Reservation> {
    return Either.catch {
        mongoClient.startSession().use { clientSession ->
            clientSession.startTransaction()
            mongoClient.getDatabase(database)
                .getCollection<ReservationDocument>()
                .updateOneById(reservation.reservationId.value, MapToReservationDocument.invoke(reservation))
            mongoClient.getDatabase(database)
                .getCollection<ReservationHistoryDocument>()
                .insertOne(MapToReservationHistoryDocument.invoke(history))
            reservation
        }
    }.mapLeft {
        UnexpectedError(it)
    }
}
I am confused as to whether it is the caller or the callee's responsibility to check for data legality.
Should the callee check that passed-in arguments are not null and meet any other requirements, so that the callee method can execute normally and successfully, and catch any potential exceptions? Or is it the caller's responsibility to do this?
Both: consumer-side (client) and provider-side (API) validation.
Clients should do it because it means a better experience. For example, why do a network round trip just to be told that you've got one bad text field?
Providers should do it because they should never trust clients (e.g. XSS and man-in-the-middle attacks). How do you know the request wasn't intercepted? Validate everything.
There are several levels of valid:
1. All required fields present, correct formats. This is what the client validates.
2. #1 plus valid relationships between fields (e.g. if X is present then Y is required).
3. #1 and #2 plus business valid: meets all business rules for proper processing.
Only the provider side can do #2 and #3.
For an API the callee should always do proper validation and throw a descriptive exception for invalid data.
For any client with I/O overhead, the client should do basic validation as well.
Validation: Caller vs. Called
The TL;DR version is: both.
The long version involves who, why, when, how, and what.
Both
Both should be ready to answer the question "can this data be operated on reliably?" Do we know enough about this data to do something meaningful with it? Many will suggest that the reliability of the data should never be trusted, but that only leads to a chicken-and-egg problem. Chasing it endlessly from both ends will not provide meaningful value, but to some degree it is essential.
Both must validate the shape of the data to ensure base usability. If either one does not recognize or understand the shape of the data, there is no way to know how to further handle it with any reliability. Depending on the environment, the data may need to be a particular 'type', which is often an easy way to validate shape. We often consider types that present evidence of common lineage back to a particular ancestor and retain the crucial traits to possess the right shape. Other characteristics might be important if the data is anything other than an in-memory structure, for instance if it is a stream or some other resource external to the running context.
Many languages include data-shape checking as a built-in language feature through type or interface checking. However, when favoring composition over inheritance, providing a good mechanism to verify trait existence is incumbent on the implementer. One strategy to achieve this is through dynamic language features, particularly type introspection, inference, or reflection.
Called
The called must validate the domain (the set of inputs) of the given context on which it will operate. The design of the called always suggests it can handle only so many cases of input. Usually these values are broken up into certain subclasses or categories of input. We verify the domain in the called because the called is intimate with the localized constraints. It knows better than anyone else what is good input and what is not (a short sketch in code follows the list below).
Normal values: These values of the domain map to a range. For every foo there is one and only one bar.
Out-of-range/out-of-scope values: These values are part of the general domain, but will not map to a range in the context of the called. No defined behavior exists for these values, and thus no valid output is possible. Frequently, out-of-range checking entails range, limit, or allowed-character (or digit, or composite-value) checks. A cardinality check (multiplicity), and subsequently a presence check (null or empty), are special forms of range checking.
Values that lead to illogical or undefined behavior: These values are special values, or edge cases, that are otherwise normal, but because of the algorithm design and known environment constraints would produce unexpected results. For instance, a function that operates on numbers should guard against division by zero, accumulators that would overflow, or unintended loss of precision. Sometimes the operating environment or compiler can warn that these situations may happen, but relying on the runtime or compiler is not good practice, as it may not always be capable of deducing what is possible and what is not. This stage should be largely verification, through secondary validation, that the caller provided good, usable, meaningful input.
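As promised, here is a small Kotlin sketch of these checks in the called, using the standard require guard (the function and its bounds are invented for illustration):

    fun averageOf(values: List<Double>): Double {
        // Presence/cardinality check: an empty list has no average
        // (and would divide by zero below).
        require(values.isNotEmpty()) { "values must not be empty" }
        // Range check: reject values the algorithm has no defined
        // behaviour for (NaN, infinities).
        require(values.all { it.isFinite() }) { "values must be finite" }
        return values.sum() / values.size
    }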
Caller
The caller is special. The caller has two situations in which it should validate data.
The first situation is on assignment or explicit state changes, where a change happens to at least one element of the data by some explicit mechanism, internally, or externally by something in its container. This is somewhat out of scope of the question, but something to keep in mind. The important thing is to consider the context when a state change occurs, and one or more elements that describe the state are affected.
Self/Referential Integrity: Consider using an internal mechanism to validate state if other actors can reference the data. When the data has no consistency checks, it is only safe to assume it is in an indeterminate state. That is not intermediate, but indeterminate. Know thyself. When you do not use a mechanism to validate internal consistency on state change, then the data is not reliable and that leads to problems in the second situation. Make sure the data for the caller is in a known, good state; alternatively, in a known transition/recovery state. Do not make the call until you are ready.
The second situation is when the data calls a function. A caller can expect only so much from the called. The caller must know and respect that the called recognizes only a certain domain. The caller also must be self-interested, as it may continue and persist long after the called completes. This means the caller must help the called be not only successful, but also appropriate for the task: bad data in produces bad data out. By the same token, even good data in and out with respect to the called may not be appropriate for the next thing in terms of the caller. The good data out may actually be bad data in for the caller. The output of the called may invalidate the caller for the caller's current state.
Ok, so enough commentary, what should a caller validate specifically?
Logical and normal: given the data, is the called a good strategy that fits the purpose and intent? If we know it will fail with certain values, there is usually no point in performing the call without the appropriate guards. If we know the called cannot handle zero, do not ask it to, as it will never succeed. What is more expensive and harder to manage: a [redundant (do we know?)] guard clause, or an exception [that occurs late in a possibly long-running process dependent on externally available resources]? Implementations can change, and change suddenly. Providing the protection in the caller reduces the impact and risk of changing that implementation.
Return values: check for unsuccessful completion. This is something that a caller may or may not need to do. Before using or relying upon the returned data, check for alternative outcomes, if the system design incorporates success and failure values that may accompany the actual return value.
Footnote: in case it wasn't clear, null is a domain issue. It may or may not be logical and normal, so it depends. If null is a natural input to a function, and the function could reasonably be expected to produce meaningful output, then leave it to the caller to use it. If the domain of the caller is such that null is not logical, then guard against it in both places.
An important question: if you are passing null to the called, and the called is producing something, isn't that a hidden creational pattern, creating something from nothing?
It's all about the "contract". It is the callee that decides which parameters are fine and which are not.
You may state in the documentation that a null parameter is invalid; then throwing a NullPointerException or IllegalArgumentException is fine.
If returning a result for a null parameter makes sense, state it in the documentation. Usually such a situation is bad design: create an overloaded method with fewer parameters instead of accepting null.
Just remember to throw descriptive exceptions. As a rule of thumb (a short sketch in code follows this list):
If the caller passed wrong arguments, different from those described in the documentation (i.e. null, id < 0, etc.), throw an unchecked exception (NullPointerException or IllegalArgumentException).
If the caller passed correct arguments but there is an expected business case that makes it impossible to process the call, you may want to throw a checked, descriptive exception. For example, for getPermissionsForUser(Integer userId), the caller passes a userId without knowing whether such a user exists, but it is a non-null Integer. Your method may return a list of permissions or throw a UserNotFoundException. It may be a checked exception.
If the parameters are correct according to the documentation but they cause an internal processing error, you may throw an unchecked exception. This usually means that your method is not well tested ;-)
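Here is a short Kotlin sketch of those three rules (Kotlin has no checked exceptions, so the "checked" case is simply a documented, domain-specific exception type here; UserNotFoundException and the service are illustrative):

    class UserNotFoundException(userId: Int) :
        Exception("no user with id $userId")

    class PermissionService(private val users: Map<Int, List<String>>) {

        fun getPermissionsForUser(userId: Int): List<String> {
            // Rule 1: arguments outside the documented contract ->
            // unchecked IllegalArgumentException (thrown by require).
            require(userId > 0) { "userId must be positive, was $userId" }
            // Rule 2: a valid argument hitting an expected business case ->
            // a descriptive, domain-specific exception.
            return users[userId] ?: throw UserNotFoundException(userId)
            // Rule 3 needs no code: any other exception escaping this
            // method signals an internal error, i.e. a bug to be fixed.
        }
    }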
Depends on whether you program nominally, defensively, or totally.
If you program defensively (my personal favourite for most Java methods), you validate input in the method. You throw an exception (or fail in another way) when validation fails.
If you program nominally, you don't validate input (but expect the client to make sure the input is valid). This method is useful when validation would adversely impact performance, because the validation would take a lot of time (like a time-consuming search).
If you program totally (my personal favourite for most Objective-C methods), you validate input in the method, but you change invalid input into valid input (like by snapping values to the nearest valid value).
In most cases you would program defensively (fail-fast) or totally (fail-safe). Nominal programming is risky IMO and should be avoided when expecting input from an external source.
Of course, don't forget to document everything (especially when programming nominally).
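For illustration, here is one operation written in all three styles in Kotlin (the volume setter is an invented example):

    // Defensive: reject invalid input outright (fail-fast).
    fun setVolumeDefensive(level: Int): Int {
        require(level in 0..100) { "level must be in 0..100, was $level" }
        return level
    }

    // Nominal: trust the caller; document the precondition instead of checking.
    /** [level] must already be in 0..100. */
    fun setVolumeNominal(level: Int): Int = level

    // Total: accept anything and snap it to the nearest valid value (fail-safe).
    fun setVolumeTotal(level: Int): Int = level.coerceIn(0, 100)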
Well... it depends.
If you can be sure how to handle invalid data inside your callee then do it there.
If you are not sure (e.g. because your method is quite general and used in a few different places and ways) then let the caller decide.
For example, imagine a DAO method that has to retrieve a certain entity and doesn't find it. Can you decide whether to throw an exception, maybe roll back a transaction, or just consider it okay?
In cases like this it is definitely up to the caller to decide how to handle it.
Both. This is a matter of good software development on both sides and independent of environment (C/S, web, internal API) and language.
The callee should be validating all parameters against the well documented parameter list (you did document it, right?). Depending on the environment and architecture, good error messages or exceptions should be implemented to give clear indication of what is wrong with the parameters.
The caller should be ensuring that only appropriate parameter values are passed in the api call. Any invalid values should be caught as soon as possible and somehow reflected to the user.
As often occurs in life, neither side should just assume that the other guy will do the right thing and ignore the potential problem.
I'm going to take a different perspective on the question. Working inside a contained application, where caller and callee are both in the same codebase, any validation that is required by the contract of the callee should be done by the callee.
So if you've written a function and your contract says "Does not accept NULL values", you should check that NULL values have not been passed and raise an error. This ensures that your code is correct, and if someone else's code is doing something it shouldn't, they'll know about it sooner.
Furthermore, if you assume that other code will call your method correctly, and they don't, it will make tracking the source of potential bugs more difficult.
This is essential for "Fail Early, Fail Often" where the idea is to raise an error condition as soon as a problem is detected.
It is the callee's responsibility to validate data, because only the callee knows what is valid. It is also good security practice.
It needs to happen on both ends: the client side and the server (callee and caller) side too.
Client:
This is the most effective place.
Client validation saves a round trip to the server.
It reduces bandwidth traffic.
It saves time (when responses from the server are delayed).
Server:
Never trust UI data (because of attackers).
Backend code is usually reused, so we don't know whether the data will be null, etc.; we need to validate in both the callee and caller methods.
Overall:
1. If the data comes from the UI, it's always better to validate it in the UI layer and double-check in the server layer.
2. If the data is transferred within the server layer itself, we need to validate in the callee and, as a double check, on the caller side as well.
In my humble opinion, and in a few more words explaining why, it is the callee's responsibility most of the time, but that doesn't mean the caller is always scot-free.
The reason why is that the callee is in the best position to know what it needs to do its work, because it's the one doing the work. It's thus good encapsulation for the object or method to be self-validating. If the callee can do no work on a null pointer, that's an invalid argument and should be thrown back out as such. If there are arguments out of range, that's easy to guard against as well.
However, "ignorance of the law is no defense". It's not a good pattern for a caller to simply shove everything it's given into its helper function and let the callee sort it out. The caller adds no value when it does this, for one thing, especially if what the caller shoves into the callee is data it was itself given by its own caller, meaning this layer of the call stack is likely redundant. It also makes both the caller's and callee's code very complex, as both sides "defend" against unwanted behavior by the other (the callee trying to salvage something workable and testing everything, and the caller wrapping the call in try-catch statements that attempt to correct the call).
The caller should therefore validate what it can know about the requirements for passed data. This is especially true when there is a time overhead inherent in making the call, such as when invoking a service proxy. If you have to wait a significant portion of a second to find out your parameters are wrong, when it would take a few ticks to do the same client-side, the advantage is obvious. The callee's guard clauses are exactly that; the last line of defense and graceful failure before something ugly gets thrown out of the actual work routine.
There should be something between caller and callee called a contract. The callee ensures that it does the right thing if the input data falls within the specified values. It should still check whether the incoming data conforms to those specifications. In Java you could throw an IllegalArgumentException.
The caller should also work within the contract specifications. Whether it should check the data it hands over depends on the case. Ideally you program the caller in such a way that checking is unnecessary because you are sure of the validity of your data. If it is e.g. user input, you cannot be sure that it is valid; in that case you should check it. If you don't check it, you at least have to handle the exceptions and react accordingly.
The callee has the responsibility of checking that the data it receives is valid. Failure to perform this task will almost certainly result in unreliable software and exposes you to potential security issues.
Having said that, if you have control of the client (caller) code then you should also perform at least some validation there as well, since it will result in a better overall experience.
As a general rule, try to catch problems with data as early as possible; it results in far less trouble further down the line.
Ruby has "method_missing", Tcl has "unknown", and most highly dynamic languages have an equivalent construct that is invoked when an undefined method is called.
It makes perfect sense to add such functionality; something needs to happen, and there's no reason not to allow that something to be redefined by the programmer. It's rather trivial to add, and it makes for some neat "check what my language can do" demos.
Where is this behavior actually useful in real application code?
All I can think of is:
Maybe useful for starting a debugger without unwinding the stack (but I'm not sure if that would count as "regular application code", and an exception would work just as well in most cases).
For "magical" proxy objects.. ie a lazy object that gets created or loaded on first use without changing the interface (though this seems pretty easy to do by other means).
Are there other legitimate uses?
Clarification: I don't really consider "syntactical sugar to avoid having to type quotes" to be a legitimate use. Others might, I do not.
- Implementing a dynamic web service client (see http://mpathirage.com/the-importance-of-rubys-method-missing-concept/)
- Adding better debug information on failure
- Encoding parameters in the method name
- Builders, accessors, proxy delegation (see http://olabini.com/blog/2010/04/patterns-of-method-missing/)
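Kotlin has no method_missing, but on the JVM a dynamic proxy gives a comparable trap-every-call hook, which is one way the "magical proxy object" case gets implemented in practice. A rough sketch (the Service interface and the logging behaviour are illustrative):

    import java.lang.reflect.InvocationHandler
    import java.lang.reflect.Proxy

    interface Service {
        fun fetchUser(id: Int): String
    }

    fun main() {
        val real = object : Service {
            override fun fetchUser(id: Int) = "user:$id"
        }
        // Every call on the proxy lands here first, like method_missing.
        val handler = InvocationHandler { _, method, args ->
            println("intercepted ${method.name}${args?.toList().orEmpty()}")
            method.invoke(real, *(args ?: arrayOfNulls<Any>(0)))
        }
        val proxy = Proxy.newProxyInstance(
            Service::class.java.classLoader,
            arrayOf(Service::class.java),
            handler
        ) as Service

        println(proxy.fetchUser(42)) // intercepted fetchUser[42], then user:42
    }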
If a method fails to do what it says it will do (e.g. if SendEmail fails to send an email), it should throw an exception. I blogged about this in-depth (optional read): Do your work, or throw an exception - example. It seems this idea is already a convention, even though I have not found it described elsewhere. "Exceptions are for exceptional cases" is a different guideline requiring an additional guideline "Do not use exceptions for normal flow control", and requiring discussion on what is "exceptional". "Do your work or throw an exception" is as unambiguous as the responsibility of the method (which already should be unambiguous).
All methods, including "Try" methods (e.g. TrySendEmail), should always throw if some unrecoverable error happens--something that adversely affects other features of the app, not just the functionality being attempted, such as RamChipExplodedException. That is, even though "TrySendEmail" technically accomplishes what the name indicates (it tries to send an email), it should still throw an exception if the RAM explodes (if it can even do that... you get my drift).
"Try" methods should be used only in special cases.
Are these always good guidelines (i.e., are there any exceptions (pun!) to the rule)? Can you think of any others not covered by these?
To clarify, I'm not asking for more specific rules of thumb to help follow these guidelines; I'm looking for additional guidelines.
For me, the #1 rule is: avoid exceptions in normal flow. When you debug, make sure all thrown exceptions are signalled. It is the first indication that your code is working the way you designed it.
TryXXX should not throw exceptions unless you violate the design contract. So for instance TrySendMail() may throw if you pass in a null pointer as the text for the email.
You need TryXXX methods if there is no way to get "normal flow" exception-free without them. So for parsing data they're essential; I would not recommend a TryLogin function.
"Do your work or throw" is a good starting point, but I would not make it a dogma. I am OK with a collection returning -1 if you ask IndexOf for an item that isn't there. You can discuss with your colleagues where you want to draw the line and put it in your coding standards.
I also use exceptions a lot to label control paths that I think are not used. For instance if I am forced to write a GetHashCode function that I assume won't be called, I make it throw.
The fundamental goal of efficient and effective exception handling strategy is for a method to throw exceptions for situations the caller isn't prepared to handle, while letting the caller handle things it is prepared for without the overhead of throwing an exception.
Since functions cannot be clairvoyant with regard to caller expectations, the usual pattern is to have multiple versions of routines where different callers may have different expectations. Someone who decides to call Dictionary.GetValue indicates an expectation that the passed-in key exists in the dictionary. By contrast, someone who calls Dictionary.TryGetValue conveys an expectation that the key may or may not be in a dictionary--either would be reasonably expected. Even in that case, however, the caller probably isn't expecting the CPU to catch fire. If a CpuCaughtFireException occurs in the processing of TryGetValue, it should probably be allowed to propagate up the call stack.
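In Kotlin, that pairing might look like this (Registry and its methods are invented stand-ins for Dictionary.GetValue / TryGetValue):

    class Registry(private val entries: Map<String, String>) {

        // Caller asserts the key exists; a miss is a caller bug, so throw.
        fun getValue(key: String): String =
            entries[key] ?: throw NoSuchElementException("no entry for '$key'")

        // Caller is prepared for absence; report it in the return type.
        // A truly fatal error (the CPU catching fire) still propagates.
        fun tryGetValue(key: String): String? = entries[key]
    }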
In my opinion, you shouldn't get too hung up about the name "exception" and what exactly is and is not an exceptional situation.
Exceptions are simply another flow control statement, one that allows the flow of control to shortcut up the call stack.
If that's what you need (i.e. you expect immediate calling methods not to be able to handle the condition usefully), use an exception. Otherwise, don't.
I don't use this pattern, maybe there are some places where it would have been appropriate and I used something else. Have you used it in your daily coding? Feel free to give samples, in your language of choice, along with your explanation.
Callbacks aren't really a "pattern" - more like a building block. A number of the gang of four design patterns use virtual methods in a callback-like way. Justin Niessner has already mentioned Observer.
Callbacks are much older than OOP (and probably older than 3GLs and even assembler). Another old idea is the parameter block - the C interpretation being a struct full of related members to be passed to a function so that function doesn't need a huge parameter list.
OOP classes build upon the parameter block (and add a philosophy to it). The class instance itself is a parameter block passed by reference to its methods. The virtual table is a dispatch-handling parameter block. Every virtual method has a callback pointer in the dispatch-handling parameter block. A pure virtual method reserves space for the callback pointer in the parameter block, and promises to provide the actual pointer later.
Since the class is the building block for object oriented design patterns, and parameter blocks and callbacks are the building blocks of classes - well, you could claim that all OOP design patterns are built from these ideas.
I'd like to be able to say "parameter blocks and callbacks, plus style rules guiding their use, inspired object orientation" but as appealing as it sounds, I don't know whether it's true.
I use callbacks pretty much every day in the following scenarios:
Events: When the user clicks their mouse on a control, presses a key or otherwise interacts with the UI in a way I need to handle, I subscribe to the delegate that the control publishes for the event. I can then handle it by updating the UI, cancelling the event in certain circumstances or otherwise taking some special action.
Multithreaded Programming: When programming a GUI, it's important to keep the UI responsive and indicate the progress of a long-running background task to the user. To do this, I kick off the task in a separate thread and then publish delegates (events in the .NET world) that give my UI the opportunity to notify the user about the progress being made.
Lambda functions: In .NET, lambda functions are a form of a delegate, one that lets me interact with another piece of code's operation at a later point in time. LINQ is a great example of this. I can create a small matching function and then supply it to a LINQ query. Later, when I execute my query against a collection, the matching function is called to determine if there is a match for the query. This allows me to not have to build or worry about the query mechanism. I just have to tell the query mechanism where to go to find out if a comparison is a match or not.
These examples just scratch the surface, I'm sure. But they are useful examples of how I use callbacks every day.
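To show the lambda/LINQ point in compact form, here is an assumed Kotlin equivalent: the collection machinery calls my predicate back for each element, and I never see the loop:

    data class Order(val id: Int, val total: Double)

    fun main() {
        val orders = listOf(Order(1, 9.5), Order(2, 120.0), Order(3, 42.0))

        // The "matching function", passed as a callback:
        val isLarge: (Order) -> Boolean = { it.total > 50.0 }

        // filter invokes the callback once per element.
        val large = orders.filter(isLarge)
        println(large) // [Order(id=2, total=120.0)]
    }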
The .NET platform uses callbacks heavily to implement the Observer pattern.
They also get used for handling Asynchronous processes.
Objective C and the Cocoa framework make a lot of use of it. An example would be NSURLConnection, which will inform an object given to it (called its delegate) when something happens on the connection:
NSURLConnection *foo = [[NSURLConnection alloc] initWithRequest:request delegate:self];
Note the passing of delegate there. The request proceeds in the background, and the instance will then send messages to the delegate (in this case, self), like:
connectionDidFinishLoading:
connection:didFailWithError:
You get the idea. I believe this is called the "observer pattern". It's all tied in to Cocoa's event loop (as far as I know, I'm still learning) and is cheap 'n easy asynchronous programming. A lot of frameworks in a variety of languages follow this approach.
.NET has delegates as well, which are similar. Think events.
I use callbacks a great deal in JavaScript to let me know when an asynchronous call has finished, so the result can be processed.
But in JavaScript, and now in C# 3, I pass functions in as parameters, so that the processing can go on without explicitly setting up a delegate to be called.
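A minimal Kotlin sketch of that same idea, passing a function as a parameter to be invoked when the asynchronous work completes (the thread and the URL are purely illustrative):

    import kotlin.concurrent.thread

    fun fetchDataAsync(url: String, onDone: (String) -> Unit) {
        thread {
            // ... pretend this performs a slow network call ...
            val result = "response from $url"
            onDone(result) // the callback processes the result when ready
        }
    }

    fun main() {
        fetchDataAsync("https://example.com") { body ->
            println("got: $body")
        }
        Thread.sleep(100) // crude wait so the demo prints before exiting
    }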