How to separate data validation from my simple domain objects (POCOs)? - language-agnostic

This question is language agnostic but I am a C# guy so I use the term POCO to mean an object that only performs data storage, usually using getter and setter fields.
I just reworked my Domain Model to be super-duper POCO and am left with a couple of concerns regarding how to ensure that the property values make sense within the domain.
For example, the EndDate of a Service should not exceed the EndDate of the Contract that Service is under. However, it seems like a violation of SOLID to put the check into the Service.EndDate setter, not to mention that as the number of validations that need to be done grows my POCO classes will become cluttered.
I have some solutions (will post in answers), but they have their disadvantages and am wondering what are some favorite approaches to solving this dilemma?

I think you're starting off with a bad assumption, ie, that you should have objects that do nothing but store data, and have no methods but accessors. The whole point of having objects is to encapsulate data and behaviors. If you have a thing that's just, basically, a struct, what behaviors are you encapsulating?

I always hear people argue for a "Validate" or "IsValid" method.
Personally I think this may work, but with most DDD projects you usually end up
with multiple validations that are allowable depending on the specific state of the object.
So I prefer "IsValidForNewContract", "IsValidForTermination" or similar, because I believe most projects end up with multiple such validators/states per class. That also means I get no interface, but I can write aggregated validators that read very well and reflect the business conditions I am asserting.
I really do believe the generic solutions in this case very often take focus away from what's important - what the code is doing - for a very minor gain in technical elegance (the interface, delegate or whatever). Just vote me down for it ;)
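A minimal sketch of the state-specific checks described above, written in Python for brevity. The `Contract` fields and the rules themselves are assumptions made up for illustration, not something the answer specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contract:
    # Hypothetical fields, chosen only to illustrate the idea.
    customer: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    termination_reason: Optional[str] = None

    def is_valid_for_new_contract(self) -> bool:
        # A new contract needs a customer and a start date.
        return bool(self.customer) and self.start_date is not None

    def is_valid_for_termination(self) -> bool:
        # Termination additionally needs an end date and a stated reason.
        return (
            self.is_valid_for_new_contract()
            and self.end_date is not None
            and self.end_date >= self.start_date
            and bool(self.termination_reason)
        )


draft = Contract(customer="ACME", start_date=date(2024, 1, 1))
print(draft.is_valid_for_new_contract())  # True
print(draft.is_valid_for_termination())   # False
```

Each method name states the business condition it asserts, which is the readability gain the answer argues for; the cost is that there is no shared interface to program against.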

A colleague of mine came up with an idea that worked out pretty well. We never came up with a great name for it but we called it Inspector/Judge.
The Inspector would look at an object and tell you all of the rules it violated. The Judge would decide what to do about it. This separation let us do a couple of things. It let us put all the rules in one place (Inspector) but we could have multiple Judges and choose the Judge by the context.
One example of the use of multiple Judges revolves around the rule that said a Customer must have an Address. This was a standard three tier app. In the UI tier the Judge would produce something that the UI could use to indicate the fields that had to be filled in. The UI Judge did not throw exceptions. In the service layer there was another Judge. If it found a Customer without an Address during Save it would throw an exception. At that point you really have to stop things from proceeding.
We also had Judges that were more strict as the state of the objects changed. It was an insurance application and during the Quoting process a Policy was allowed to be saved in an incomplete state. But once that Policy was ready to be made Active a lot of things had to be set. So the Quoting Judge on the service side was not as strict as the Activation Judge. Yet the rules used in the Inspector were still the same so you could still tell what wasn't complete even if you decided not to do anything about it.
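The Inspector/Judge split could be sketched roughly as follows, in Python for brevity. All class and function names here (`Violation`, `UiJudge`, `SaveJudge`, `must_have_address`) are hypothetical; the answer only describes the roles, not an implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Customer:
    name: str
    address: Optional[str] = None


@dataclass(frozen=True)
class Violation:
    field: str
    message: str


# A rule looks at the object and returns a Violation, or None if satisfied.
Rule = Callable[[Customer], Optional[Violation]]


def must_have_address(c: Customer) -> Optional[Violation]:
    if not c.address:
        return Violation("address", "Customer must have an Address")
    return None


class Inspector:
    """Holds every rule in one place and only reports; it never decides."""

    def __init__(self, rules: List[Rule]) -> None:
        self._rules = rules

    def inspect(self, c: Customer) -> List[Violation]:
        results = (rule(c) for rule in self._rules)
        return [v for v in results if v is not None]


class UiJudge:
    """Turns violations into field names the UI can highlight; never raises."""

    def rule_on(self, violations: List[Violation]) -> List[str]:
        return [v.field for v in violations]


class SaveJudge:
    """Service-layer judge: any violation stops the save."""

    def rule_on(self, violations: List[Violation]) -> None:
        if violations:
            raise ValueError("; ".join(v.message for v in violations))


inspector = Inspector([must_have_address])
found = inspector.inspect(Customer(name="Ada"))
print(UiJudge().rule_on(found))  # ['address']
```

The same `found` list handed to `SaveJudge().rule_on(...)` raises instead, which is the context-dependent strictness the answer describes.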

One solution is to have each object's DataAccessObject take a list of Validators. When Save is called, it performs a check against each validator:
public class ServiceEndDateValidator : IValidator<Service> {
    public void Check(Service s) {
        if (s.EndDate > s.Contract.EndDate)
            throw new InvalidOperationException();
    }
}
public class ServiceDao : IDao<Service> {
    // Holds the whole collection, not a single validator.
    private readonly IEnumerable<IValidator<Service>> _validators;
    public ServiceDao(IEnumerable<IValidator<Service>> validators) { _validators = validators; }
    public void Save(Service s) {
        foreach (var v in _validators)
            v.Check(s);
        // Go on to save
    }
}
The benefit is a very clear separation of concerns (SoC); the disadvantage is that we don't get the check until Save() is called.

In the past I have usually delegated validation to a service of its own, such as a ValidationService. This in principle still adheres to the philosophy of DDD.
Internally this would contain a collection of Validators and a very simple set of public methods such as Validate(), which could return a collection of error objects.
Very simply, something like this in C#
public class ValidationService<T>
{
    private IList<IValidator<T>> _validators;
    // An iterator method (yield return) must return IEnumerable<T>, not IList<T>.
    public IEnumerable<Error> Validate(T objectToValidate)
    {
        foreach (IValidator<T> validator in _validators)
        {
            yield return validator.Validate(objectToValidate);
        }
    }
}
Validators could either be added within a default constructor or injected via some other class such as a ValidationServiceFactory.

I think that would probably be the best place for the logic, actually, but that's just me. You could have some kind of IsValid method that checks all of the conditions too and returns true/false, maybe some kind of ErrorMessages collection but that's an iffy topic since the error messages aren't really a part of the Domain Model. I'm a little biased as I've done some work with RoR and that's essentially what its models do.

Another possibility is to have each of my classes implement
public interface Validatable<T> {
    event Action<T> RequiresValidation;
}
And have each setter for each class raise the event before setting (maybe I could achieve this via attributes).
The advantage is real-time validation checking. The disadvantages are messier code and that it is unclear who should be doing the attaching.

Here's another possibility. Validation is done through a proxy or decorator on the Domain object:
public class ServiceValidationProxy : Service {
    public override DateTime EndDate {
        // Read through base; returning EndDate here would recurse forever.
        get { return base.EndDate; }
        set {
            if (value > Contract.EndDate)
                throw new InvalidOperationException();
            base.EndDate = value;
        }
    }
}
Advantage: Instant validation. Can easily be configured via an IoC.
Disadvantage: If a proxy, validated properties must be virtual; if a decorator, all domain models must be interface-based. The validation classes will end up a bit heavyweight: proxies have to inherit the class and decorators have to implement all the methods. Naming and organization might get confusing.

Using GET and POST vs getter and setter methods (URLS)

As a trained programmer, I have been taught repeatedly to use getter and setter methods to control the access and modification of class variables. This is how you're told to do it in Java, Python, C++ and pretty much every other modern language under the sun. However, when I started learning about web development, this seemed to be cast aside. Instead, we're told to use one URL with GET and POST calls, which seems really odd.
So imagine I have a Person object and I want to update their age. In the non-HTTP world, you're supposed to have a method called <PersonObject>.getAge() and another method called <PersonObject>.setAge(int newAge). But say, instead, you've got a webserver that holds user profile information. According to HTTP conventions, you'd have a URL like '/account/age'. To get their age, you'd request that URL with a 'GET', and to set their age, you'd request that URL with a 'POST' and somehow (form, JSON, URL-arg, etc.) send the new value along.
The HTTP method just feels awkward. To me, that's analogous to changing the non-HTTP version to one method called age, and you'd get their age with <PersonObject>.age('GET'), and set their age with <PersonObject>.age(newAge, 'SET'). Why is it done that way?
Why not have one URL called '/account/getAge' and another called '/account/setAge'?
What you are referring to is a RESTful API. While not required (you could just use getters and setters), it is indeed considered good practice. This, however, does not mean you have to change the code of your data objects. I always use getters and setters for my business logic in the models layer.
What you are talking to through the HTTP request are the controllers however, and they rarely use getters and setters (I suppose I do not need to explain the MVC design pattern to an experienced programmer). You should never directly access your models through HTTP (how about authentication and error handling and stuff...)
If you have some spare time I would advise you to have a look at this screencast, which I found very useful.
You certainly could have separate URLs if you like, but getters and setters can share a name in the original context of your question anyway, thanks to overloading.
class Person {
    private int age;
    // Getter: same name as the setter, distinguished by its parameter list.
    public int age() {
        return this.age;
    }
    public void age(int age) {
        this.age = age;
    }
}
So if it helps you, you can think of it like that.
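Python's `property` gives a similar picture: one name, `age`, with separate read and write behaviour behind it, much like one URL answering both GET and POST. This is only an illustrative sketch; the negative-age check is an assumption added to show that the write path can still validate.

```python
class Person:
    def __init__(self, age: int) -> None:
        self._age = age

    @property
    def age(self) -> int:
        # Read access: comparable to GET /account/age
        return self._age

    @age.setter
    def age(self, new_age: int) -> None:
        # Write access: comparable to POST /account/age
        if new_age < 0:
            raise ValueError("age cannot be negative")
        self._age = new_age


p = Person(41)
p.age = 42    # goes through the setter
print(p.age)  # 42
```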

reusing queries in 2 datacontext using dependency injection

I have a web application that uses linq-to-sql queries (will soon be upgraded to linq-to-EF compiled queries) and for which there's data context and a database already in place. I want to create a demo version of the application and for the demo, I want to use an entirely different database file but that will have the same tables. So in essence, I'll have the same data structure for two different databases: one database for logged-in users and one database for demo users. I want to reuse many of the queries I've already written; they look like this:
public class FruitQueries
{
public List<SomeObjectModel> MyQuery(list of parameters)
{
using (MyDataContext TheDC = new MyDataContext())
{
var TheQueryResult = (from f in TheDC.Fruits
......).ToList();
return TheQueryResult;
}
}
public List<SomeObject> AnotherQuery(some other parameters) {...}
}
Now I think I know that this calls for dependency injection where the data context is passed in as a parameter but I'm not sure on the syntax. How do you reuse queries using dependency injection to make them work on two different databases? Right now I'm using a using statement and I want to keep this pattern; is that possible if I inject the DC as a parameter?
Thanks.
Since you already have a lot of code in place, probably the simplest thing to do is to inject a factory:
public interface IMyDataContextFactory
{
MyDataContext CreateNewContext();
}
All the code will roughly stay the same:
public List<SomeObjectModel> MyQuery(params)
{
using (var TheDC = this.factory.CreateNewContext())
{
var TheQueryResult = (from f in TheDC.Fruits
......).ToList();
return TheQueryResult;
}
}
You can let the injected IMyDataContextFactory decide how to construct a MyDataContext instance (based on the user). This would be trivial.
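The factory idea can be sketched with Python's standard `sqlite3` module standing in for the LINQ data context. The file names, the `fruits` table and the `factory_for` helper are assumptions for illustration; the point is that the query class keeps its create/query/dispose shape and never learns which database it talks to.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import Callable, List

# A "factory" here is just a callable that returns a fresh connection.
ConnectionFactory = Callable[[], sqlite3.Connection]


def make_factory(path: str) -> ConnectionFactory:
    # Each call to the returned function opens a new connection to `path`.
    return lambda: sqlite3.connect(path)


def seed(path: str, fruits: List[str]) -> None:
    # Demo helper: create the shared schema and insert some rows.
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE fruits (name TEXT)")
        conn.executemany("INSERT INTO fruits VALUES (?)", [(f,) for f in fruits])
        conn.commit()


class FruitQueries:
    def __init__(self, factory: ConnectionFactory) -> None:
        self._factory = factory

    def names(self) -> List[str]:
        # Same shape as the original using block: create, query, dispose.
        with closing(self._factory()) as conn:
            rows = conn.execute("SELECT name FROM fruits ORDER BY name").fetchall()
            return [r[0] for r in rows]


workdir = tempfile.mkdtemp()
live_db = os.path.join(workdir, "live.db")
demo_db = os.path.join(workdir, "demo.db")
seed(live_db, ["apple", "pear"])
seed(demo_db, ["demo-banana"])


def factory_for(is_demo_user: bool) -> ConnectionFactory:
    # The only place that knows there are two databases.
    return make_factory(demo_db if is_demo_user else live_db)


print(FruitQueries(factory_for(False)).names())  # ['apple', 'pear']
print(FruitQueries(factory_for(True)).names())   # ['demo-banana']
```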
In the end it will probably be better to inject a MyDataContext (or an abstraction such as IUnitOfWork) into consumers, but this changes everything completely. Since the instance is passed in from the outside, the consumer is no longer responsible for disposing it; someone else is. Disposing such an instance isn't that hard with most DI containers. It gets harder, though, when you want to share the same MyDataContext instance across multiple consumers (within the same web request, for instance), and it raises the question of where you call SubmitChanges.
Elaborating the previous answer
What you can do is provide the connection string to the DC (would this qualify as constructor injection?)
using (MyDataContext TheDC = new MyDataContext(this.factory.CreateConString()))
This way, disposal is still handled by the consumer and you can continue your using() approach. Your factory can read the two different connection strings from your web.config and determine the right one to use, based on the user (which is not as trivial as it may seem).
PS: I think the quickest way is to deploy the demo application to a different URL so they can have a separate web.config and you do not need to code anything but that does not answer your question.

How to implement PropertyDefiner for logback to access multiple properties

I would like to define some properties in my logback.xml config file and saw that implementing PropertyDefiner was a great way to set properties in a customizable way.
After starting to implement it I began to wonder how to access the value of the name attribute of the element within the tag. I'm not seeing any way to do this and I'm scratching my head. Would this PropertyDefiner really make you create a new implementation for every single property? Why not just hard code it? I didn't see much discussion about this out on the web.
I hope I'm just not seeing it and that the brains of stackoverflow can help me out. Does anyone know how to do this? Thanks!
I found this discussion: essentially the same question is asked, but no answer was returned.
fyi: I want to customize how I get my properties because I am pulling it from a database. I have a helper class which pulls the properties in on server startup. These properties vary based on environment (dev, test, prod, etc.)
As of logback version 1.0.6, the value of the name attribute cannot be accessed directly. However, nothing prevents you from passing the value of the name attribute in a property of your choice. Example:
<define name="rootLevel" class="Your.PropertyDefiner">
<myKey>rootLevel</myKey>
</define>
where myKey is a property of Your.PropertyDefiner. For example:
class Your.PropertyDefiner implements PropertyDefiner {
String myKey;
public void setMyKey(String k) {
this.myKey= k;
}
public String getPropertyValue() {
return ...
}
}
Joran, logback's configuration framework, takes care of the wiring. Joran will inject the value of the myKey element into the myKey property of Your.PropertyDefiner. If you are curious about the technical details, see the documentation on implicit actions and implicit actions in practice.

Registering derived classes with reflection, good or evil?

As we all know, when we derive a class and use polymorphism, someone, somewhere needs to know what class to instantiate. We can use factories, a big switch statement, if-else-if, etc. I just learnt from Bill K that this is called Dependency Injection.
My Question: Is it good practice to use reflection and attributes as the dependency injection mechanism? That way, the list gets populated dynamically as we add new types.
Here is an example. Please no comment about how loading images can be done other ways, we know.
Suppose we have the following IImageFileFormat interface:
public interface IImageFileFormat
{
string[] SupportedExtensions { get; }
Image Load(string fileName);
void Save(Image image, string fileName);
}
Different classes will implement this interface:
[FileFormat]
public class BmpFileFormat : IImageFileFormat { ... }
[FileFormat]
public class JpegFileFormat : IImageFileFormat { ... }
When a file needs to be loaded or saved, a manager needs to iterate through all known loaders and call Load()/Save() on the appropriate instance depending on its SupportedExtensions.
class ImageLoader
{
public Image Load(string fileName)
{
return FindFormat(fileName).Load(fileName);
}
public void Save(Image image, string fileName)
{
FindFormat(fileName).Save(image, fileName);
}
IImageFileFormat FindFormat(string fileName)
{
string extension = Path.GetExtension(fileName);
return formats.First(f => f.SupportedExtensions.Contains(extension));
}
private List<IImageFileFormat> formats;
}
I guess the important point here is whether the list of available loaders (formats) should be populated by hand or using reflection.
By hand:
public ImageLoader()
{
formats = new List<IImageFileFormat>();
formats.Add(new BmpFileFormat());
formats.Add(new JpegFileFormat());
}
By reflection:
public ImageLoader()
{
formats = new List<IImageFileFormat>();
foreach(Type type in Assembly.GetExecutingAssembly().GetTypes())
{
if(type.GetCustomAttributes(typeof(FileFormatAttribute), false).Length > 0)
{
formats.Add((IImageFileFormat)Activator.CreateInstance(type));
}
}
}
I sometimes use the latter and it never occurred to me that it could be a very bad idea. Yes, adding new classes is easy, but the mechanism registering those same classes is harder to grasp, and therefore to maintain, than a simple coded-by-hand list.
Please discuss.
My personal preference is neither - when there is a mapping of classes to some arbitrary string, a configuration file is the place to do it IMHO. This way, you never need to modify the code - especially if you use a dynamic loading mechanism to add new dynamic libraries.
In general, I always prefer some method that allows me to write code once as much as possible - both your methods require altering already-written/built/deployed code (since your reflection route makes no provision for adding file format loaders in new DLLs).
Edit by Coincoin:
The reflection approach could be effectively combined with configuration files to locate the implementations to be injected.
The type could be declared explicitly in the config file using canonical names, similar to MSBuild <UsingTask>
The config could locate the assemblies, but then we have to inject all matching types, à la Microsoft Visual Studio Packages.
Any other mechanism to match a value or set of conditions to the needed type.
My vote is that the reflection method is nicer. With that method, adding a new file format only modifies one part of the code: the place where you define the class to handle the file format. Without reflection, you'll have to remember to modify the other class, the ImageLoader, as well.
Isn't this pretty much what the Dependency Injection pattern is all about?
If you can isolate the dependencies then the mechanics will almost certainly be reflection based, but it will be configuration file driven so the messiness of the reflection can be pretty well encapsulated and isolated.
I believe with DI you simply say I need an object of type <interface> with some other parameters, and the DI system returns an object to you that satisfies your conditions.
This goes together with IoC (Inversion of Control) where the object being supplied may need something else, so that other thing is automatically created and installed into your object (being created by DI) before it's returned to the user.
I know this borders on the "no comment about loading images other ways", but why not just flip your dependencies -- rather than have ImageLoader depend on ImageFileFormats, have each IImageFileFormat depend on an ImageLoader? You'll gain a few things out of this:
Each time you add a new IImageFileFormat, you won't need to make any changes anywhere else (and you won't have to use reflection, either)
If you take it one step further and abstract ImageLoader, you can mock it in Unit Tests, making testing the concrete implementations of each IImageFileFormat that much easier
In vb.net, if all the image loaders will be in the same assembly, one could use partial classes and events to achieve the desired effect (have a class whose purpose is to fire an event when the image loaders should register themselves; each file containing image loaders can use a "partial class" to add another event handler to that class); C# doesn't have a direct equivalent to vb.net's WithEvents syntax, but I suspect partial classes are a limited mechanism for achieving the same thing.

When is it right for a constructor to throw an exception?

When is it right for a constructor to throw an exception? (Or in the case of Objective C: when is it right for an init'er to return nil?)
It seems to me that a constructor should fail -- and thus refuse to create an object -- if the object isn't complete. I.e., the constructor should have a contract with its caller to provide a functional and working object on which methods can be called meaningfully. Is that reasonable?
The constructor's job is to bring the object into a usable state. There are basically two schools of thought on this.
One group favors two-stage construction. The constructor merely brings the object into a sleeper state in which it refuses to do any work. There's an additional function that does the actual initialization.
I've never understood the reasoning behind this approach. I'm firmly in the group that supports one-stage construction, where the object is fully initialized and usable after construction.
One-stage constructors should throw if they fail to fully initialize the object. If the object cannot be initialized, it must not be allowed to exist, so the constructor must throw.
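A minimal Python sketch of one-stage construction, assuming a hypothetical `Person` with two rules made up for illustration. Validation runs as part of construction, so a failed check means the caller never receives an object at all.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str
    date_of_birth: date

    def __post_init__(self) -> None:
        # Runs as part of construction; raising here means no Person is returned.
        if not self.name.strip():
            raise ValueError("name must not be blank")
        if self.date_of_birth > date.today():
            raise ValueError("date_of_birth must not be in the future")


ada = Person("Ada", date(1815, 12, 10))

try:
    Person("  ", date(1815, 12, 10))
except ValueError as err:
    print(err)  # name must not be blank
```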
Eric Lippert says there are 4 kinds of exceptions.
Fatal exceptions are not your fault, you cannot prevent them, and you cannot sensibly clean up from them.
Boneheaded exceptions are your own darn fault, you could have prevented them and therefore they are bugs in your code.
Vexing exceptions are the result of unfortunate design decisions. Vexing exceptions are thrown in a completely non-exceptional circumstance, and therefore must be caught and handled all the time.
And finally, exogenous exceptions appear to be somewhat like vexing exceptions except that they are not the result of unfortunate design choices. Rather, they are the result of untidy external realities impinging upon your beautiful, crisp program logic.
Your constructor should never throw a fatal exception on its own, but code it executes may cause a fatal exception. Something like "out of memory" isn't something you can control, but if it occurs in a constructor, hey, it happens.
Boneheaded exceptions should never occur in any of your code, so they're right out.
Vexing exceptions (the example is Int32.Parse()) shouldn't be thrown by constructors, because they don't have non-exceptional circumstances.
Finally, exogenous exceptions should be avoided, but if you're doing something in your constructor that depends on external circumstances (like the network or filesystem), it would be appropriate to throw an exception.
Reference link: https://blogs.msdn.microsoft.com/ericlippert/2008/09/10/vexing-exceptions/
There is generally nothing to be gained by divorcing object initialization from construction. RAII is correct: a successful call to the constructor should either result in a fully initialized live object or it should fail, and ALL failures at any point in any code path should always throw an exception. You gain nothing by use of a separate init() method except additional complexity at some level. The ctor contract should be that either it returns a functional valid object or it cleans up after itself and throws.
Consider: if you implement a separate init method, you still have to call it. It will still have the potential to throw exceptions, they still have to be handled, and it virtually always has to be called immediately after the constructor anyway, except now you have 4 possible object states instead of 2 (i.e., constructed, initialized, uninitialized, and failed vs. just valid and non-existent).
In every case I've run across in 25 years of OO development, situations where it seems like a separate init method would 'solve some problem' turned out to be design flaws. If you don't need an object NOW then you shouldn't be constructing it now, and if you do need it now then you need it initialized. KISS should always be the principle followed, along with the simple concept that the behavior, state, and API of any interface should reflect WHAT the object does, not HOW it does it. Client code should not even be aware that the object has any kind of internal state that requires initialization, so the init-after-construction pattern violates this principle.
As far as I can tell, no-one is presenting a fairly obvious solution which embodies the best of both one-stage and two-stage construction.
note: This answer assumes C#, but the principles can be applied in most languages.
First, the benefits of both:
One-Stage
One-stage construction benefits us by preventing objects from existing in an invalid state, thus preventing all sorts of erroneous state management and all the bugs which come with it. However, it leaves some of us feeling weird because we don't want our constructors to throw exceptions, and sometimes that's what we need to do when initialization arguments are invalid.
public class Person
{
public string Name { get; }
public DateTime DateOfBirth { get; }
public Person(string name, DateTime dateOfBirth)
{
if (string.IsNullOrWhiteSpace(name))
{
throw new ArgumentException(nameof(name));
}
if (dateOfBirth > DateTime.UtcNow) // side note: bad use of DateTime.UtcNow
{
throw new ArgumentOutOfRangeException(nameof(dateOfBirth));
}
this.Name = name;
this.DateOfBirth = dateOfBirth;
}
}
Two-Stage via validation method
Two-stage construction benefits us by allowing our validation to be executed outside of the constructor, and therefore prevents the need for throwing exceptions within the constructor. However, it leaves us with "invalid" instances, which means there's state we have to track and manage for the instance, or we throw it away immediately after heap-allocation. It begs the question: Why are we performing a heap allocation, and thus memory collection, on an object we don't even end up using?
public class Person
{
public string Name { get; }
public DateTime DateOfBirth { get; }
public Person(string name, DateTime dateOfBirth)
{
this.Name = name;
this.DateOfBirth = dateOfBirth;
}
public void Validate()
{
if (string.IsNullOrWhiteSpace(Name))
{
throw new ArgumentException(nameof(Name));
}
if (DateOfBirth > DateTime.UtcNow) // side note: bad use of DateTime.UtcNow
{
throw new ArgumentOutOfRangeException(nameof(DateOfBirth));
}
}
}
Single-Stage via private constructor
So how can we keep exceptions out of our constructors, and prevent ourselves from performing heap allocation on objects which will be immediately discarded? It's pretty basic: we make the constructor private and create instances via a static method designated to perform an instantiation, and therefore heap-allocation, only after validation.
public class Person
{
public string Name { get; }
public DateTime DateOfBirth { get; }
private Person(string name, DateTime dateOfBirth)
{
this.Name = name;
this.DateOfBirth = dateOfBirth;
}
public static Person Create(
string name,
DateTime dateOfBirth)
{
if (string.IsNullOrWhiteSpace(name))
{
throw new ArgumentException(nameof(name));
}
if (dateOfBirth > DateTime.UtcNow) // side note: bad use of DateTime.UtcNow
{
throw new ArgumentOutOfRangeException(nameof(dateOfBirth));
}
return new Person(name, dateOfBirth);
}
}
Async Single-Stage via private constructor
Aside from the aforementioned validation and heap-allocation prevention benefits, the previous methodology provides us with another nifty advantage: async support. This comes in handy when dealing with multi-stage authentication, such as when you need to retrieve a bearer token before using your API. This way, you don't end up with an invalid "signed out" API client, and instead you can simply re-create the API client if you receive an authorization error while attempting to perform a request.
public class RestApiClient
{
    private readonly HttpClient httpClient;
    // Private: instances are only created through Create().
    private RestApiClient(HttpClient httpClient)
    {
        this.httpClient = httpClient;
    }
    public static async Task<RestApiClient> Create(string username, string password)
    {
        if (username == null)
        {
            throw new ArgumentNullException(nameof(username));
        }
        if (password == null)
        {
            throw new ArgumentNullException(nameof(password));
        }
        var basicAuthBytes = Encoding.ASCII.GetBytes($"{username}:{password}");
        var basicAuthValue = Convert.ToBase64String(basicAuthBytes);
        var authenticationHttpClient = new HttpClient
        {
            BaseAddress = new Uri("https://auth.example.io"),
            DefaultRequestHeaders = {
                Authorization = new AuthenticationHeaderValue("Basic", basicAuthValue)
            }
        };
        using (authenticationHttpClient)
        {
            var response = await authenticationHttpClient.GetAsync("login");
            var authToken = await response.Content.ReadAsStringAsync();
            var restApiHttpClient = new HttpClient
            {
                BaseAddress = new Uri("https://api.example.io"), // notice this differs from the auth uri
                DefaultRequestHeaders = {
                    Authorization = new AuthenticationHeaderValue("Bearer", authToken)
                }
            };
            return new RestApiClient(restApiHttpClient);
        }
    }
}
The downsides of this method are few, in my experience.
Generally, using this methodology means that you can no longer use the class as a DTO because deserializing to an object without a public default constructor is hard, at best. However, if you were using the object as a DTO, you shouldn't really be validating the object itself, but rather validating the values on the object as you attempt to use them, since technically the values aren't "invalid" with regards to the DTO.
It also means that you'll end up creating factory methods or classes when you need to allow an IoC container to create the object, since otherwise the container won't know how to instantiate the object. However, in a lot of cases the factory methods end up being one of the Create methods themselves.
Because of all the trouble that a partially created class can cause, I'd say never.
If you need to validate something during construction, make the constructor private and define a public static factory method. The method can throw if something is invalid. But if everything checks out, it calls the constructor, which is guaranteed not to throw.
A constructor should throw an exception when it is unable to complete the construction of said object.
For example, if the constructor is supposed to allocate 1024 KB of ram, and it fails to do so, it should throw an exception, this way the caller of the constructor knows that the object is not ready to be used and there is an error somewhere that needs to be fixed.
Objects that are half-initialised and half-dead just cause problems and issues, as there really is no way for the caller to know. I'd rather have my constructor throw an error when things go wrong than have to rely on the programmer to call an isOK() function which returns true or false.
It's always pretty dodgy, especially if you're allocating resources inside a constructor; depending on your language the destructor won't get called, so you need to manually clean up. It depends on when an object's lifetime begins in your language.
The only time I've really done it is when there's been a security problem somewhere that means the object should not, rather than cannot, be created.
It's reasonable for a constructor to throw an exception so long as it cleans itself up properly. If you follow the RAII paradigm (Resource Acquisition Is Initialization) then it is quite common for a constructor to do meaningful work; a well-written constructor will in turn clean up after itself if it can't fully be initialized.
See C++ FAQ sections 17.2 and 17.4.
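The "clean up after itself" part can be sketched in Python as follows. `Resource` is a hypothetical stand-in for a file handle, socket or lock, and the `opened` list exists only so the demo can observe what happened to a resource whose owner was never constructed.

```python
from typing import List


class Resource:
    """Hypothetical stand-in for a file handle, socket, lock, and so on."""

    def __init__(self, name: str, fail: bool = False) -> None:
        if fail:
            raise OSError(f"could not acquire {name}")
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


opened: List[Resource] = []  # demo-only record of every acquired Resource


class Connection:
    """Owns two resources; construction yields both or neither."""

    def __init__(self, fail_second: bool = False) -> None:
        self.first = Resource("first")
        opened.append(self.first)
        try:
            self.second = Resource("second", fail=fail_second)
        except OSError:
            # Second acquisition failed: release the first, then re-raise so
            # the caller never sees a half-built object.
            self.first.close()
            raise
        opened.append(self.second)


try:
    Connection(fail_second=True)
except OSError as err:
    print(err)           # could not acquire second
print(opened[0].closed)  # True
```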
In general, I have found that code that is easier to port and maintain results if constructors are written so they do not fail, and code that can fail is placed in a separate method that returns an error code and leaves the object in an inert state.
If you are writing UI-Controls (ASPX, WinForms, WPF, ...) you should avoid throwing exceptions in the constructor because the designer (Visual Studio) can't handle them when it creates your controls. Know your control-lifecycle (control events) and use lazy initialization wherever possible.
Note that if you throw an exception in an initializer, you'll end up leaking if any code is using the [[[MyObj alloc] init] autorelease] pattern, since the exception will skip the autorelease.
See this question:
How do you prevent leaks when raising an exception in init?
You absolutely should throw an exception from a constructor if you're unable to create a valid object. This allows you to provide proper invariants in your class.
In practice, you may have to be very careful. Remember that in C++, the destructor will not be called, so if you throw after allocating your resources, you need to take great care to handle that properly!
This page has a thorough discussion of the situation in C++.
Throw an exception if you're unable to initialize the object in the constructor; one example is illegal arguments.
As a general rule of thumb an exception should always be thrown as soon as possible, as it makes debugging easier when the source of the problem is closer to the method signaling something is wrong.
Throwing an exception during construction is a great way to make your code way more complex. Things that would seem simple suddenly become hard. For example, let's say you have a stack. How do you pop the stack and return the top value? Well, if the objects in the stack can throw in their constructors (constructing the temporary to return to the caller), you can't guarantee that you won't lose data (decrement stack pointer, construct return value using copy constructor of value in stack, which throws, and now have a stack that just lost an item)! This is why std::stack::pop does not return a value, and you have to call std::stack::top.
This problem is well described here, check Item 10, writing exception-safe code.
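Here is a small C++ sketch of the top-then-pop idiom. Fragile is a hypothetical element type whose copy constructor can throw; the point is that the copy is attempted while the element is still on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <stdexcept>

// Hypothetical element type whose copy constructor can fail.
struct Fragile {
    int id;
    bool throw_on_copy;

    Fragile(int i, bool t) : id(i), throw_on_copy(t) {}
    Fragile(const Fragile& other)
        : id(other.id), throw_on_copy(other.throw_on_copy) {
        if (other.throw_on_copy) {
            throw std::runtime_error("copy failed");
        }
    }
};

// Copies the top element first and pops only if the copy succeeded.
// Returns the stack size afterwards so the caller can see nothing was lost.
std::size_t size_after_take(std::stack<Fragile>& items) {
    try {
        Fragile copy = items.top();  // may throw; the stack is untouched so far
        items.pop();                 // reached only after a successful copy
        (void)copy;
    } catch (const std::runtime_error&) {
        // The copy failed, but the element is still on the stack.
    }
    return items.size();
}

void demo_top_then_pop() {
    std::stack<Fragile> items;
    items.emplace(1, false);  // constructed in place, no copy involved
    items.emplace(2, true);   // copying this one will throw
    assert(size_after_take(items) == 2);  // failed copy, nothing popped
}
```

A pop() that returned the element by value would have to remove it before the caller's copy was complete, which is exactly the window in which the item could be lost.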
The usual contract in OO is that object methods do actually function.
So as a corollary, never return a zombie object from a constructor/init.
A zombie is not functional and may be missing internal components. Just a null-pointer exception waiting to happen.
I first made zombies in Objective C, many years ago.
Like all rules of thumb, there is an "exception".
It is entirely possible that a specific interface may have a contract that says there exists a method "initialize" that is allowed to throw an exception, and that an object implementing this interface may not respond correctly to any calls except property setters until initialize has been called. I used this for device drivers in an OO operating system during the boot process, and it was workable.
In general, you don't want zombie objects. In languages like Smalltalk, which have become, things get a little fizzy-buzzy, but overuse of become is bad style too. Become lets an object change into another object in situ, so there is no need for the envelope-wrapper (Advanced C++) or the strategy pattern (GoF).
I can't address best practice in Objective-C, but in C++ it's fine for a constructor to throw an exception. Especially as there's no other way to ensure that an exceptional condition encountered at construction is reported without resorting to invoking an isOK() method.
The function try block feature was designed specifically to support failures in constructor memberwise initialization (though it may be used for regular functions also). It's the only way to modify or enrich the exception information which will be thrown. But because of its original design purpose (use in constructors) it doesn't permit the exception to be swallowed by an empty catch() clause.
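A minimal sketch of a constructor function try block, assuming a hypothetical Socket member whose constructor throws for an empty host. The handler enriches the message; it cannot suppress the failure.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical member type whose constructor can fail.
struct Socket {
    explicit Socket(const std::string& host) {
        if (host.empty()) {
            throw std::runtime_error("host is empty");
        }
    }
};

class Connection {
public:
    explicit Connection(const std::string& host)
    try : socket_(host) {
        // Normal constructor body.
    } catch (const std::exception& e) {
        // Members must not be touched here. Falling off the end of this
        // handler would rethrow automatically, so it can translate the
        // exception but cannot swallow it.
        throw std::runtime_error("Connection failed: " + std::string(e.what()));
    }

private:
    Socket socket_;
};

// Returns "ok" or the message of the exception thrown by the constructor.
std::string try_connect(const std::string& host) {
    try {
        Connection c(host);
        (void)c;
        return "ok";
    } catch (const std::exception& e) {
        return e.what();
    }
}

void demo_function_try_block() {
    assert(try_connect("example.org") == "ok");
    assert(try_connect("") == "Connection failed: host is empty");
}
```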
Yes, if the constructor fails to build one of its internal parts, it can be (by choice) its responsibility to throw (and in certain languages to declare) an explicit exception, duly noted in the constructor documentation.
This is not the only option: it could finish the constructor and build an object, but with a method 'isCoherent()' returning false, in order to be able to signal an incoherent state. That may be preferable in certain cases, in order to avoid a brutal interruption of the execution workflow due to an exception.
Warning: as said by EricSchaefer in his comment, that can bring some complexity to the unit testing (a throw can increase the cyclomatic complexity of the function due to the condition that triggers it)
If it fails because of the caller (like a null argument provided by the caller, where the called constructor expects a non-null argument), the constructor will throw an unchecked runtime exception anyway.
I'm not sure that any answer can be entirely language-agnostic. Some languages handle exceptions and memory management differently.
I've worked before under coding standards requiring that exceptions never be used, only error codes from initializers, because developers had been burned by the language handling exceptions poorly. Languages without garbage collection will handle heap and stack very differently, which may matter for non-RAII objects. It is important, though, that a team decide to be consistent so they know by default whether they need to call initializers after constructors. All methods (including constructors) should also be well documented as to what exceptions they can throw, so callers know how to handle them.
I'm generally in favor of single-stage construction, as it's easy to forget to initialize an object, but there are plenty of exceptions to that:
Your language support for exceptions isn't very good.
You have a pressing design reason to still use new and delete.
Your initialization is processor intensive and should run async to the thread that created the object.
You are creating a DLL that may be throwing exceptions outside its interface to an application using a different language. In this case it may not be so much an issue of not throwing exceptions, but of making sure they are caught before the public interface. (You can catch C++ exceptions in C#, but there are hoops to jump through.)
Static constructors (C#)
The OP's question has a "language-agnostic" tag... this question cannot be safely answered the same way for all languages/situations.
In the following C# class hierarchy, class B's constructor throws. Because the object is never assigned, the using statement in Main never calls Dispose, so the resources that class A acquired in its constructor are never explicitly disposed.
If, for example, class A had created a Socket at construction, connected to a network resource, such would likely still be the case after the using block (a relatively hidden anomaly).
    using System;

    class A : IDisposable
    {
        public A()
        {
            Console.WriteLine("Initialize A's resources.");
        }

        public void Dispose()
        {
            Console.WriteLine("Dispose A's resources.");
        }
    }

    class B : A, IDisposable
    {
        public B()
        {
            Console.WriteLine("Initialize B's resources.");
            throw new Exception("B construction failure: B can cleanup anything before throwing so this is not a worry.");
        }

        public new void Dispose()
        {
            Console.WriteLine("Dispose B's resources.");
            base.Dispose();
        }
    }

    class C : B, IDisposable
    {
        public C()
        {
            Console.WriteLine("Initialize C's resources. Not called because B throws during construction. C's resources not a worry.");
        }

        public new void Dispose()
        {
            Console.WriteLine("Dispose C's resources.");
            base.Dispose();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // The constructor throws, so c is never assigned and
                // the using statement has nothing to dispose.
                using (C c = new C())
                {
                }
            }
            catch
            {
            }
            // Resources allocated by c's "A" are not explicitly disposed.
        }
    }
Speaking strictly from a Java standpoint, any time a constructor is passed illegal values, it should throw an exception. That way the object does not get constructed in a bad state.
To me it's a somewhat philosophical design decision.
It's very nice to have instances which are valid as long as they exist, from ctor time onwards. For many nontrivial cases this may require throwing exceptions from the ctor if a memory/resource allocation can't be made.
Some other approaches are the init() method which comes with some issues of its own. One of which is ensuring init() actually gets called.
A variant is using a lazy approach to automatically call init() the first time an accessor/mutator gets called, but that requires any potential caller to have to worry about the object being valid (as opposed to the "it exists, hence it's valid" philosophy).
I've seen various proposed design patterns to deal with this issue too, such as being able to create an initial object via the ctor, but having to call init() to get your hands on a contained, initialized object with accessors/mutators.
Each approach has its ups and downs; I have used all of these successfully. If you don't make ready-to-use objects from the instant they're created, then I recommend a heavy dose of asserts or exceptions to make sure users don't interact before init().
Addendum
I wrote from a C++ programmer's perspective. I also assume you are properly using the RAII idiom to handle resources being released when exceptions are thrown.
Using factories or factory methods for all object creation, you can avoid invalid objects without throwing exceptions from constructors. The creation method should return the requested object if it's able to create one, or null if it's not. You lose a little bit of flexibility in handling construction errors in the user of a class, because returning null doesn't tell you what went wrong in the object creation. But it also avoids adding the complexity of multiple exception handlers every time you request an object, and the risk of catching exceptions you shouldn't handle.
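In C++ terms, a factory method along those lines might look like the sketch below. Widget and its size rule are hypothetical; a null std::unique_ptr plays the role of the null return value.

```cpp
#include <cassert>
#include <memory>

class Widget {
public:
    // Factory method: returns nullptr instead of throwing on invalid input.
    static std::unique_ptr<Widget> create(int size) {
        if (size <= 0) {
            return nullptr;  // the caller learns that it failed, but not why
        }
        return std::unique_ptr<Widget>(new Widget(size));
    }

    int size() const noexcept { return size_; }

private:
    // Private constructor: the factory is the only way to get a Widget.
    explicit Widget(int size) : size_(size) {}
    int size_;
};

void demo_factory() {
    auto ok = Widget::create(3);
    assert(ok && ok->size() == 3);

    auto bad = Widget::create(0);
    assert(!bad);  // no object, and no exception either
}
```

Note that this only covers the validation the factory performs itself; in C++ the allocation can still throw std::bad_alloc.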
The best advice I've seen about exceptions is to throw an exception if, and only if, the alternative is failure to meet a post condition or to maintain an invariant.
That advice replaces an unclear subjective decision (is it a good idea) with a technical, precise question based on design decisions (invariant and post conditions) you should already have made.
Constructors are just a particular, but not special, case for that advice. So the question becomes, what invariants should a class have? Advocates of a separate initialization method, to be called after construction, are suggesting that the class has two or more operating modes, with an unready mode after construction and at least one ready mode, entered after initialization. That is an additional complication, but acceptable if the class has multiple operating modes anyway. It is hard to see how that complication is worthwhile if the class would otherwise not have operating modes.
Note that pushing set up into a separate initialization method does not enable you to avoid exceptions being thrown. Exceptions that your constructor might have thrown will now be thrown by the initialization method. All the useful methods of your class will have to throw exceptions if they are called for an uninitialized object.
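A short C++ sketch of what that looks like in practice, using a hypothetical Sensor class: the exception simply moves from the constructor into initialize(), and every useful method needs a guard for the unready mode.

```cpp
#include <cassert>
#include <stdexcept>

class Sensor {
public:
    Sensor() = default;  // never throws, but the object is not usable yet

    // The failure that the constructor would have reported moves here.
    void initialize(int channel) {
        if (channel < 0) {
            throw std::invalid_argument("channel must be >= 0");
        }
        channel_ = channel;
        initialized_ = true;
    }

    int read() const {
        require_initialized();  // every useful method needs this guard
        return channel_ * 10;   // placeholder for real work
    }

private:
    void require_initialized() const {
        if (!initialized_) {
            throw std::logic_error("Sensor used before initialize()");
        }
    }

    int channel_ = 0;
    bool initialized_ = false;
};

void demo_uninitialized_guard() {
    Sensor s;
    bool threw = false;
    try { s.read(); } catch (const std::logic_error&) { threw = true; }
    assert(threw);          // unready mode: the method itself has to throw

    s.initialize(2);
    assert(s.read() == 20); // ready mode
}
```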
Note also that avoiding the possibility of exceptions being thrown by your constructor is troublesome, and with many standard libraries it is impossible. This is because the designers of those libraries believe that throwing exceptions from constructors is a good idea. In particular, any operation that attempts to acquire a non-shareable or finite resource (such as allocating memory) can fail, and that failure is typically indicated in OO languages and libraries by throwing an exception.