Should I instantiate a collection or inherit from collection? - language-agnostic

I've asked myself this question a number of times when creating classes, particularly those involving collections, but I've never come up with a satisfactory answer. It's an OOP design question.
For example, in a checkbook register program, say I have a BankAccount class. A BankAccount contains data such as the Name of the account, the Type of account (an enum of Checking, Saving, ...), and other details, but most importantly a collection of the Adjustments (deposits or withdrawals) made to the account.
Here, I have two options for keeping a collection of the Adjustments:
Instantiate a collection and keep it as a member within the BankAccount class. This is like saying "BankAccount has a collection of Adjustments."
Inherit from collection. This is like saying "BankAccount is a collection of Adjustments."
I think both solutions are intuitive, and, of course, both offer some advantages and disadvantages. For example, instantiating allows the class (in languages that allow only a single base class) to inherit from another class, while inheriting from a collection exposes the Add, Remove, and other collection methods directly, without having to write wrapper methods around them.
So, in such situations, which is a better approach?

To me, a bank account has a collection of adjustments. A bank account is not a collection of adjustments, because it "is" much more than that: it also "is" a name and a type, etc.
So, in your case (and similar cases), I suggest you aggregate a collection inside your class.
In case of doubt, avoid inheritance. :-)
I can argue for this further. In order to use inheritance properly, the subclass must satisfy the Liskov substitution principle; this means that, in your case, BankAccount should be a valid type anywhere a Collection is expected. I don't think that's the case, because a Collection probably exposes methods such as Add() and Remove(), whereas you will want to exert some control over adding and removing adjustments from your bank account rather than letting people add and remove them freely.
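To make that concrete, here is a minimal Java sketch of the aggregation approach; the Adjustment type and the validation rule are invented for illustration:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical value type for illustration.
    record Adjustment(String description, double amount) {}

    class BankAccount {
        private final String name;
        private final List<Adjustment> adjustments = new ArrayList<>();

        BankAccount(String name) { this.name = name; }

        // Controlled mutation: the account decides what may be added.
        void deposit(String description, double amount) {
            if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
            adjustments.add(new Adjustment(description, amount));
        }

        // Callers can read the history but cannot modify it directly.
        List<Adjustment> adjustments() {
            return Collections.unmodifiableList(adjustments);
        }
    }

Had BankAccount inherited from a collection instead, the public Add() and Remove() would bypass any such rule.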

Personally, I would say BankAccount has a collection of Adjustments. It will probably have other properties that aren't exclusively about what has been deposited or withdrawn (customer, bank account type, etc.).
In terms of design, my BankAccount object would expose a lazy-loading property of type Adjustments.
In terms of use within the code, I would instantiate the bank account, and if I needed to know what had gone in and out of the account, I would use the exposed property. The BankAccount would be the primary object, responsible for providing the Adjustments related only to the instantiated account.

Instantiate, definitely.
I agree with the other posters about Bank Account being "more" than just a collection of other items.
Or maybe you just picked an example which really screams out for "instantiate".
Examples:
What happens if your Bank Account needs a second collection of completely different items? (For example, a collection of people who can operate on it, like husband and wife; a collection of credit cards; PayPal accounts; or anything else that can "operate" on your bank account.)
Depending on the language, a collection exposes too much of its internals: if another object needs to access the Adjustments, say for displaying your transaction history on a web page, you automatically expose your "collection" to insertion, deletion and so on.
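A hedged sketch of that leak, assuming BankAccount had inherited from a concrete collection:

    import java.util.ArrayList;

    // Same value type as in the earlier sketch.
    record Adjustment(String description, double amount) {}

    // Anti-example: inheriting exposes the entire collection API.
    class InheritedBankAccount extends ArrayList<Adjustment> {
        // No way to stop callers from bypassing any account rules.
    }

    class HistoryPage {
        void render(InheritedBankAccount account) {
            account.clear();                                  // compiles fine: history gone
            account.add(new Adjustment("oops", -1_000_000));  // so does this
        }
    }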

I think getting overly caught up in semantics like "is this more is-a or has-a" is a little dangerous; at the end of the day, what matters is how well your design solves the problem, how maintainable it is, and so on. In fact, personally, a turning point in the way I understand object-oriented programming was letting go of "objects as nouns". Objects are, when you get down to it, an abstraction one level up from a function, nothing more or less.
This was a long way to say "has a". :-) Subclassing is complicated, using is easy.

Related

Single Responsibility Principle Core Understanding

In brushing up on the SRP I read this document which I located via Uncle Bob's page on principles of OOD. I find the following passage puzzling and somewhat at odds with the rest of the document:
"If, on the other hand, the application is not changing in ways that cause the the two responsibilities to change at different times, then there is no need to separate them. Indeed, separating them would smell of Needless Complexity. There is a corollary here. An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom."
While I understand that the answer to many software development questions is "it depends", principles like the SRP appear to be almost universally beneficial and something to implement as a matter of course. The SRP itself affords code a high adaptability to future changes in requirements. Isn't the point to separate out responsibilities from the get-go, to avoid struggling with highly coupled code and cascading changes later on?
I would really appreciate some clarification on this to make sure my understanding of this core principle is correct. Thanks in advance!
From my humble understanding, in the Modem example presented here, it might be possible that the responsibilities of the modem (Connection and Data Exchange) will change as one.
You have two possibilities here:
When the protocol changes, it is possible that only the connection part changes, or only the data exchange part changes. In this case you should have two interfaces, because a change of protocol in the data does not imply a change of protocol in the connection.
When the protocol changes, it will always change both the connection part and the data exchange part. In that case, you don't need two interfaces, because every time you have to rewrite the connection part, you are sure that the data exchange will change as well. You then have two responsibilities on the same change axis (the protocol handled by the modem), so you can leave them inside a single interface.
The key to this statement is "not changing in ways that cause the two responsibilities to change at different times". Let's say for the sake of argument you have a PaymentLogger and a Payment class. Every time you create a new PaymentType (CreditCard, Cash, Paypal, etc.) you need to update the PaymentLogger to log actions specific to those Payments. Instead of splitting out a PaymentLogger class, you could give the Payment class a method called Log which does whatever is specific to itself.
In this case it could be that the act of recording actions should be built into the class itself, since creating a new Payment also requires creating a new PaymentLogger. It's a responsibility that should have been part of Payment all along.
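A small sketch of that single-change-axis idea in Java; all the names here are hypothetical:

    // When a payment and its logging always change together, keeping them
    // in one class avoids maintaining a parallel PaymentLogger hierarchy.
    abstract class Payment {
        abstract void process();

        // Each payment type logs whatever is specific to itself.
        void log() {
            System.out.println(getClass().getSimpleName() + " processed");
        }
    }

    class CreditCardPayment extends Payment {
        @Override void process() { /* charge the card */ }
        @Override void log() { System.out.println("CreditCard: logged masked card number"); }
    }

    class CashPayment extends Payment {
        @Override void process() { /* record cash received */ }
        // Inherits the default log().
    }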

open closed principle - refactoring to create base class based on new features

So when the original code was written there was only a need for, say, a LabTest class. But now say we have new requirements to add RadiologyTest, EKGTest, etc.
These classes have a lot in common hence it makes sense to have a base class.
But that will mean the LabTest class will have to be modified. Let's say its interface will remain the same as before; in other words, consumers of the LabTest class will not need to change.
Is this a violation of the open-closed principle? (LabTest is being modified.)
I think you can look at it from two perspectives: existing requirements and new requirements.
If the existing requirements didn't cover the need for these kinds of changes then I'd say, based on those requirements, LabTest did not violate OCP.
With the new requirements, you need to add functionality that does not fit with the LabTest implementation. Adding it to LabTest would violate SRP. The requirements now create a new change vector that will force you to refactor LabTest to keep it OCP-compliant. If you fail to refactor, LabTest will violate both SRP and OCP. When you refactor, keep the new change vector in mind in any classes you create or modify.
These classes have a lot in common hence it makes sense to have a base class.
I think you may be violating SRP. After all, if each class does one task, how can two or more be so similar? If there's a task they both do identically, then that is a separate task and should be done by another class.
So I would say, first refactor LabTest into its constituent parts (I hope you've got unit tests!). Then when you come to write RadiologyTest and EKGTest, they can reuse the parts that make sense for them. This is also known as composition over inheritance.
But whatever you do, do use interfaces to these classes in the client. Don't force those who follow to use your base classes to add extensions.
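One possible shape of that refactoring, sketched in Java with invented type names:

    // Clients depend on the interface, never on a base class.
    interface MedicalTest {
        Result run();
    }

    // Constituent parts extracted from LabTest, reusable via composition.
    class SpecimenHandling { /* shared specimen logic */ }
    class ReportFormatting { /* shared report logic */ }

    class LabTest implements MedicalTest {
        private final SpecimenHandling specimens = new SpecimenHandling();
        public Result run() { /* use specimens, produce result */ return new Result(); }
    }

    class RadiologyTest implements MedicalTest {
        private final ReportFormatting reports = new ReportFormatting();
        public Result run() { /* capture images, format report */ return new Result(); }
    }

    class Result {}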
I may get burnt for this answer, but I'm going out on a limb anyway.
In my opinion (IMO), OCP cannot be followed in the purist sense like other principles such as SRP, DIP or ISP.
If requirements change in such a way that you have to change the responsibility of a class to stay true to its representation of the domain model, then we have to change that class.
IMO, OCP stops us from re-factoring code to follow the evolution of the system.
Please correct me if I am wrong.
Update:
After further research, this is what I am thinking:
Let's say I have automated tests at both the unit and the integration level; then IMO we should redesign the complete system to fit the new model. OCP is out the door here.
IMO, the goal of a system's evolution is always to avoid hacks and to make the system represent its model as accurately as possible. (Keeping the LabTest class and the corresponding DB table unchanged so as not to break old code [i.e., not violating OCP], while using LabTest to store EKGTest's common data, with LabTest inside of EKGTest or EKGTest inheriting from LabTest, would be such a hack, IMO.)
I think the Open-Closed Principle (as outlined by Uncle Bob, anyway, as opposed to Bertrand Meyer's version) isn't about never modifying classes (if software was never going to change, it might as well be hardware).
And in your own case, I don't think you're violating OCP, as you've mentioned that all uses of your class depend on the abstraction of LabTest rather than the implementation of RadiologyTest.
In his introductory paper, Uncle Bob has an example of a DrawAllShapes class that, if designed to OCP, shouldn't need to change each time a new subclass of Shape is added to the system. Regarding the level at which you apply it, Uncle Bob says:
It should be clear that no significant program can be 100% closed. For example, consider what would happen to the DrawAllShapes function from Listing 2 if we decided that all Circles should be drawn before any Squares. The DrawAllShapes function is not closed against a change like this. In general, no matter how “closed” a module is, there will always be some kind of change against which it is not closed.
Since closure cannot be complete, it must be strategic. That is, the designer must choose the kinds of changes against which to close his design.
I wouldn't read "closed for modification" as "don't refactor"; it's more that you should design your classes in such a way that other classes can't make modifications which will affect you, e.g. by applying the basic OO stuff: encapsulation via getters/setters and private member variables.
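For reference, a minimal Java rendering of that DrawAllShapes idea (the paper's examples are in C++; these names are adapted):

    import java.util.List;

    interface Shape {
        void draw();
    }

    class Circle implements Shape { public void draw() { /* draw a circle */ } }
    class Square implements Shape { public void draw() { /* draw a square */ } }

    class Renderer {
        // Closed against new kinds of Shape: adding Triangle needs no change here.
        void drawAllShapes(List<Shape> shapes) {
            for (Shape s : shapes) s.draw();
        }
        // But, as the quote notes, not closed against an ordering rule such as
        // "all Circles before any Squares": closure is strategic, not total.
    }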

How to identify what is not a Class?

I know the rule of thumb is that a noun used by the user is potentially a class. Similarly, a verb may be made into an action class, e.g. a predicate.
Given a description from the user, how do you identify what is not to be made into a class?
The only real answer is experience. However, some things fairly obviously (to me, anyway) cannot be modelled in your design. For example if the use case says:
"and then the parcel is put on the UPS van"
There is no need to model the van. You can make decisions of this kind by considering the system boundaries - you don't and can't control the van. However,
"we make a request to UPS for pickup"
might well result in a UPSPickup object.
The rules are simple.
Everything is an object.
All objects belong to classes.
In rare (very rare) circumstances you have some specialized class/object confusion:
A "library" of all static methods. This is an implementation choice, and no user can see this.
A Singleton where there can only be one object of the given class. This does happen sometimes.
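For illustration, those two special cases look roughly like this in Java (a hypothetical sketch):

    // A "library" of static methods: an implementation choice, never instantiated.
    final class GeometryUtil {
        private GeometryUtil() {}   // no instances
        static double circleArea(double r) { return Math.PI * r * r; }
    }

    // A Singleton: there can only be one object of the given class.
    enum Configuration {
        INSTANCE;
        private final String region = "default";
        String region() { return region; }
    }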
In an OO language it is not a question of what to be made into a class, but rather 'what class does this data/functionality go into?'
Like other software architecture aspects there are rules, but ultimately it is an art that requires experience. There are lots of books on software design, but a simple reference is Coupling and Cohesion.
Cohesion of a single module/component is the degree to which its responsibilities form a meaningful unit; higher cohesion is better.
Coupling between modules/components is their degree of mutual interdependence; lower coupling is better.

Object Normalization

Along the same lines as database normalization: is there an approach to object normalization? Not a design pattern, but a similarly mathematical approach to normalizing object creation. For example, first normal form: no repeating fields...
Here are some links to DB normalization:
http://en.wikipedia.org/wiki/Database_normalization
http://databases.about.com/od/specificproducts/a/normalization.htm
Would this make object creation and self-documentation better?
Here's a link to a book about class normalization (I guess we're really talking about classes):
http://www.agiledata.org/essays/classNormalization.html
Normalization has a mathematical foundation in predicate logic, and a clear and specific goal: that the same piece of information never be represented twice in a single model. The purpose of this goal is to eliminate the possibility of inconsistent information in a data model. It can be shown via mathematical proof that if a data model has certain specific properties (it passes the tests for 1st Normal Form (1NF), 2NF, 3NF, etc.), then it is free from redundant data representation, i.e. it is normalized.
Object orientation has no such underlying mathematical basis, and indeed, no clear and specific goal. It is simply a design idea for introducing more abstraction. The DRY principle, Command-Query Separation, Liskov Substitution Principle, Open-Closed Principle, Tell-Don't-Ask, Dependency Inversion Principle, and other heuristics for improving quality of code (many of which apply to code in general, not just object oriented programs) are not absolute in nature; they are guidelines that programmers have found useful in improving understandability, maintainability, and testability of their code.
With a relational data model, you can say with absolute certainty whether it is "normalized" or not, because it must pass ALL the tests for normal form, and they are quite specific. With an object model, on the other hand, because the goal of "understandable, maintainable, testable, etc" is rather vague, you cannot say with any certainty whether you have met that goal. With many of the design heuristics, you cannot even say for sure whether you have followed them. Have you followed the DRY principle if you're applying patterns to your design? Surely repeated use of a pattern isn't DRY? Furthermore, some of these heuristics or principles aren't always even necessarily good advice all the time. I do try to follow Command-Query Separation, but such useful things as a Stack or a Queue violate that concept in order to give us a rather elegant and useful result.
I guess the Single Responsibility Principle is at least related to this. Or at least, violation of the SRP is similar to a lack of normalization in some ways.
(It's possible I'm talking rubbish. I'm pretty tired.)
Interesting.
You may also be interested in looking at the Law of Demeter.
Another thing you may be interested in is c2's FearOfAddingClasses, as, arguably, the same reasoning that leads programmers to denormalise databases also leads to god classes and other code smells. For both OO and DB normalisation, we want to decompose everything. For databases this means more tables; for OO, more classes.
Now, it is worth bearing in mind the object-relational impedance mismatch; that is, probably not everything will translate cleanly.
Object-relational models, or 'persistence layers', usually have 1-to-1 mappings between object attributes and database fields. So, can we normalise? Say we have a department object with employee1, employee2, etc. attributes. Obviously that should be replaced with a list of employees. So we can say 1NF works.
With that in mind, let's go straight for the kill and look at 6NF database design, a good example is Anchor Modeling, (ignore the naming convention). Anchor Modeling/6NF provides highly decomposed and flexible database schemas; how does this translate to OO 'normalisation'?
Anchor Modeling has these kinds of relationships:
Anchors - unique object IDs.
Attributes, which translate to object attributes: (Anchor, value, metadata).
Ties - relationships between two or more objects (themselves anchors): (Anchor, Anchor... , metadata)
Knots, attributed Ties.
Attribute metadata can be anything - who changed an attribute, when, why, etc.
The OO translation of this looks extremely flexible (a sketch follows below):
Anchors suggest attribute-less placeholders, like a proxy which knows how to deal with the attribute composition.
Attributes suggest classes representing attributes and what they belong to. This suggests applying reuse to how attributes are looked up and dealt with, e.g. automatic constraint checking. From this we have a basis to generically implement the GoF-style Structural patterns.
Ties and Knots suggest classes representing relationships between objects. A basis for generic implementation of the Behavioural design patterns?
Interesting and desirable properties of Anchor Modeling that also translate across are:
All this requires replacing inheritance with composition (good) in the exposed objects.
Attributes have owners rather than owners having attributes. Although this makes attribute lookup more complex, it neatly solves certain aliasing problems, as there can only ever be one owner.
No need for NULL. This translates to clearer NULL handling. Empty-case attribute classes could provide methods for handling the lack of a particular attribute, instead of performing NULL-checking everywhere.
Attribute metadata. Attribute-level full historisation and blaming: 'play' objects back in time, see what changed, when and why, etc. (if required - metadata is entirely optional)
There would probably be a lot of very simple classes (which is good), and a very declarative programming style (also good).
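A very rough Java sketch of how Anchors, Attributes and Ties might translate; every name here is invented, since Anchor Modeling itself prescribes no code:

    import java.time.Instant;

    // Anchor: a bare identity with no attributes of its own.
    record Anchor(long id) {}

    // Attribute: a value owned by an anchor, plus metadata (who/when).
    record Attribute<T>(Anchor owner, T value, String changedBy, Instant changedAt) {}

    // Tie: a relationship between anchors, also carrying metadata.
    record Tie(Anchor from, Anchor to, String role, Instant since) {}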
Thanks for such a thought provoking question, I hope this is useful for you.
Perhaps you're taking this from a relational point-of-view, but I would posit that the principles of interfaces and inheritance correspond to normalization in the world of OOP.
For example, a Person abstract class containing FirstName, LastName, Gender and BirthDate can be used by classes such as Employee, User, Member etc. as a valid base class, without a need to repeat the definitions of those attributes in such subclasses.
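A minimal sketch of that:

    import java.time.LocalDate;

    abstract class Person {
        String firstName;
        String lastName;
        String gender;
        LocalDate birthDate;
    }

    // Subclasses add their own attributes without repeating Person's.
    class Employee extends Person { String employeeId; }
    class Member   extends Person { LocalDate joinedOn; }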
The DRY principle (a core principle of Andy Hunt and Dave Thomas's book The Pragmatic Programmer) and object-oriented programming's constant emphasis on re-use also correspond to the efficiencies offered by normalization in relational databases.
At first glance, I'd say that the objectives of Code Refactoring are similar in an abstract way to the objectives of normalization. But that's pretty abstract.
Update: I almost wrote earlier that "we need to get Jon Skeet in on this one." I posted my answer and who beat me? You guessed it...
Object Role Modeling (not to be confused with Object Relational Mapping) is the closest thing I know of to normalization for objects. It doesn't have as mathematical a foundation as normalization, but it's a start.
In a fairly ad-hoc and untutored fashion, that will probably cause purists to scoff, and perhaps rightly, I think of a database table as being a set of objects of a particular type, and vice versa. Then I take my thoughts from there. Viewed this way, it doesn't seem to me like there's anything particularly special you have to do to use normal form in your everyday programming. Each object's identity will do for starters as its primary key, and references (pointers, etc.) will do by way of foreign keys. Then just follow the same rules.
(My objects usually end up in 3NF, or some approximation thereof. I treat this all more as guidelines, and, like I said, "untutored".)
If the rules are followed properly, each bit of information then ends up in one place, the interrelationships are clear, and everything is structured such that introducing inconsistencies takes some work. One could say that this approach produces good results on this basis, and I would agree.
One downside is that the end result can feel a bit like a tangle of spaghetti, particularly after some time away, and it's hard to shake the constant lingering sensation, even though it's usually false, that surely a few of all these links could be removed...
Object oriented design is rational but it does not have the same mathematically well-defined basis as the Relational Model. There is nothing exactly equivalent to the well-defined normal forms of database design.
Whether this is a strength or a weakness of Object oriented design is a matter of interpretation.
I second the SRP. The Open Closed Principle applies as well to "normalization" although I might stretch the meaning of the word, in that it should be possible to extend the system by adding new implementations, without modifying the existing code. objectmentor about OCP
Good question; sorry I can't answer in depth.
I've been working on object normalization off and on for over 20 years. It's deep and complicated and beautiful, and is the subject of my second planned book, Object Mechanics II. ONF = Object Normal Form, you heard it here first! ;-)
since potentially patentable technology lurks within, I am not at liberty to say more, except that normalizing the data is the really easy part ;-)
ADDENDUM: change of plans - see https://softwareengineering.stackexchange.com/questions/84598/object-oriented-normalization

How to avoid Anemic Domain Models and maintain Separation of Concerns?

How do you make your objects fully cognizant of their roles within the system, and still avoid having too many dependencies within the domain model on the database and service layers?
For example: say I've got an entity with a revision history, and several "lookup tables" that the data references. The entity object should have methods to get the details from some of the lookup tables, whether by providing access to the lookup table rows or by delegating methods down to them; but in order to do so, it depends on the database layer to read the data from those rows. Also, when the entity is saved, it needs to know not only how to save itself but also how to save entries into the revision history. Is it necessary to pass references to dozens of different data-layer objects and service objects to the model object? This seems to make the logic far more complex to understand than just passing thin models back and forth to service-layer objects, but I've heard many "wise men" recommending this sort of structure.
Really really good question. I have spent quite a bit of time thinking about such topics.
You demonstrate great insight by noting the tension between an expressive domain model and separation of concerns. This is much like the tension in the question I asked about Tell Don't Ask and Single Responsibility Principle.
Here is my view on the topic.
A domain model is anemic because it contains no domain logic; other objects get and set data using an anemic domain object. What you describe doesn't sound like domain logic to me. It might be, but generally, "lookup tables" and similar technical language are terms that mean something to us but not necessarily anything to the customers. If this is incorrect, please clarify.
Anyway, the construction and persistence of domain objects shouldn't be contained in the domain objects themselves because that isn't domain logic.
So to answer the question, no, you shouldn't inject a whole bunch of non-domain objects/concepts like lookup tables and other infrastructure details. This is a leak of one concern into another. The Factory and Repository patterns from Domain-Driven Design are best suited to keep these concerns apart from the domain model itself.
But note that if you don't have any domain logic, then you will end up with anemic domain objects, i.e. bags of brainless getters and setters, which is how some shops claim to do SOA / service layers.
So how do you get the best of both worlds? How do you focus your domain objects on domain logic only, while keeping UI, construction, persistence, etc. out of the way? I recommend using a technique like Double Dispatch, or some form of restricted method access.
Here's an example of Double Dispatch. Say you have this line of code:
entity.saveIn(repository);
In your question, saveIn() would have all sorts of knowledge about the data layer. Using Double Dispatch, saveIn() does this:
repository.saveEntity(this.foo, this.bar, this.baz);
And the saveEntity() method of the repository has all of the knowledge of how to save in the data layer, as it should.
In addition to this setup, you could have:
repository.save(entity);
which just calls
entity.saveIn(this);
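Putting those fragments together, a self-contained sketch might look like this; foo, bar and baz stand in for whatever the entity actually stores:

    class Entity {
        private final String foo; private final int bar; private final double baz;

        Entity(String foo, int bar, double baz) {
            this.foo = foo; this.bar = bar; this.baz = baz;
        }

        // The entity dispatches its own persistence to the repository...
        void saveIn(Repository repository) {
            repository.saveEntity(foo, bar, baz);
        }
    }

    class Repository {
        // ...and the repository alone knows how to talk to the data layer.
        void saveEntity(String foo, int bar, double baz) {
            /* INSERT/UPDATE against the data layer goes here */
        }

        // Convenience entry point that simply double-dispatches.
        void save(Entity entity) {
            entity.saveIn(this);
        }
    }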
I re-read this and I notice that the entity is still thin because it is simply dispatching its persistence to the repository. But in this case, the entity is supposed to be thin because you didn't describe any other domain logic. In this situation, you could say "screw Double Dispatch, give me accessors."
And yeah, you could, but IMO it exposes too much of how your entity is implemented, and those accessors are distractions from domain logic. I think the only class that should have gets and sets is a class whose name ends in "Accessor".
I'll wrap this up soon. Personally, I don't write my entities with saveIn() methods, because I think even just having a saveIn() method tends to litter the domain object with distractions. I use either the friend class pattern, package-private access, or possibly the Builder pattern.
OK, I'm done. As I said, I've obsessed on this topic quite a bit.
"thin models to service layer objects" is what you do when you really want to write the service layer.
ORM is what you do when you don't want to write the service layer.
When you work with an ORM, you are still aware of the fact that navigation may involve a query, but you don't dwell on it.
Lookup tables can be a relational crutch that gets used when there isn't a very complete object model. Instead of things referencing things, you have codes, which must be looked up. In many cases, the codes devolve to little more than a static pool of strings with database keys. And the relevant methods wind up in odd places in the software.
However, if there is a more complete object model, we have first-class things instead of these degenerate lookup values.
For example, I've got some business transactions which have one of n different "rate plans" -- a kind of pricing model. Right now, the legacy relational database has the rate plan as a lookup table with a code, some pricing numbers, and (sometimes) a description.
[Everyone knows the codes -- the codes are sacred. No one is sure what the proper descriptions should be. But they know the codes.]
But really, a "rate plan" is an object that is associated with a contract; the rate plan has the method that computes the final price. When an app asks the contract for a price, the contract delegates some of the pricing work to the associated rate plan object.
There may have been some database query going on to lookup the rate plan when producing a contract price, but that's incidental to the delegation of responsibility between the two classes.
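A sketch of that delegation; the names follow the prose, and the pricing rule is invented:

    import java.math.BigDecimal;

    // A rate plan is a first-class object, not a code in a lookup table.
    interface RatePlan {
        BigDecimal priceFor(BigDecimal baseAmount);
    }

    class StandardRatePlan implements RatePlan {
        public BigDecimal priceFor(BigDecimal baseAmount) {
            return baseAmount.multiply(new BigDecimal("1.10")); // invented markup
        }
    }

    class Contract {
        private final RatePlan ratePlan;
        Contract(RatePlan ratePlan) { this.ratePlan = ratePlan; }

        // The contract delegates part of the pricing work to its rate plan.
        BigDecimal priceFor(BigDecimal baseAmount) {
            return ratePlan.priceFor(baseAmount);
        }
    }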
I agree with DeadBeef; therein lies the tension. I don't really see, though, how a domain model is "anemic" simply because it doesn't save itself.
There has to be much more to it. I.e., it's anemic because the service is doing all the business rules, not the domain entity.
    // Not anemic: the business rule lives in the entity.
    class Service {
        private readonly IRepository repository;   // injected
        public Service(IRepository repository) { this.repository = repository; }

        public void Save(DomainEntity entity) {
            entity.DoSomething();                  // domain logic in the entity
            repository.Save(entity);
        }
    }
DoSomething() is the business logic of the domain entity.
This would be anemic:
    class Service {
        private readonly IRepository repository;   // injected
        public Service(IRepository repository) { this.repository = repository; }

        public void Save(DomainEntity entity) {
            if (entity.IsSomething)                // business rule leaked into the service
                entity.SetItProperty();
            repository.Save(entity);
        }
    }
See the inherent difference? I do :)
Try the "repository pattern" and "Domain driven design". DDD suggests to define certain entities as Aggregate-roots of other objects. Each Aggregate is encapsulated. The entities are "persistence ignorant". All the persistence-related code is put in a repository object which manages Data-access for the entity. This way you don't have to mix persistence-related code with your business logic. If you are interested in DDD, check out eric evans book.