Private vs. Public members in practice (how important is encapsulation?) [closed] - language-agnostic

One of the biggest advantages of object-oriented programming is encapsulation, and one of the "truths" we've (or, at least, I've) been taught is that members should always be made private and made available via accessor and mutator methods, thus ensuring the ability to verify and validate the changes.
I'm curious, though, how important this really is in practice. In particular, if you've got a more complicated member (such as a collection), it can be very tempting to just make it public rather than make a bunch of methods to get the collection's keys, add/remove items from the collection, etc.
Do you follow the rule in general? Does your answer change depending on whether it's code written for yourself vs. to be used by others? Are there more subtle reasons I'm missing for this obfuscation?

It depends. This is one of those issues that must be decided pragmatically.
Suppose I had a class for representing a point. I could have getters and setters for the X and Y coordinates, or I could just make them both public and allow free read/write access to the data. In my opinion, this is OK because the class is acting like a glorified struct - a data collection with maybe some useful functions attached.
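For illustration, a minimal C# sketch of such a "glorified struct" (the type and field names are mine, not from the answer):
public class Point
{
    // Plain data holder: free read/write access, no invariants to protect.
    public double X;
    public double Y;
}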
However, there are plenty of circumstances where you do not want to provide full access to your internal data and rely on the methods provided by the class to interact with the object. An example would be an HTTP request and response. In this case it's a bad idea to allow anybody to send anything over the wire - it must be processed and formatted by the class methods. In this case, the class is conceived of as an actual object and not a simple data store.
It really comes down to whether or not verbs (methods) drive the structure or if the data does.

As someone having to maintain several-year-old code worked on by many people in the past, it's very clear to me that if a member attribute is made public, it is eventually abused. I've even heard people disagreeing with the idea of accessors and mutators, as that's still not really living up to the purpose of encapsulation, which is "hiding the inner workings of a class". It's obviously a controversial topic, but my opinion would be "make every member variable private, think primarily about what the class has got to do (methods) rather than how you're going to let people change internal variables".

Yes, encapsulation matters. Exposing the underlying implementation does (at least) two things wrong:
Mixes up responsibilities. Callers shouldn't need or want to understand the underlying implementation. They should just want the class to do its job. By exposing the underlying implementation, your class isn't doing its job. Instead, it's just pushing the responsibility onto the caller.
Ties you to the underlying implementation. Once you expose the underlying implementation, you're tied to it. If you tell callers, e.g., there's a collection underneath, you cannot easily swap the collection for a new implementation.
These (and other) problems apply regardless of whether you give direct access to the underlying implementation or just duplicate all the underlying methods. You should be exposing the necessary implementation, and nothing more. Keeping the implementation private makes the overall system more maintainable.
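As a rough C# sketch of "expose the necessary interface, nothing more" (all names here are invented for the example):
using System.Collections.Generic;

public class Roster
{
    // The backing collection stays private; it could become a HashSet<string>
    // later without touching any caller.
    private readonly List<string> members = new List<string>();

    public void Add(string name)
    {
        members.Add(name);
    }

    public bool Contains(string name)
    {
        return members.Contains(name);
    }

    public int Count
    {
        get { return members.Count; }
    }
}
Callers can add and query members, but nothing about the list itself leaks out, so swapping the storage later is a private change.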

I prefer to keep members private as long as possible and only access them via getters, even from within the very same class. I also try to avoid setters as a first draft, to promote value-style objects for as long as possible. Working with dependency injection a lot, you often have setters but no getters, as clients should be able to configure the object but others should not get to know what's actually configured, as this is an implementation detail.
Regards,
Ollie

I tend to follow the rule pretty strictly, even when it's just my own code. I really like properties in C# for that reason. They make it really easy to control what values a member is given, but you can still use them as variables. Or make the set private and the get public, etc.

Basically, information hiding is about code clarity. It's designed to make it easier for someone else to extend your code, and prevent them from accidentally creating bugs when they work with the internal data of your classes. It's based on the principle that nobody ever reads comments, especially ones with instructions in them.
Example: I'm writing code that updates a variable, and I need to make absolutely sure that the GUI changes to reflect the change. The easiest way is to add an accessor method (aka a "setter"), which is called instead of updating the data directly.
If I make that data public, and something changes the variable without going through the Setter method (and this happens every swear-word time), then someone will need to spend an hour debugging to find out why the updates aren't being displayed. The same applies, to a lesser extent, to "Getting" data. I could put a comment in the header file, but odds are that no-one will read it till something goes terribly, terribly wrong. Enforcing it with private means that the mistake can't be made, because it'll show up as an easily located compile-time bug, rather than a run-time bug.
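Roughly, in C# (the class and method names are hypothetical, just to show the single choke point):
public class DisplayedValue
{
    private int value;

    public int GetValue()
    {
        return value;
    }

    public void SetValue(int newValue)
    {
        value = newValue;
        // The one place where a display refresh is guaranteed to happen.
        RefreshGui();
    }

    private void RefreshGui()
    {
        // repaint / notify the view here
    }
}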
From experience, the only times you'd want to make a member variable public, and leave out Getter and Setter methods, is if you want to make it absolutely clear that changing it will have no side effects; especially if the data structure is simple, like a class that simply holds two variables as a pair.
This should be a fairly rare occurrence, as normally you'd want side effects, and if the data structure you're creating is so simple that you don't (e.g. a pairing), there will already be a more efficiently written one available in a standard library.
With that said, for most small programs that are one-use no-extension, like the ones you get at university, it's more "good practice" than anything, because you'll remember over the course of writing them, and then you'll hand them in and never touch the code again. Also, if you're writing a data structure as a way of finding out about how they store data rather than as release code, then there's a good argument that Getters and Setters will not help, and will get in the way of the learning experience.
It's only when you get to the workplace or a large project, where the probability is that your code will be called by objects and structures written by different people, that it becomes vital to make these "reminders" strong. Whether or not it's a single-man project is surprisingly irrelevant, for the simple reason that "you six weeks from now" is as different a person as a co-worker. And "me six weeks ago" often turns out to be lazy.
A final point is that some people are pretty zealous about information hiding, and will get annoyed if your data is unnecessarily public. It's best to humour them.

C# properties "simulate" public fields. They look pretty clean, and the syntax really speeds up creating those get/set methods.

Keep in mind the semantics of invoking methods on an object. A method invocation is a very high-level abstraction that can be implemented by the compiler or the runtime system in a variety of different ways.
If the object whose method you are invoking exists in the same process/memory map, then a method could well be optimized by a compiler or VM to directly access the data member. On the other hand, if the object lives on another node in a distributed system, then there is no way that you can directly access its internal data members, but you can still invoke its methods by sending it a message.
By coding to interfaces you can write code that doesn't care where the target object exists, how its methods are invoked, or even whether it's written in the same language.
In your example of an object that implements all the methods of a collection, surely that object actually is a collection. So maybe this would be a case where inheritance would be better than encapsulation.

It's all about controlling what people can do with what you give them. The more controlling you are the more assumptions you can make.
Also, theoretically you can change the underlying implementation or something, but since for the most part it's:
private Foo foo;
public Foo getFoo() { return foo; }
public void setFoo(Foo foo) { this.foo = foo; }
It's a little hard to justify.

Encapsulation is important when at least one of these holds:
Anyone but you is going to use your class (or they'll break your invariants because they don't read the documentation).
Anyone who doesn't read the documentation is going to use your class (or they'll break your carefully documented invariants). Note that this category includes you-two-years-from-now.
At some point in the future someone is going to inherit from your class (because maybe an extra action needs to be taken when the value of a field changes, so there has to be a setter).
If it is just for me, and used in few places, and I'm not going to inherit from it, and changing fields will not invalidate any invariants that the class assumes, only then will I occasionally make a field public.

My tendency is to try to make everything private if possible. This keeps object boundaries as clearly defined as possible and keeps the objects as decoupled as possible. I like this because when I have to rewrite an object that I botched the first (second, fifth?) time, it keeps the damage contained to a smaller number of objects.
If you couple the objects tightly enough, it may be more straightforward just to combine them into one object. If you relax the coupling constraints enough you're back to structured programming.
It may be that if you find that a bunch of your objects are just accessor functions, you should rethink your object divisions. If you're not doing any actions on that data it may belong as a part of another object.
Of course, if you're writing something like a library, you want as clear and sharp an interface as possible so others can program against it.

Fit the tool to the job... recently I saw some code like this in my current codebase:
private static class SomeSmallDataStructure {
    public int someField;
    public String someOtherField;
}
And then this class was used internally for easily passing around multiple data values. It doesn't always make sense, but if you have just DATA, with no methods, and you aren't exposing it to clients, I find it a quite useful pattern.
The most recent use I had of this was a JSP page where I had a table of data being displayed, defined at the top declaratively. So, initially it was in multiple arrays, one array per data field... this ended in the code being rather difficult to wade through, with fields that would be displayed together not sitting next to each other in the definitions... so I created a simple class like the above which would pull it together... the result was REALLY readable code, a lot more so than before.
Moral... sometimes you should consider "accepted bad" alternatives if they may make the code simpler and easier to read, as long as you think it through and consider the consequences... don't blindly accept EVERYTHING you hear.
That said... public getters and setters are pretty much equivalent to public fields... at least essentially (there is a tad more flexibility, but it is still a bad pattern to apply to EVERY field you have).
Even the Java standard libraries have some cases of public fields.

When I make objects meaningful they are easier to use and easier to maintain.
For example: Person.Hand.Grab(howquick, howmuch);
The trick is not to think of members as simple values but objects in themselves.
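A small C# sketch of "members as objects in themselves" (all names invented for illustration):
public class Hand
{
    public void Grab(double howQuick, double howMuch)
    {
        // the grabbing behaviour lives with the hand, not with the caller
    }
}

public class Person
{
    public Hand Hand { get; private set; }

    public Person()
    {
        Hand = new Hand();
    }
}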

I would argue that this question mixes up the concept of encapsulation with "information hiding".
(This is not a criticism, since it does seem to match a common interpretation of the notion of "encapsulation".)
However for me, 'encapsulation' is either:
the process of regrouping several items into a container
the container itself regrouping the items
Suppose you are designing a tax payer system. For each tax payer, you could encapsulate the notion of child into
a list of children representing the children
a map (keyed, say, by parent) to take into account children from different parents
an object Children (not Child) which would provide the needed information (like total number of children)
Here you have three different kinds of encapsulation: two represented by low-level containers (a list or a map), and one represented by an object.
By making those decisions, you do not
make that encapsulation public or protected or private: that choice of 'information hiding' is still to be made
make a complete abstraction (you need to refine the attributes of the object Children, and you may decide to create an object Child, which would keep only the relevant information from the point of view of a tax payer system)
Abstraction is the process of choosing which attributes of the object are relevant to your system, and which must be completely ignored.
So my point is:
That question might have been titled:
Private vs. Public members in practice (how important is information hiding?)
Just my 2 cents, though. I perfectly respect that one may consider encapsulation as a process including 'information hiding' decision.
However, I always try to differentiate 'abstraction' - 'encapsulation' - 'information hiding or visibility'.

@VonC
You might find the International Organisation for Standardization's, "Reference Model of Open Distributed Processing," an interesting read. It defines: "Encapsulation: the property that the information contained in an object is accessible only through interactions at the interfaces supported by the object."
I tried to make a case for information hiding's being a critical part of this definition here:
http://www.edmundkirwan.com/encap/s2.html
Regards,
Ed.

I find lots of getters and setters to be a code smell that the structure of the program is not designed well. You should look at the code that uses those getters and setters, and look for functionality that really should be part of the class. In most cases, the fields of a class should be private implementation details and only the methods of that class may manipulate them.
Having both getters and setters is effectively the same as making the field public (when the getters and setters are trivial/generated automatically). Sometimes it might be better to just declare the fields public, so that the code will be simpler, unless you need polymorphism or a framework requires get/set methods (and you can't change the framework).
But there are also cases where having getters and setters is a good pattern. One example:
When I create the GUI of an application, I try to keep the behaviour of the GUI in one class (FooModel) so that it can be unit tested easily, and have the visualization of the GUI in another class (FooView) which can be tested only manually. The view and model are joined with simple glue code; when the user changes the value of field x, the view calls setX(String) on the model, which in turn may raise an event that some other part of the model has changed, and the view will get the updated values from the model with getters.
In one project, there is a GUI model which has 15 getters and setters, of which only 3 get methods are trivial (such that the IDE could generate them). All the others contain some functionality or non-trivial expressions, such as the following:
public boolean isEmployeeStatusEnabled() {
    return pinCodeValidation.equals(PinCodeValidation.VALID);
}

public EmployeeStatus getEmployeeStatus() {
    Employee employee;
    if (isEmployeeStatusEnabled()
            && (employee = getSelectedEmployee()) != null) {
        return employee.getStatus();
    }
    return null;
}

public void setEmployeeStatus(EmployeeStatus status) {
    getSelectedEmployee().changeStatusTo(status, getPinCode());
    fireComponentStateChanged();
}

In practice I always follow only one rule, the "no one size fits all" rule.
Encapsulation and its importance is a product of your project. What objects will be accessing your interface, how will they be using it, and will it matter if they have access rights to members they don't need? Those questions, and others like them, are what you need to ask yourself when working on each project's implementation.

I base my decision on the Code's depth within a module.
If I'm writing code that is internal to a module and does not interface with the outside world, I don't encapsulate things with private as much, because it affects my programmer performance (how fast I can write and rewrite my code).
But for the objects that serve as the module's interface with user code, I adhere to strict privacy patterns.

Certainly it makes a difference whether you're writing internal code or code to be used by someone else (or even by yourself, but as a contained unit). Any code that is going to be used externally should have a well-defined/documented interface that you'll want to change as little as possible.
For internal code, depending on the difficulty, you may find it's less work to do things the simple way now, and pay a little penalty later. Of course Murphy's law will ensure that the short term gain will be erased many times over in having to make wide-ranging changes later on where you needed to change a class' internals that you failed to encapsulate.

Specifically to your example of using a collection that you would return, it seems possible that the implementation of such a collection might change (unlike simpler member variables) making the utility of encapsulation higher.
That being said, I kinda like Python's way of dealing with it. Member variables are public by default. If you want to hide them or add validation there are techniques provided, but those are considered the special cases.

I follow the rules on this almost all the time. There are four scenarios for me - basically, the rule itself and several exceptions (all Java-influenced):
Usable by anything outside of the current class, accessed via getters/setters
Internal-to-class usage typically preceded by 'this' to make it clear that it's not a method parameter
Something meant to stay extremely small, like a transport object - basically a straight shot of attributes; all public
Needed to be non-private for extension of some sort

There's a practical concern here that isn't being addressed by most of the existing answers. Encapsulation and the exposure of clean, safe interfaces to outside code is always great, but it's much more important when the code you're writing is intended to be consumed by a spatially- and/or temporally-large "user" base. What I mean is that if you plan on somebody (even you) maintaining the code well into the future, or if you're writing a module that will interface with code from more than a handful of other developers, you need to think much more carefully than if you're writing code that's either one-off or wholly written by you.
Honestly, I know what wretched software engineering practice this is, but I'll oftentimes make everything public at first, which makes things marginally faster to remember and type, then add encapsulation as it makes sense. Refactoring tools in most popular IDEs these days makes which approach you use (adding encapsulation vs. taking it away) much less relevant than it used to be.

Related

Interfaces vs Public Class Members

I've noticed that some programmers like to make interfaces for just about all their classes. I like interfaces for certain things (such as checking if an object supports a certain behavior and then having an interface for that behavior) but overuse of interfaces can sometimes bloat the code. When I declare methods or properties as public I'd expect people to just use my concrete classes and I don't really understand the need to create interfaces on top of that.
I'd like to hear your take on interfaces. When do you use them and for what purposes?
Thank you.
Applying any kind of design pattern or idea without thinking, just because somebody told you it's good practice, is a bad idea.
That of course includes creating a separate interface for each and every class you create. You should at least be able to give a good reason for every design decision, and "because Joe says it's good practice" is not a good enough reason.
Interfaces are good for decoupling the interface of some unit of code from its implementation. A reason to create an interface is because you foresee that there might be multiple implementations of it in the future. It can also help with unit testing; you can make a mock implementation of the services that the unit you want to test depends on, and plug the mock implementations in instead of "the real thing" for testing.
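As a hedged C# sketch of the unit-testing point (the interface and class names are invented here):
using System.Collections.Generic;

public interface IMailSender
{
    void Send(string to, string body);
}

// Production implementation: would talk to a real mail server.
public class SmtpMailSender : IMailSender
{
    public void Send(string to, string body)
    {
        // real sending logic would go here
    }
}

// Test double: records what was "sent" so a unit test can assert on it.
public class FakeMailSender : IMailSender
{
    public readonly List<string> Sent = new List<string>();

    public void Send(string to, string body)
    {
        Sent.Add(to + ": " + body);
    }
}
Code that depends on IMailSender can be exercised in tests with FakeMailSender and in production with SmtpMailSender, without changing the unit under test.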
Interfaces are a powerful tool for abstraction. With them, you can more freely substitute (for example) test classes and thereby decouple your code. They are also a way to narrow the scope of your code; you probably don't need the full feature set of a given class in a particular place - exactly what features do you need? That's a client-focused way of thinking about interfaces.
Unit tests.
With an interface describing all class methods and properties it is within the reach of a click to create a mock-up class to simulate behavior that is not within the scope of said test.
It's all about expecting and preparing for change.
One approach that some use (and I'm not necessarily advocating it)
is to create an IThing and a ThingFactory.
All code will reference IThing (instead of ConcreteThing).
All object creation can be done via the Factory Method.
ThingFactory.CreateThing(some params).
So, today we only have AmericanConcreteThing. And the possibility is that we may never need another. However, if experience has taught me anything, it is that we will ALWAYS need another.
You may not need EuropeanThing, but TexasAmericanThing is a distinct possibility.
So, in order to minimize the impact on my code, I can change the creational line to:
ThingFactory.CreateThing( Account )
and Create my class TexasAmericanThing : IThing.
Other than building the class, the only change is to the ThingFactory, which will require a change from
public static IThing CreateThing(Account a)
{
    return new AmericanThing();
}
to
public static IThing CreateThing(Account a)
{
    if (a.State == State.TEXAS) return new TexasAmericanThing();
    return new AmericanThing();
}
I've seen plenty of mindless Interfaces myself. However, when used intelligently, they can save the day. You should use Interfaces for decoupling two components or two layers of an application. This can enable you to easily plug-in varying implementations of the interface without affecting the client, or simply insulate the client from constant changes to the implementation, as long as you stay true to the contract of the interface. This can make the code more maintainable in the long term and can save the effort of refactoring later.
However, overly aggressive decoupling can make for non-intuitive code. Its overuse can become a nuisance. You should carefully identify the cohesive parts of your application and the boundaries between them and use interfaces there. Another benefit of using interfaces between such parts is that they can be developed in parallel and tested independently using mock implementations of the interfaces they use.
OTOH, having client code access public member methods directly is perfectly okay if you really don't foresee any changes to the class that might also necessitate changes in the client. In any case, however, having public member fields I think is not good. This is extremely tight coupling! You are basically exposing the architecture of your class and making the client code dependent on it. Tomorrow if you realize that another data structure for a particular field will perform better, you can't change it without also changing the client code.
I primarily use interfaces for IoC to enable unit testing.
On the one hand, this could be interpreted as premature generalization. On the other hand, using interfaces as a rule helps you write code that is more easily composable and hence testable. I think the latter wins out in many cases.
I like interfaces:
* to define a contract between parts/modules/subsystems or 3rd party systems
* when there are exchangeable states or algorithms (state/strategy)

Subclasses causing unexpected behavior in superclasses — OO design question

Although I'm coding in ObjC, this question is intentionally language-agnostic - it should apply to most OO languages.
Let's say I have a "Collection" class, and I want to create a "FilteredCollection" that inherits from "Collection". Filters will be set up at object-creation time, and from then on, the class will behave like a "Collection" with the filters applied to its contents.
I do things the obvious way and subclass Collection. I override all the accessors, and think I've done a pretty neat job - my FilteredCollection looks like it should behave just like a Collection, but with objects that are 'in' it that correspond to my filters being filtered out to users. I think I can happily create FilteredCollections and pass them around my program as Collections.
But I come to testing and - oh no - it's not working. Delving into the debugger, I find that it's because the Collection implementation of some methods is calling the overridden FilteredCollection methods (say, for example, there's a "count" method that Collection relies upon when iterating its objects, but now it's getting the filtered count, because I overrode the count method to give the correct external behaviour).
What's wrong here? Why does it feel like some important principles are being violated despite the fact that it also feels like OO 'should' work this way? What's a general solution to this issue? Is there one?
I know, by the way, that a good 'solution' to this problem in particular would be to filter the objects before I put them into the collection, and not have to change Collection at all, but I'm asking a more general question than that - this is just an example. The more general issue is methods in an opaque superclass that rely on the behaviour of other methods that could be changed by subclasses, and what to do in the case that you want to subclass an object to change behaviour like this.
The Collection that you inherit from has a certain contract. Users of the class (and that includes the class itself, because it can call its own methods) assume that subclasses obey the contract. If you're lucky, the contract is specified clearly and unambiguously in its documentation...
For example, the contract could say: "if I add an element x, then iterate over the collection, I should get x back". It seems that your FilteredCollection implementation breaks that contract.
There is another problem here: Collection should be an interface, not a concrete implementation. An implementation (e.g. TreeSet) should implement that interface, and of course also obey its contract.
In this case, I think the correct design would be not to inherit from Collection, but rather create FilteredCollection as a "wrapper" around it. Probably FilteredCollection should not implement the Collection interface, because it does not obey the usual contract for collections.
Rather than subclassing Collection to implement FilteredCollection, try implementing FilteredCollection as a separate class that implements ICollection and delegates to an existing collection. This is similar to the Decorator pattern from the Gang of Four.
Partial example:
class FilteredCollection implements ICollection
{
    private ICollection baseCollection;

    public FilteredCollection(ICollection baseCollection)
    {
        this.baseCollection = baseCollection;
    }

    public GetItems()
    {
        return Filter(baseCollection.GetItems());
    }

    private Filter(...)
    {
        // do filter here
    }
}
Implementing FilteredCollection as a decorator for ICollection has the added benefit that you can filter anything that implements ICollection, not just the one class you subclassed.
For added goodness, you can use the Command pattern to inject a specific implementation of Filter() into the FilteredCollection at runtime, eliminating the need to write a different FilteredCollection implementation for every filter you want to apply.
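For example, in C# the "injected filter" idea can be sketched with a delegate (this is my own sketch; the class and parameter names are not from the original post):
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class FilteredCollection<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private readonly Func<T, bool> filter;

    public FilteredCollection(IEnumerable<T> source, Func<T, bool> filter)
    {
        this.source = source;
        this.filter = filter;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Only items that pass the injected predicate are visible to callers.
        return source.Where(filter).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Usage might look like: var adults = new FilteredCollection<Person>(people, p => p.Age >= 18); one wrapper class covers every filtering rule.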
(Note: whilst I'll use your example, I'll try to concentrate on the concept rather than tell you what's wrong with your specific example.)
Black Box Inheritance?
What you're crashing into is the myth of "black box inheritance". It's often not actually possible to completely separate implementations that allow inheritance from implementations that use that inheritance. I know this flies in the face of how inheritance is often taught, but there it is.
To take your example, it's quite reasonable for you to want the consumers of the collection contract to see a Count which matches the number of items they can get out of your collection. It's also quite reasonable for code in the inherited base class to access its Count property and get what it expects. Something has to give.
Who is Responsible?
Answer: the base class. To achieve both the goals above, the base class needs to handle things differently. Why is this the responsibility of the base class? Because it allows itself to be inherited from and allows its member implementation to be overridden. It may be that in some languages that facilitate an OO design you aren't given a choice. However, that just makes this problem harder to deal with; it still needs to be dealt with.
In the example, the base collection class should have its own internal means of determining its actual count, in the knowledge that a subclass may override the existing implementation of Count. Its own implementation of the public and overridable Count property should not impact the internal operation of the base class, but just be a means to achieve the external contract it is implementing.
Of course this means the implementation of the base class isn't as crisp and clean as we would like. That's what I mean by the black box inheritance being a myth, there is some implementation cost just to allow inheritance.
The Bottom Line...
is that an inheritable class needs to be coded defensively so that it doesn't rely on the assumed operation of overridable members, OR it needs to be very clear, in some form of documentation, exactly what behaviour is expected from overridden implementations of members (this is common in classes that define abstract members).
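A C# sketch of the defensive style described above (the class is hypothetical, just to make the idea concrete):
using System.Collections.Generic;

public class BaseCollection<T>
{
    private readonly List<T> items = new List<T>();

    // The public, overridable contract; a filtering subclass could override this.
    public virtual int Count
    {
        get { return items.Count; }
    }

    public virtual void Add(T item)
    {
        items.Add(item);
    }

    protected void CopyTo(T[] target)
    {
        // Internal bookkeeping uses items.Count, never the virtual Count,
        // so an overriding subclass cannot corrupt this loop.
        for (int i = 0; i < items.Count; i++)
        {
            target[i] = items[i];
        }
    }
}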
Your FilteredCollection feels wrong. Usually, when you have a collection and you add a new element into it, you expect that it's count increases by one, and the new element is added to the container.
Your FilteredCollection does not work like this - if you add an item that is filtered, the count of the container might not change. I think this is where your design goes wrong.
If that behaviour is intended, then the contract for count makes it unsuitable for the purpose your member functions are trying to use it for.
I think that the real issue is a misunderstanding of how object-oriented languages are supposed to work. I'm guessing that you have code that looks something like this:
Collection myCollection = myFilteredCollection;
And expect to invoke the methods implemented by the Collection class. Correct?
In a C++ program, this might work, provided that the methods on Collection are not defined as virtual methods. However, this is an artifact of the design goals of C++.
In just about every other object-oriented language, all methods are dispatched dynamically: they use the type of the actual object, not the type of the variable.
If that's not what you're wondering, then read up on the Liskov Substitution Principle, and ask yourself whether you're breaking it. Lots of class hierarchies do.
What you described is a quirk of polymorphism. Since you can address an instance of a subclass as an instance of the parent class, you may not know what kind of implementation lies underneath the covers.
I think your solution is pretty simple:
You stated that you don't modify the collection; you only apply a filter to it when people fetch from it. Therefore you should not override the count method. All of those elements are in the collection, so don't lie to the caller.
You want the base count method to behave normally, but you still want the filtered count, so you should implement a getFilteredCount method which returns the number of elements after filtering.
Subclassing is all about the 'Kind of' relationship. What you're doing is not out of the norm but not the most standard use case either. You're applying a filter to a collection, so you can claim that a 'FilteredCollection' is a 'kind of' collection, but in reality you're not actually modifying the collection; you're just wrapping it with a layer that simplifies filtering. In any case, this should work. The only downside is that you have to remember to call 'getFilteredCount' instead of .getCount
The example falls into "Doctor, it hurts when I do this" category. Yes, subclasses can break superclasses in various ways. No, there is no simple waterproof solution to prevent that.
You can seal your superclass (make everything final) if your language supports this but then you lose flexibility. This is the bad kind of defensive programming (the good relies on robust code, the bad relies on strong restrictions).
The best you can do is to act at human level - make sure that the human that writes the subclass understands the superclass. Tutoring/code review, good documentation, unit tests (in roughly this order of importance) can help achieve this. And of course it doesn't hurt to code the base class defensively.
You could argue that the superclass is not well-designed for subclassing, at least not in the way you want to. When the superclass calls "Count()" or "Next()" or whatever, it doesn't have to let that call be overridden. In c++, it can't be overridden unless it's declared "virtual", but that doesn't apply in all languages - for example, Obj-C is inherently virtual if I remember correctly.
It's even worse - this problem can happen to you even if you don't override methods in the superclass - see Subtyping vs Subclassing. See in particular the OOP problems reference in that article.
It behaves this way because this is how object-oriented programming is supposed to work!
The whole point of OOP is supposed to be that a subclass can redefine some of its superclass's methods, and then operations done at the superclass level will get the subclass implementation.
Let's make your example a little more concrete. We create a "Collection animal" that contains dog, cat, lion, and basilisk. Then we create a FilteredCollection domesticAnimal that filters out the lion and basilisk. So now if we iterate over domesticAnimal we expect to see only dog and cat. If we ask for a count of the number of members, would we not expect the result to be "2"? It would surely be curious behavior if we asked the object how many members it had and it said "4", and then when we asked it to list them it only listed 2.
Making the overrides work at the superclass level is an important feature of OOP. It allows us to define a function that takes, in your example, a Collection object as a parameter and operates on it, without knowing or caring whether underneath it is really a "pure" Collection or a FilteredCollection. Everything should work either way. If it's a pure Collection it gets the pure Collection functions; if it's a FilteredCollection it gets the FilteredCollection functions.
If the count is also used internally for other purposes -- like deciding where new elements should go, so that you add what is really a fifth element and it mysteriously overwrites #3 -- then you have a problem in the design of the classes. OOP gives you great power over how classes operate, but with great power comes great responsibility. :-) If a function is used for two different purposes, and you override the implementation to satisfy your requirements for purpose #1, it's up to you to make sure that that doesn't break purpose #2.
My first reaction to your post was the mention of overriding "all the accessors." This is something I've seen a lot of: extending a base class then overriding most of the base class methods. This defeats the purpose of inheritance in my opinion. If you need to override most base class functions then it's time to reconsider why you're extending the class. As said before, an interface may be a better solution, since it loosely couples disparate objects. The sub-class should EXTEND the functionality of the base class, not completely rewrite it.
I couldn't help but wonder: if you are overriding the base class members, it seems quite logical that unexpected behavior would occur.
When I first grok'd how inheritance worked I used it a lot. I had these big trees with everything connected one way or another.
What a pain.
For what you want, you should be referencing your object, not extending it.
Also, I'd personally hide any trace of passing a collection from my public API (and, in general, my private API as well). Collections are impossible to make safe. Wrapping a collection (Come on, what's it used for??? You can guess just from the signature, right?) inside a WordCount class or a UsersWithAges class or a AnimalsAndFootCount class can make a lot more sense.
Also, having methods like wordCount.getMostUsedWord(), usersWithAges.getUsersOverEighteen() and animalsAndFootCount.getBipeds() moves repetitive utility functionality scattered throughout your code into your new-fangled business collection, where it belongs.

Allen Holub wrote "You should never use get/set functions", is he correct? [duplicate]

Allen Holub wrote the following,
You can't have a program without some coupling. Nonetheless, you can minimize coupling considerably by slavishly following OO (object-oriented) precepts (the most important is that the implementation of an object should be completely hidden from the objects that use it). For example, an object's instance variables (member fields that aren't constants), should always be private. Period. No exceptions. Ever. I mean it. (You can occasionally use protected methods effectively, but protected instance variables are an abomination.)
Which sounds reasonable, but he then goes on to say,
You should never use get/set functions for the same reason—they're just overly complicated ways to make a field public (though access functions that return full-blown objects rather than a basic-type value are reasonable in situations where the returned object's class is a key abstraction in the design).
Which, frankly, just sounds insane to me.
I understand the principle of information hiding, but without accessors and mutators you couldn't use Java beans at all. I don't know how you would follow a MVC design without accessors in the model, since the model can not be responsible for rendering the view.
However, I am a younger programmer and I learn more about Object Oriented Design everyday. Perhaps someone with more experience can weigh in on this issue.
Allen Holub's articles for reference
Why Extends Is Evil
Why Getter And Setter Methods Are Evil
Related Questions:
Java: Are Getters and Setters evil?
Is it really that wrong not using setters and getters?
Are get and set functions popular with C++ programmers?
Should you use accessor properties from within the class, or just from outside of the class?
I don't have a problem with Holub telling you that you should generally avoid altering the state of an object but instead resort to integrated methods (execution of behaviors) to achieve this end. As Corletk points out, there is wisdom in thinking long and hard about the highest level of abstraction and not just programming thoughtlessly with getters/setters that just let you do an end-run around encapsulation.
However, I have a great deal of trouble with anyone who tells you that you should "never" use setters or should "never" access primitive types. Indeed, the effort required to maintain this level of purity in all cases can and will end up causing more complexity in your code than using appropriately implemented properties. You just have to have enough sense to know when you are skirting the rules for short-term gain at the expense of long-term pain.
Holub doesn't trust you to know the difference. I think that knowing the difference is what makes you a professional.
Read through that article carefully. Holub is pushing the point that getters and setters are an evil "default antipattern", a bad habit that we slip into when designing a system; because we can.
The thought process should be along the lines; What does this object do? What are its responsibilities? What are its behaviours? What does it know? Thinking long and hard on these questions leads you naturally towards designing classes which expose the highest-level interface possible.
A car is a good example. It exposes a well-defined, standardised high-level interface. I don't concern myself with setSpeed(60)... is that MPH or km/h? I just accelerate, cruise, decelerate. I don't have to think about the details in setSteeringWheelAngle(getSteeringWheelAngle()+Math.rad(-1.5)), I just turn(-1.5), and the details are taken care of under the hood.
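A rough C# sketch of that car analogy (the names and units are my own, purely illustrative):
using System;

public class Car
{
    private double speedKmh;
    private double steeringAngleDegrees;

    public void Accelerate(double deltaKmh)
    {
        // Units, limits and clamping live inside the class, not with the caller.
        speedKmh = Math.Max(0, speedKmh + deltaKmh);
    }

    public void Turn(double degrees)
    {
        steeringAngleDegrees += degrees;
    }
}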
It boils down to: you can and should figure out what every class will be used for, what it does, what it represents, and expose the highest-level interface possible which fulfills those requirements. Getters and setters are usually a cop-out, used when the programmer is too lazy to do the analysis required to determine exactly what each class is and is not, and so we go down the path of "it can do anything". Getters and setters are evil!
Sometimes the actual requirements for a class are unknowable ahead of time. That's cool; just cop out and use the getter/setter antipattern for now, but when you do know, through experience, what the class is being used for, you'll probably want to come back and clean up the dirty low-level interface. Refactoring based on "stuff you wish you knew when you wrote the sucker in the first place" is par for the course. You don't have to know everything in order to make a start; it's just that the more you do know, the less rework is likely to be required along the way.
That's the mentality he's promoting. Getters and setters are an easy trap to fall into.
Yes, beans basically require getters and setters, but to me a bean is a special case. Beans represent nouns, things, tangible identifiable (if not physical) objects. Not a lot of objects actually have automatic behaviours; most times things are manipulated by external forces, including humans, to make them productive things.
daisy.setColor(Color.PINK) makes perfect sense. What else can you do? Maybe a Vulcan mind-meld, to make the flower want to be pink? Hmmm?
Getters and setters have their "evil" place. It's just, like all really good OO things, we tend to overuse them, because they are safe and familiar, not to mention simple, and therefore it might be better if noobs didn't see or hear about them, at least until they'd mastered the mind-meld thing.
I think what Allen Holub tried to say, rephrased in this article, is the following.
Getters and setters can be useful for variables that you specifically want to encapsulate, but you don't have to use them for all variables. In fact, using them for all variables is nasty code smell.
The trouble programmers have, and Allen Holub was right in pointing it out, is that they sometimes use getters/setters for all variables. And the purpose of encapsulation is lost.
(note I'm coming at this from a .NET "property" angle)
Well, simply - I don't agree with him; he makes a big fuss about the return type of properties being a bad thing because it can break your calling code - but exactly the same argument would apply to method arguments. And if you can't use methods either?
OK, method arguments could be changed as widening conversions, but.... just why... Also, note that in C# the var keyword could mitigate a lot of this perceived pain.
Accessors are not an implementation detail; they are the public API / contract. Yup, if you break the contract you have trouble. When did that become a surprise? Likewise, it is not uncommon for accessors to be non-trivial - i.e. they do more than just wrap fields; they perform calculations, logic checks, notifications, etc. And they allow interface-based abstractions of state. Oh, and polymorphism - etc.
Re the verbose nature of accessors (p3/4?) - in C#: public int Foo {get; private set;} - job done.
Ultimately, all of code is a means to express our intent to the compiler. Properties let me do that in a type-safe, contract-based, verifiable, extensible, polymorphic way - thanks. Why do I need to "fix" this?
Getters and setters are used as little more than a mask to make a private variable public.
There's no point repeating what Holub said already but the crux of it is that classes should represent behaviour and not just state.
Some opposing views are in italics:
Though getIdentity starts with "get," it's not an accessor because it doesn't just return a field. It returns a complex object that has reasonable behavior
Oh but wait... then it's okay to use accessors as long as you return objects instead of primitive types? Now that's a different story, but it's just as dumb to me. Sometimes you need an object, sometimes you need a primitive type.
Also, I notice that Allen has radically softened his position since his previous column on the same topic, where the mantra "Never use accessors" didn't suffer one single exception. Maybe he realized after a few years that accessors do serve a purpose after all...
Bear in mind that I haven't actually put any UI code into the business logic. I've written the UI layer in terms of AWT (Abstract Window Toolkit) or Swing, which are both abstraction layers.
Good one. What if you are writing your application on SWT? How "abstract" is really AWT in that case? Just face it: this advice simply leads you to write UI code in your business logic. What a great principle. After all, it's only been like at least ten years since we've identified this practice as one of the worst design decisions you can make in a project.
My problem, as a novice programmer, is sometimes stumbling onto articles on the internet and giving them more credence than I should. Perhaps this is one of those cases.
When ideas like these are presented to me, I like to take a look at libraries and frameworks I use and which I like using.
For example, although some will disagree, I like the Java Standard API. I also like the Spring Framework. Looking at the classes in these libraries, you will notice that very rarely there are setters and getters which are there just to expose some internal variable. There are methods named getX, but that does not mean it is a getter in the conventional sense.
So, I think he has a point, and it is this: every time you press choose "Generate getters/setters" in Eclipse (or your IDE of choice), you should take a step back and wonder what you are doing. Is it really appropriate to expose this internal representation, or did I mess up my design at some step?
I don't believe he's saying never use get/set, but rather that using get/set for a field is no better than just making the field public (e.g. public string Name vs. public string Name {get; set; }).
If get/set is used it limits the information hiding of OO which can potentially lock you into a bad interface.
In the above example, Name is a string; what if we want to change the design later to add multiple Names? The interface exposed only a single string so we can’t add more without breaking existing implementation.
However, if instead of using get/set you initially had a method such as Add(string name), internally you could process name singularly or add to a list or what not and externally call the Add method as many times as you want to add more Names.
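A small C# sketch of that Add(...) idea (the class name and storage choice here are assumptions, not from the original answer):
using System.Collections.Generic;

public class Contact
{
    private readonly List<string> names = new List<string>();

    // Callers add names through behaviour; the backing storage can grow from
    // a single string into a list without breaking any of them.
    public void Add(string name)
    {
        names.Add(name);
    }
}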
The OO goal is to design with a level of abstraction; don’t expose more detail than you absolutely have to.
Chances are if you’ve just wrapped a primitive type with a get/set you’ve broken this tenet.
Of course, this is if you believe in the OO goals; I find that most don't, not really, they just use objects as a convenient way to group functional code.
Public variables make sense when the class is nothing more than a bundle of data with no real coherency, or when it's really, really elementary (such as a point class). In general, if there's any variable in a class that you think probably shouldn't be public, that means that the class has some coherence, and variables have a certain relation that should be maintained, so all variables should be private.
Getters and setters make sense when they reflect some sort of coherent idea. In a polygon class, for example, the x and y coordinates of given vertices have a meaning outside the class boundary. It probably makes sense to have getters, and it likely makes sense to have setters. In a bank account class, the balance is probably stored as a private variable, and almost certainly should have a getter. If it has a setter, it needs to have logging built in to preserve auditability.
There are some advantages of getters and setters over public variables. They provide some separation of interface and implementation. Just because a point has a .getX() function doesn't mean there has to be an x, since .getX() and .setX() can be made to work just fine with radial coordinates. Another is that it's possible to maintain class invariants, by doing whatever's necessary to keep the class consistent within the setter. Another is that it's possible to have functionality that triggers on a set, like the logging for the bank account balance.
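For instance, a C# sketch of the radial-coordinates point (names invented; the idea is only that the representation stays hidden behind the accessors):
using System;

public class Point
{
    private double radius;
    private double angle; // radians

    public double GetX()
    {
        return radius * Math.Cos(angle);
    }

    public double GetY()
    {
        return radius * Math.Sin(angle);
    }

    public void SetX(double x)
    {
        // Recompute the polar representation; callers never see it.
        double y = GetY();
        radius = Math.Sqrt(x * x + y * y);
        angle = Math.Atan2(y, x);
    }
}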
However, for more abstract classes, the member variables lose individual significance, and only make sense in context. You don't need to know all the internal variables of a C++ stream class, for example. You need to know how to get elements in and out, and how to perform various other actions. If you counted on the exact internal structure, you'd be bogged down in detail that could arbitrarily vary between compilers or versions.
So, I'd say to use private variables almost exclusively, getters and setters where they have a real meaning in object behavior, and not otherwise.
Just because getters and setters are frequently overused doesn't mean they're useless.
The trouble with getters/setters is they try to fake encapsulation but they actually break it by exposing their internals. Secondly they are trying to do two separate things - providing access to and controlling their state - and end up doing neither very well.
It breaks encapsulation because when you call a get/set method you first need to know the name (or have a good idea) of the field you want to change, and second you have to know its type, e.g. you couldn't call
setPositionX("some string");
If you know the name and type of the field, and the setter is public, then anyone can call the method as if it were a public field anyway, it's just a more complicated way of doing it, so why not just simplify it and make it a public field in the first place.
By allowing access to its state but trying to control it at the same time, a get/set method just confuses things and ends up either being useless boilerplate, or misleading by not actually doing what it says it does or by having side effects the user might not expect. If error checking is needed, it could be called something like
public void tryPositionX(int x) throws InvalidParameterException {
    if (x >= 0)
        this.x = x;
    else
        throw new InvalidParameterException("Holy Negatives Batman!");
}
or if additional code is needed, it could be given a more accurate name based on what the whole method does, e.g.
public void tryPositionXAndLog(int x) throws InvalidParameterException {
    tryPositionX(x);
    numChanges++;
}
IMHO needing getters/setters to make something work is often a symptom of bad design.
Make use of the "tell, don't ask" principle, or re-think why an object needs to send out its state data in the first place. Expose methods that change an object's behaviour instead of its state. Benefits of that include easier maintenance and increased extensibility.
You mention MVC too and say a model can't be responsible for its view; for that case Allen Holub gives an example of making an abstraction layer by having a "give-me-a-JComponent-that-represents-your-identity" class, which he says would "isolate the way identities are represented from the rest of the system." I'm not experienced enough to comment on whether that would work or not, but on the surface it sounds a decent idea.
Public getters/setters are bad if they provide access to implementation details. Yet, it is reasonable to provide access to an object's properties and use getters/setters for this. For example, if Car has a color property, it's acceptable to let clients "observe" it using a getter. If some client needs the ability to recolor a car, the class can provide a setter ("recolor" is a clearer name, though). It is important not to let clients know how properties are stored in objects, how they are maintained, and so on.
Ummmm... has he never heard of the concept of encapsulation? Getter and setter methods are put in place to control access to a class's members. By making all fields publicly visible... anybody could write whatever values they wanted to them, thereby completely invalidating the entire object.
Just in case anybody is a little fuzzy on the concept of Encapsulation, read up on it here:
Encapsulation (Computer Science)
...and if they're really evil, would .NET build the Property concept into the language? (Getter and Setter methods that just look a little prettier)
EDIT
The article does mention Encapsulation:
"Getters and setters can be useful for variables that you specifically want to encapsulate, but you don't have to use them for all variables. In fact, using them for all variables is nasty code smell."
Using this method will lead to extremely hard-to-maintain code in the long run. If you find out halfway through a project that spans years that a field needs to be encapsulated, you're going to have to update EVERY REFERENCE to that field everywhere in your software to get the benefit. Sounds a lot smarter to use proper encapsulation up front and save yourself the headache later.
I think that getters and setters should only be used for variables which one needs to access or change outside a class. That being said, I don't believe variables should be public unless they're static. This is because making variables public which aren't static can lead to them being changed undesirably. Let's say you have a developer who is carelessly using public variables. He then accesses a variable from another class and without meaning to, changes it. Now he has an error in his software as a result of this mishap. That's why I believe in the proper use of getters and setters, but you don't need them for every private or protected variable.

Should I use an interface like IEnumerable, or a concrete class like List<>

I recently expressed my view about this elsewhere* , but I think it deserves further analysis so I'm posting this as its own question.
Let's say that I need to create and pass around a container in my program. I probably don't have a strong opinion about one kind of container versus another, at least at this stage, but I do pick one; for sake of argument, let's say I'm going to use a List<>.
The question is: Is it better to write my methods to accept and return a high level interface such as C#'s IEnumerable? Or should I write methods to take and pass the specific container class that I have chosen.
What factors and criteria should I look for to decide? What kind of programs work benefit from one or the other? Does the computer language affect your decision? Performance? Program size? Personal style?
(Does it even matter?)
* (Homework: find it. But please post your answer here before you look for my own, so as not to bias you.)
Your method should always accept the least-specific type it needs to execute its function. If your method needs to enumerate, accept IEnumerable. If it needs to do IList<>-specific things, by definition you must give it an IList<>.
The only thing that should affect your decision is how you plan to use the parameter. If you're only iterating over it, use IEnumerable<T>. If you are accessing indexed members (eg var x = list[3]) or modifying the list in any way (eg list.Add(x)) then use ICollection<T> or IList<T>.
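A small C# sketch of that guideline (method names are invented for illustration):
using System.Collections.Generic;

public static class OrderMath
{
    // Only iterates, so IEnumerable<T> is enough; callers can pass a List<T>,
    // an array, or a lazily generated sequence.
    public static decimal Total(IEnumerable<decimal> prices)
    {
        decimal sum = 0;
        foreach (var p in prices)
        {
            sum += p;
        }
        return sum;
    }

    // Needs indexed access and mutation, so it asks for IList<T>.
    public static void ZeroFirst(IList<decimal> prices)
    {
        if (prices.Count > 0)
        {
            prices[0] = 0;
        }
    }
}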
There is always a tradeoff. The general rule of thumb is to declare things as high up the hierarchy as possible. So if all you need is access to the methods in IEnumerable then that is what you should use.
Another recent example from an SO question was a C API that took a filename instead of a FILE * (or file descriptor). There the filename severely limited what sorts of things could be passed in (there are many things you can pass in with a file descriptor, but only one that has a filename).
Once you have to start casting you have either gone too high OR you should be making a second method that takes a more specific type.
The only exception to this that I can think of is when speed is an absolute must and you do not want to pay the cost of a virtual method call. Declaring the specific type removes the overhead of virtual functions (this will depend on the language/environment/implementation, but as a general statement it is likely correct).
It was a discussion with me that prompted this question, so Euro Micelli already knows my answer, but here it is! :)
I think LINQ to Objects already provides a great answer to this question. By using the simplest possible interface to a sequence of items, it gives you maximum flexibility in how you implement that sequence, which allows lazy generation and boosts productivity without sacrificing performance in any real sense.
It is true that premature abstraction can have a cost - but mainly it is the cost of discovering/inventing new abstractions. If you already have perfectly good ones provided to you, then you'd be crazy not to take advantage of them, and that is what the generic collection interfaces provide you with.
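To make the lazy-generation point concrete, here's a minimal sketch (the names are invented for the example): because the contract is only IEnumerable<T>, the implementation is free to yield values on demand instead of building a full list up front.
using System.Collections.Generic;
using System.Linq;

static class Squares
{
    // Nothing is computed until a caller actually enumerates the sequence.
    public static IEnumerable<int> UpTo(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return i * i;
    }
}

// A caller that only wants the first few items never pays for the rest:
// var firstThree = Squares.UpTo(1000000).Take(3).ToList();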
There are those who will tell you that it is "easier" to make all the data in a class public, just in case you will need to access it. In the same way, Euro advised that it would be better to use a rich interface to a container such as IList<T> (or even the concrete class List<T>) and then clean up the mess later.
But I think, just as it is better to hide the data members of a class that you don't want to access, to allow you to modify the implementation of that class easily later, so you should use the simplest interface available to refer to a sequence of items. It is easier in practice to start by exposing something simple and basic and then "loosen" it later, than it is to start with something loose and struggle to impose order on it.
So assume IEnumerable<T> will do to represent a sequence. Then, in those cases where you need to Add or Remove items (but still don't need lookup by index), use ICollection<T>, which inherits IEnumerable<T> and so will be perfectly interoperable with your other code.
This way it will be perfectly clear (just from local examination of some code) precisely what that code will be able to do with the data.
Small programs require less abstraction, it is true. But if they are successful, they tend to become big programs. This is much easier if they employ simple abstractions in the first place.
It does matter, but the correct solution depends entirely on usage. If you only need to do a simple enumeration, then use IEnumerable; that way you can pass in any implementer and still get the functionality you need. However, if you need list functionality - and you don't want to have to create a new list instance every time the method happens to be called with an enumerable that isn't a list - then go with a list.
I answered a similar C# question here. I think you should always provide the simplest contract you can, which in the case of collections, in my opinion, is ordinarily IEnumerable<T>.
The implementation can be provided by an internal BCL type - be it Set, Collection, List etcetera - whose required members are exposed by your type.
Your abstract type can always inherit from simple BCL types, which are implemented by your concrete types. In my opinion, this makes it easier to adhere to the LSP.

Class member order in source code

This has been asked before (question no. 308581), but that particular question and the answers are a bit C++ specific and a lot of things there are not really relevant in languages like Java or C#.
The thing is that, even after refactoring, I find that there is a bit of a mess in my source code files. I mean, the function bodies are alright, but I'm not quite happy with the way the functions themselves are ordered. Of course, in an IDE like Visual Studio it is relatively easy to find a member if you remember what it is called, but this is not always the case.
I've tried a couple of approaches, like putting public methods first, but the drawback is that a function at the top of the file ends up calling another private function at the bottom of the file, so I end up scrolling all the time.
Another approach is to try to group related methods together (maybe into regions), but obviously this has its limits: if there are many unrelated methods in the same class, then maybe it's time to break the class up into two or more smaller classes.
So consider this: your code has been refactored properly so that it satisfies all the requirements mentioned in Code Complete, but you would still like to reorder your methods for ergonomic purposes. What's your approach?
(Actually, while not exactly a technical problem, this problem really annoys the hell out of me, so I would be really grateful if someone could come up with a good approach.)
Actually, I rely entirely on the navigation functionality of my IDE, i.e. Visual Studio. Most of the time I use F12 to jump to a declaration (or Shift+F12 to find all references) and then Ctrl+- to jump back.
The reason for that is that most of the time I am working on code that I haven't written myself and I don't want to spend my time re-ordering methods and fields.
P.S.: And I also use RockScroll, a VS add-in which makes navigating and scrolling large files quite easy
If you're really having problems scrolling and finding, it's possible you're suffering from god class syndrome.
Fwiw, I personally tend to go with:
class
{
#statics (if any)
#constructor
#destructor (if any)
#member variables
#properties (if any)
#public methods (overrides, etc, first then extensions)
#private (aka helper) methods (if any)
}
And I have no aversion to region blocks, nor to comments, so I make free use of both to denote relationships.
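As a rough sketch of what that layout looks like in practice with regions (OrderProcessor and all of its members are invented for the example):
using System;
using System.Collections.Generic;

public class OrderProcessor
{
    #region Statics
    private static int instanceCount;
    #endregion

    #region Constructor
    public OrderProcessor() { instanceCount++; }
    #endregion

    #region Member variables
    private readonly List<string> pendingOrders = new List<string>();
    #endregion

    #region Properties
    public int PendingCount
    {
        get { return pendingOrders.Count; }
    }
    #endregion

    #region Public methods
    public void Queue(string order)
    {
        Validate(order);
        pendingOrders.Add(order);
    }
    #endregion

    #region Private (helper) methods
    private void Validate(string order)
    {
        if (string.IsNullOrEmpty(order))
            throw new ArgumentException("Order must not be empty.", "order");
    }
    #endregion
}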
From my (Java) point of view I would say constructors, public methods, private methods, in that order. I always try to group methods implementing a certain interface together.
My weapon of choice is IntelliJ IDEA, which has some nice options for folding method bodies, so it is quite easy to display two methods directly above each other even when their actual positions in the source file are 700 lines apart.
I would be careful with monkeying around with the position of methods in the actual source. Your IDE should give you the ability to view the source in the way you want. This is especially relevant when working on a project where developers can use their IDE of choice.
My order, here it comes.
I usually put statics first.
Next come member variables and properties; a property that accesses one specific member is grouped together with that member. I try to group related information together, for example all strings that contain path information.
Third is the constructor (or constructors if you have several).
After that follow the methods. Those are ordered by whatever appears logical for that specific class. I often group methods by their access level: private, protected, public. But I recently had a class that needed to override a lot of methods from its base class. Since I was doing a lot of work there, I put them together in one group, regardless of their access level.
My recommendation: order your classes so that the order helps your workflow. Do not order them simply to have order. The time spent ordering should be an investment that saves you more time than you would otherwise spend scrolling up and down.
In C# I use #region to separate those groups from each other, but that is a matter of taste. There are a lot of people who don't like regions. I do.
I place the method I most recently created at the top of the class. That way, when I open the project, I'm back at the last method I was developing, which makes it easier to get back "in the zone."
It also reflects the fact that the method I just created (which uses other methods) is the topmost layer over those other methods.
Group related functions together, and don't feel hard-pressed to put all private functions at the bottom. Likewise, imitate the design rationale of C#'s properties: related functions should be in close proximity to each other, and the C# property construct reinforces that idea.
P.S.
If only C# could nest functions like Pascal or Delphi. Maybe Anders Hejlsberg can put them into C#; he also created Turbo Pascal and Delphi :-) The D language has nested functions.
A few years ago I spent far too much time pondering this question, and came up with a horrendously complex system for ordering the declarations within a class. The order would depend on the access specifier, whether a method or field was static, transient, volatile etc.
It wasn't worth it. IMHO you get no real benefit from such a complex arrangement.
What I do nowadays is much simpler:
Constructors (default constructor first, otherwise order doesn't matter.)
Methods, sorted by name (static vs. non-static doesn't matter, nor abstract vs. concrete, virtual vs. final etc.)
Inner classes, sorted by name (interface vs. class etc. doesn't matter)
Fields, sorted by name (static vs. non-static doesn't matter.) Optionally constants (public static final) first, but this is not essential.
I'm pretty sure there was a Visual Studio add-in that could reorder the class members in the code,
e.g. constructors at the top of the class, then static methods, then instance methods...
something like that.
Unfortunately I can't remember the name of the add-in! I also think it was free!
Maybe someone else can help us out?
My personal take for structuring a class is as follows:
I'm strict with:
constants and static fields first, in alpha order
non-private inner classes and enums in alpha order
fields (and attributes where applicable), in alpha order
ctors (and dtors where applicable)
static methods and factory methods
methods below, in alpha order, regardless of visibility.
I use the auto-formatting capabilities of my IDE at all times, so I'm constantly hitting Ctrl+Shift+F while I work. I export the auto-formatting settings to an XML file which I carry with me everywhere.
It helps down the line when doing merges and rebases. And it's the kind of thing you can automate in your IDE or build process, so you don't have to make a brain cell sweat for it.
I'm not claiming MY WAY is the way. But pick something, configure it, use it consistently until it becomes a reflex, and thus forget about it.