Object Normalization - language-agnostic

In the same line as Database Normalization - is there an approach to object normalization, not design pattern, but the same mathematical like approach to normalizing object creation. For example: first normal form: no repeating fields....
here's some links to DB Normalization:
http://en.wikipedia.org/wiki/Database_normalization
http://databases.about.com/od/specificproducts/a/normalization.htm
Would this make object creation and self-documentation better?
Here's a link to a book about class normalization (guess we're really talking about classes)
http://www.agiledata.org/essays/classNormalization.html

Normalization has a mathematical foundation in predicate logic, and a clear and specific goal that the same piece of information never be represented twice in a single model; the purpose of this goal is to eliminate the possibility of inconsistent information in a data model. It can be shown via mathematical proof that if a data model has certain specific properties (that it passes tests for 1st Normal Form (1NF), 2NF, 3NF, etc.) that it is free from redundant data representation, i.e. it is Normalized.
Object orientation has no such underlying mathematical basis, and indeed, no clear and specific goal. It is simply a design idea for introducing more abstraction. The DRY principle, Command-Query Separation, Liskov Substitution Principle, Open-Closed Principle, Tell-Don't-Ask, Dependency Inversion Principle, and other heuristics for improving quality of code (many of which apply to code in general, not just object oriented programs) are not absolute in nature; they are guidelines that programmers have found useful in improving understandability, maintainability, and testability of their code.
With a relational data model, you can say with absolute certainty whether it is "normalized" or not, because it must pass ALL the tests for normal form, and they are quite specific. With an object model, on the other hand, because the goal of "understandable, maintainable, testable, etc" is rather vague, you cannot say with any certainty whether you have met that goal. With many of the design heuristics, you cannot even say for sure whether you have followed them. Have you followed the DRY principle if you're applying patterns to your design? Surely repeated use of a pattern isn't DRY? Furthermore, some of these heuristics or principles aren't always even necessarily good advice all the time. I do try to follow Command-Query Separation, but such useful things as a Stack or a Queue violate that concept in order to give us a rather elegant and useful result.

I guess the Single Responsible Principle is at least related to this. Or at least, violation of the SRP is similar to a lack of normalization in some ways.
(It's possible I'm talking rubbish. I'm pretty tired.)

Interesting.
You may also be interested in looking at the Law of Demeter.
Another thing you may be interested in is c2's FearOfAddingClasses, as, arguably, the same reasoning that lead programmers to denormalise databases also leads to god classes and other code smells. For both OO and DB normalisation, we want to decompose everything. For databases this means more tables, for OO, more classes.
Now, it is worth bearing in mind the object relational impedance mismatch, that is, probably not everything will translate cleanly.
Object relational models or 'persistence layers', usually have 1-to-1 mappings between object attributes and database fields. So, can we normalise? Say we have department object with employee1, employee2 ... etc. attributes. Obviously that should be replaced with a list of employees. So we can say 1NF works.
With that in mind, let's go straight for the kill and look at 6NF database design, a good example is Anchor Modeling, (ignore the naming convention). Anchor Modeling/6NF provides highly decomposed and flexible database schemas; how does this translate to OO 'normalisation'?
Anchor Modeling has these kinds of relationships:
Anchors - unique object IDs.
Attributes, which translate to object attributes: (Anchor, value, metadata).
Ties - relationships between two or more objects (themselves anchors): (Anchor, Anchor... , metadata)
Knots, attributed Ties.
Attribute metadata can be anything - who changed an attribute, when, why, etc.
The OO translation of this is looks extremely flexible:
Anchors suggest attribute-less placeholders, like a proxy which knows how to deal with the attribute composition.
Attributes suggest classes representing attributes and what they belong to. This suggests applying reuse to how attributes are looked up and dealt with, e.g automatic constraint checking, etc. From this we have a basis to generically implement the GOF-style Structural patterns.
Ties and Knots suggest classes representing relationships between objects. A basis for generic implementation of the Behavioural design patterns?
Interesting and desirable properties of Anchor Modeling that also translate across are:
All this requires replacing inheritance with composition (good) in the exposed objects.
Attribute have owners rather than owners having attributes. Although this make attribute lookup more complex, it neatly solves certain aliasing problems, as there can only ever be one owner.
No need for NULL. This translates to clearer NULL handling. Empty-case attribute classes could provide methods for handling the lack of a particular attribute, instead of performing NULL-checking everywhere.
Attribute metadata. Attribute-level full historisation and blaming: 'play' objects back in time, see what changed, when and why, etc. (if required - metadata is entirely optional)
There would probably be a lot of very simple classes (which is good), and a very declarative programming style (also good).
Thanks for such a thought provoking question, I hope this is useful for you.

Perhaps you're taking this from a relational point-of-view, but I would posit that the principles of interfaces and inheritance correspond to normalization in the world of OOP.
For example, a Person abstract class containing FirstName, LastName, Gender and BirthDate can be used by classes such as Employee, User, Member etc. as a valid base class, without a need to repeat the definitions of those attributes in such subclasses.
The principle of DRY, (a core principle of Andy Hunt and Dave Thomas's book The Pragmatic Programmer), and the constant emphasis of object-oriented programming on re-use, also correspond to the efficiencies offered by Normalization in relational databases.

At first glance, I'd say that the objectives of Code Refactoring are similar in an abstract way to the objectives of normalization. But that's pretty abstract.
Update: I almost wrote earlier that "we need to get Jon Skeet in on this one." I posted my answer and who beat me? You guessed it...

Object Role Modeling (not to be confused with Object Relational Mapping) is the closest thing I know of to normalization for objects. It doesn't have as mathematical a foundation as normalization, but it's a start.

In a fairly ad-hoc and untutored fashion, that will probably cause purists to scoff, and perhaps rightly, I think of a database table as being a set of objects of a particular type, and vice versa. Then I take my thoughts from there. Viewed this way, it doesn't seem to me like there's anything particularly special you have to do to use normal form in your everyday programming. Each object's identity will do for starters as its primary key, and references (pointers, etc.) will do by way of foreign keys. Then just follow the same rules.
(My objects usually end up in 3NF, or some approximation thereof. I treat this all more as guidelines, and, like I said, "untutored".)
If the rules are followed properly, each bit of information then ends up in one place, the interrelationships are clear, and everything is structured such that introducing inconsistencies takes some work. One could say that this approach produces good results on this basis, and I would agree.
One downside is that the end result can feel a bit like a tangle of spaghetti, particularly after some time away, and it's hard to shake the constant lingering sensation, even though it's usually false, that surely a few of all these links could be removed...

Object oriented design is rational but it does not have the same mathematically well-defined basis as the Relational Model. There is nothing exactly equivalent to the well-defined normal forms of database design.
Whether this is a strength or a weakness of Object oriented design is a matter of interpretation.

I second the SRP. The Open Closed Principle applies as well to "normalization" although I might stretch the meaning of the word, in that it should be possible to extend the system by adding new implementations, without modifying the existing code. objectmentor about OCP

good question, sorry i can't answer in depth
I've been working on object normalization off and on for over 20 years. It's deep and complicated and beautiful, and is the subject of my second planned book, Object Mechanics II. ONF = Object Normal Form, you heard it here first! ;-)
since potentially patentable technology lurks within, I am not at liberty to say more, except that normalizing the data is the really easy part ;-)
ADDENDUM: change of plans - see https://softwareengineering.stackexchange.com/questions/84598/object-oriented-normalization

Related

Has the word Abstraction other interpretations in computing?

I am quite new in programming and what haunts me about it is not really the coding (well at least not until the present moment!) itself, but some words/concepts that are really important to understand. My doubt is with the word "ABSTRACTION". I have already searched dictionaries and saw some videos of people giving very clear explanations of the word. So, I know that abstraction is when you take into consideration only the things that are important and leave out everything else (putting in very simple and direct language), like for instance, if you are going to change a light bulb, you do not need to know the manufacturer of the light bulb or the light socket. You also do not need to know the materials used to manufacture the light bulb. However, the problem is when you read some texts or listen to people using the word and it does not seem to fit the meaning and then you start to wonder if they misused the word (which I think is very unlikely) or it is because there is another obscure meaning that I have not found yet or maybe it is just because I am too dumb to understand it. Below I put excerpts from articles I was reading and bolded and capitalized the part where the word appears so you guys have a context and understand where my problem is. Thank you.
"A paradigm programming provides and determines the view that the programmer has on the structuring and execution of the programme. For example, in object-oriented programming, programmers MAY ABSTRACT A PROGRAMME AS A COLLECTION OF OBJECTS that interact with each other, while in functional programming, programmers ABSTRACT THE PROGRAMME as a sequence of functions executed in a stacked fashion."
"A tuple space has the function of creating a SHARED MEMORY ABSTRACTION over a distributed system, where everyone can read and write to it."
It's easy to understand if you replace abstract/abstraction with one of its synonyms conceptualize/conceptualization. In your first two examples "abstract a programme" means "think of a programme as"... or "conceptualize a programme as"... When we make an abstraction we forget about some details, and think about that thing in other terms.
Side advice from a fellow beginner:
As someone who started learning computer science independently less than a year ago, I can tell you right now there will be lots of tricky terms like this. Try not to get too caught up in them. Often times if you just keep learning, you'll experience first hand what these terms mean without even realizing it. Bits and pieces will add up. The takeaway from this being, don't let what you don't know slow you down. Sometimes it's ok to keep going and just not know for a while.
These seem to fit the definition you put up earlier. For object oriented programming, the mindset is to consider "objects" as the essential (important) aspect of a program and abstract all other considerations away. Same thing for functional programming where "functions" are the defining aspect abstracting other considerations as secondary.
The tuple space may be a little trickier but if you consider that variations in memory storage models are abstracted away in favour of a higher level concept focusing on a collection of values, then you see what the abstraction relates to.
Abstract
adjective
existing in thought or as an idea but not having a physical or concrete existence.
relating to or denoting art that does not attempt to represent external reality, but rather seeks to achieve its effect using shapes, colours, and textures.
verb
consider something theoretically or separately from (something else).
extract or remove (something).
noun
a summary of the contents of a book, article, or speech.
an abstract work of art.
There you have your answer. Ask 100 people what an abstract painting is, you will get at least 100 answers. Why should programmers behave differently?
Lets see what Oracle has to say about abstract classes:
Abstract classes are similar to interfaces. You cannot instantiate them, and they may contain a mix of methods declared with or without an implementation. However, with abstract classes, you can declare fields that are not static and final, and define public, protected, and private concrete methods.
Consider using abstract classes if any of these statements apply to your situation:
You want to share code among several closely related classes.
You expect that classes that extend your abstract class have many common methods or fields, or require access modifiers other than public (such as protected and private).
You want to declare non-static or non-final fields. This enables you to define methods that can access and modify the state of the object to which they belong.
Compare that with the definition of abstract in the above section. I think you get a pretty good idea of abstractness in computer programming.

How to identify what is not a Class?

I know the rule of thumb is that a noun used by the user is potentially a class. Similarly, a verb may be made into an action class e.g. predicate
Given a description from the user, how do you -
identify what is not not to be made into a class
The only real answer is experience. However, some things fairly obviously (to me, anyway) cannot be modelled in your design. For example if the use case says:
"and then the parcel is put on the UPS van"
There is no need to model the van. You can make decisions of this kind by considering the system boundaries - you don't and can't control the van. However,
"we make a request to UPS for pickup"
might well result in a UPSPickup object.
The rules are simple.
Everything is an object.
All objects belong to classes.
In rare (very rare circumstances) you have some specialized class/object confusion.
A "library" of all static methods. This is an implementation choice, and no user can see this.
A Singleton where there can only be one object of the given class. This does happen sometimes.
In an OO language it is not a question of what to be made into a class, but rather 'what class does this data/functionality go into?'
Like other software architecture aspects there are rules, but ultimately it is an art that requires experience. There are lots of books on software design, but a simple reference is Coupling and Cohesion.
Cohesion of a single module/component is the degree to which its responsibilities form a meaningful unit; higher cohesion is better.
Coupling between modules/components is their degree of mutual interdependence; lower coupling is better.

Creative Terminology

I seem to use bland words such as node, property, children (etc) too often, and I fear that someone else would have difficulty understanding my code simply because the parts' names are vague, common words.
How do you find creative names for classes and components to make them more memorable?
I am particularly having trouble with generic tools which have no real description except their rather generic functional purpose. I would like to know if others have found creative ways to name things rather than simply naming them by their utility, such as AnonymousFunctionWrapperCallerExecutorFactory.
It's hard to answer. I find them just because they seem to 'fit'.
What I do know, however, is that I find it basically impossible to move on writing code unless something is named correctly, and it 'feels' good. If it isn't named right, I find it hard to use, and the code is generally confusing.
I'm not too concerned about something being 'memorable', only 'accurate'.
I have been known to sit around thinking out loud about what to name something. Take your time, and make sure you are really happy with the name. don't be afraid of using common/simple words.
I don't really have an answer, but three things for you to think about.
The late Phil Karlton famously said: "There are only two hard problems in computer science. Cache Invalidation and Naming Things." So, the fact that you are having trouble coming up with good names is entirely normal and even expected.
OTOH, having trouble naming things can also be a sign of bad design. (And yes, I am perfectly aware, that #1 and #2 contradict each other. Or maybe one should think of it more like balancing each other.) E.g., if a thing has too many responsibilities, it is pretty much impossible to come up with a good name. (Witness all the "Service", "Util", "Model" and "Manager" classes in bad OO designs. Here's an example Google Code Search for "ManagerFactoryFactory".)
Also, your names should map to the domain jargon used by subject matter experts. If you can't find a subject matter expert, that's a sign that you are currently worrying about code that you're not supposed to worry about. (Basically, code that implements your core business domain should be implemented and designed well, code in ancillary domains should be implemented and designed so-so, and all other code should not be implemented or designed at all, but bought from a vendor, where what you are buying is their core business domain. [Please interpret "buy" and "vendor" liberally. Community-developed Free Software is just fine.])
Regarding #3 above, you mentioned in another comment that you are currently working on implementing a tree data structure. Unless your company is in the business of selling tree data structures, that is not a part of your core domain. And the reason that you have trouble finding good names could be that you are working outside your core domain. Now, "selling tree data structures" may sound stupid, but there are actually companies that do that. For example, the BCL team inside Microsoft's developer division: they actually sell (well, for certain definitions of "sell", anyway) the .NET framework's Base Class Libraries, which include, among others, tree data structures. But note that for example Microsoft's C++ compiler team actually (literally) buys their STL from a third-party vendor – they figure that their core domain is writing compilers, and they leave the writing of libraries to a company who considers writing STLs their core domain. (And indeed, AFAIK, that company does nothing but write and sell STL implementations. That's their sole product.)
If, however, selling tree data structures is your core domain, then the names you listed are just fine. They are the names that subject matter experts (programmers, in this case) use when talking about the domain of tree data structures.
Using 'metaphors' is a common theme in agile (and pattern) literature.
'Children' (in your question) is an example of a metaphor that is extensively used and for good reasons.
So, I'd encourage the use of metaphors, provided they are applicable and not a stretch of the imagination.
Metaphors are everywhere in computing. From files to bugs to pointers to streams... you can't avoid them.
I believe that for the purpose of standardization and communication, it's good to use a common vocab, like in the same case for design patterns. I have a problem with a programmer who keeps 'inventing' his own terms and I have trouble understanding him. (He kept using the term 'events orchestrating' instead of 'scripting' or 'FCFS process'. Kudos for creativity though!)
Those common vocab describe stuff we are used to. A node is a point, somewhere in a graph, in a tree, or what-not. One way is to be specific to the domain. If we are doing a mapping problem, instead of 'node', we can use 'location'. That helps in a sense, at least for me. So I find there is a need to balance being able to communicate with other programmers, and at the same time keeping the descriptor specific enough to help me remember what it does.
I think node, children, and property are great names. I can already guess the following about your classes, just by their "bland" names:
Node - this class is part of a graph of objects
children - this variable holds a list of nodes belonging to the containing node.
I don't think "node" is either vague or common, and if you're coding a generic data structure, it's probably ok to have generic names! (With that being said, if you are coding up a tree, you could use something like TreeNode to emphasize that the node is part of a tree.) One way you can make the life of developers who will use your API easier is to follow the naming conventions of your platform's built in libraries. If everyone calls a node a node, and an iterator an iterator, it makes life easy.
Names that reflect the purpose of the class, method or property are more memorable than creative ones. Modern IDEs make it easier to use longer names so feel fee to be descriptive. Getting creative won't help as much as getting accurate.
I recommend to pick nouns from a specific application domain. E.g. if you are putting cars in a tree, call the node class Car - the fact that it is also a node should be apparent from the API. Also, don't try to be too generic in your implementation - don't put all attributes of the car into a hashtable named properties, but create separate attributes for make, color, etc.
A lot of languages and coding styles like to use all sorts of descriptive prefixes. In PHP there are no clear types, so this may help greatly. Instead of doing
$isAvailable = true;
try
$bool_isAvailable = true;
It is admittedly a pain, but usually well worth the time.
I also like to use long names to describe things. It may seem strange, but is usually easier to remember, especially when I go back to refactor my code
$leftNode->properties < $leftTreeNode->arrayOfNodeProperties;
And if all else fails. Why not fall back on a solid star wars themed program.
$luke->lightsaber($darth[$ewoks]);
And lastly, in college I named my classes after my professor, and then my class methods all the things I wanted to do to that jerk.
$Kube->canEat($myShorts, $withKetchup);

How often do you need to create a real class hierarchy in your day to day programming?

I create business applications with heavy database use. Most of the programming work is just to connect components to the database and modifying components to adapt to general interface behaviour. I mostly use Delphi with its rich VCL library, and generally buy components needed. I keep most of the business logic in the database. I rarely get the chance to build a nice class hierarchy from the bottom up as there really is no need. Anyone else have this experience?
For me, occasionally a problem is clearer or easier with subclassing, but not often.
This also changes quite a bit in a given design as it's refactored.
My biggest problem is that programming courses and texts give so much weight to inheritance, hierarchies, and polymorphism through base classes (vs. interfaces or dynamic typing). This helps create legions of programmers that subclass everything and their mother.
The answer to this question is not totally language-agnostic;
Some languages like Java have a fairly limited set of language features available, meaning that subclassing is fairly often used because it's a convenient method for re-use, technical inheritance.
Closures and lambdas of C# make inheritance for technical reasons much less relevant. So normally inheritance is used for semantic reasons (like cat extends animal).
The last C# project I worked on, we more or less made all of the class hierarchies within a few weeks. After that it was more or less over.
On my current java project we create new class hierarchies all of the time.
Other languages will have other features that similarly affect this composition (mixins come to mind)
I put on my architecting/class design hat probably once or twice a month. It's probably the best hat I have and is the most fun to wear.
Depends what stage of the lifecycle your project is in though.
When your tackling problem domains you are well familiar with and already have a common code base to work from, you often have no need to create a new class hierarchy. It's when you stumble upon problems you have no ready solutions for, that you start building your own.
It's also very dependant on the type of applications you develop. If your domain already has well accepted conventions and libraries to work from, there probably isn't any need to reinvent the wheel (other than personal / academic interests). Some areas have inherently less available resources to work with, and in those you'll find yourself building everything from scratch most of the time.
A majority of applications, especially business applications, contains at least some kind of business logic in it. I would contend that business should not be in the database, but should rather be in the application. You can put referential integrity in the database as I think this is a good choice, but business logic should be only in the application.
By class hierarchy, I suppose you mean do you always have to end up with some inheritance in your object model, then the answer is no. But chances are you can often find some common code, factor it out and create a base class to contain the common code.
If you agree with me on the point that business logic should not be in the database, but should be in the application, then I recommend you look into the MVC Design Pattern to guide your design. You will find your design contain classes or objects. Your VCLs will represent your View, and you can have your Model classes map directly to the database table, i.e. each member in the class in the model corresponds to a field in a database table (again, this is the norm but there will be exception, where this simplicity fails to apply). Then you'll need a layer to handle the CRUD (Create, Read, Update, Delete) of the Model classes to the database tables. You will end up with an "layered" application that is easier to maintain and enhance.
It depends on what you mean by hierarchy - inheritance or layering?
When object oriented languages first came out, inheritance was overused. Complicated hierarchies were common. Now, interfaces (as in Java and C#) provide a simpler way to get the benefit of polymorphism without the complications of inheritance. I rarely use inheritance anymore.
Layering, however, is vital when creating a large application. Layering prevents general low-level classes (like lists) from directly referencing specific high-level classes (like web browser windows). As far as I know, there isn't a formal way to describe layering, but there are general guidelines (model-view-controller (MVC), separate GUI logic from business logic, separate data from presentation, etc.).
It really depends on the types/phases of the projects you're working on. I happen to do that everyday because I'm working on database internals for a new database, creating related libraries/frameworks. I'd imagine doing that a lot less if I'm working within a mature framework using other people's libraries.
I'm doing Infrastructure for our companys' product, so I'm writing a lot of code that will be used later by guys in other teams. So I end up writing lots of abstract classes, interfaces, hierarchies and so on. Mostly it's just a pattern of "default behaviour in an abstract/virtual class, which other programmers may override".
Very challenging, I must say.
The time that I find class hierarchies most beneficial is when the relationship between objects actually does match a true "is-a" relationship in the domain.
However if I can avoid large hierarchies I will due to the fact that they are often a little more tricky to map to relational databases and can really complicate your database designs. Since you say most of your applications make heavy use of databases this would be something to take into consideration.

What's the point of OOP?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
As far as I can tell, in spite of the countless millions or billions spent on OOP education, languages, and tools, OOP has not improved developer productivity or software reliability, nor has it reduced development costs. Few people use OOP in any rigorous sense (few people adhere to or understand principles such as LSP); there seems to be little uniformity or consistency to the approaches that people take to modelling problem domains. All too often, the class is used simply for its syntactic sugar; it puts the functions for a record type into their own little namespace.
I've written a large amount of code for a wide variety of applications. Although there have been places where true substitutable subtyping played a valuable role in the application, these have been pretty exceptional. In general, though much lip service is given to talk of "re-use" the reality is that unless a piece of code does exactly what you want it to do, there's very little cost-effective "re-use". It's extremely hard to design classes to be extensible in the right way, and so the cost of extension is normally so great that "re-use" simply isn't worthwhile.
In many regards, this doesn't surprise me. The real world isn't "OO", and the idea implicit in OO--that we can model things with some class taxonomy--seems to me very fundamentally flawed (I can sit on a table, a tree stump, a car bonnet, someone's lap--but not one of those is-a chair). Even if we move to more abstract domains, OO modelling is often difficult, counterintuitive, and ultimately unhelpful (consider the classic examples of circles/ellipses or squares/rectangles).
So what am I missing here? Where's the value of OOP, and why has all the time and money failed to make software any better?
The real world isn't "OO", and the idea implicit in OO--that we can model things with some class taxonomy--seems to me very fundamentally flawed
While this is true and has been observed by other people (take Stepanov, inventor of the STL), the rest is nonsense. OOP may be flawed and it certainly is no silver bullet but it makes large-scale applications much simpler because it's a great way to reduce dependencies. Of course, this is only true for “good” OOP design. Sloppy design won't give any advantage. But good, decoupled design can be modelled very well using OOP and not well using other techniques.
There are much better, more universal models (Haskell's type model comes to mind) but these are also often more complicated and/or difficult to implement efficiently. OOP is a good trade-off between extremes.
OOP isn't about creating re-usable classes, its about creating Usable classes.
All too often, the class is used
simply for its syntactic sugar; it
puts the functions for a record type
into their own little namespace.
Yes, I find this to be too prevalent as well. This is not Object Oriented Programming. It's Object Based Programming and data centric programing. In my 10 years of working with OO Languages, I see people mostly doing Object Based Programming. OBP breaks down very quickly IMHO since you are essentially getting the worst of both words: 1) Procedural programming without adhering to proven structured programming methodology and 2) OOP without adhering to to proven OOP methodology.
OOP done right is a beautiful thing. It makes very difficult problems easy to solve, and to the uninitiated (not trying to sound pompous there), it can almost seem like magic. That being said, OOP is just one tool in the toolbox of programming methodologies. It is not the be all end all methodology. It just happens to suit large business applications well.
Most developers who work in OOP languages are utilizing examples of OOP done right in the frameworks and types that they use day-to-day, but they just aren't aware of it. Here are some very simple examples: ADO.NET, Hibernate/NHibernate, Logging Frameworks, various language collection types, the ASP.NET stack, The JSP stack etc... These are all things that heavily rely on OOP in their codebases.
Reuse shouldn't be a goal of OOP - or any other paradigm for that matter.
Reuse is a side-effect of an good design and proper level of abstraction. Code achieves reuse by doing something useful, but not doing so much as to make it inflexible. It does not matter whether the code is OO or not - we reuse what works and is not trivial to do ourselves. That's pragmatism.
The thought of OO as a new way to get to reuse through inheritance is fundamentally flawed. As you note the LSP violations abound. Instead, OO is properly thought of as a method of managing the complexity of a problem domain. The goal is maintainability of a system over time. The primary tool for achieving this is the separation of public interface from a private implementation. This allows us to have rules like "This should only be modified using ..." enforced by the compiler, rather than code review.
Using this, I'm sure you will agree, allows us to create and maintain hugely complex systems. There is lots of value in that, and it is not easy to do in other paradigms.
Verging on religious but I would say that you're painting an overly grim picture of the state of modern OOP. I would argue that it actually has reduced costs, made large software projects manageable, and so forth. That doesn't mean it's solved the fundamental problem of software messiness, and it doesn't mean the average developer is an OOP expert. But the modularization of function into object-components has certainly reduced the amount of spaghetti code out there in the world.
I can think of dozens of libraries off the top of my head which are beautifully reusable and which have saved time and money that can never be calculated.
But to the extent that OOP has been a waste of time, I'd say it's because of lack of programmer training, compounded by the steep learning curve of learning a language specific OOP mapping. Some people "get" OOP and others never will.
There's no empirical evidence that suggests that object orientation is a more natural way for people to think about the world. There's some work in the field of psychology of programming that shows that OO is not somehow more fitting than other approaches.
Object-oriented representations do not appear to be universally more usable or less usable.
It is not enough to simply adopt OO methods and require developers to use such methods, because that might have a negative impact on developer productivity, as well as the quality of systems developed.
Which is from "On the Usability of OO Representations" from Communications of the ACM Oct. 2000. The articles mainly compares OO against theprocess-oriented approach. There's lots of study of how people who work with the OO method "think" (Int. J. of Human-Computer Studies 2001, issue 54, or Human-Computer Interaction 1995, vol. 10 has a whole theme on OO studies), and from what I read, there's nothing to indicate some kind of naturalness to the OO approach that makes it better suited than a more traditional procedural approach.
I think the use of opaque context objects (HANDLEs in Win32, FILE*s in C, to name two well-known examples--hell, HANDLEs live on the other side of the kernel-mode barrier, and it really doesn't get much more encapsulated than that) is found in procedural code too; I'm struggling to see how this is something particular to OOP.
HANDLEs (and the rest of the WinAPI) is OOP! C doesn't support OOP very well so there's no special syntax but that doesn't mean it doesn't use the same concepts. WinAPI is in every sense of the word an object-oriented framework.
See, this is the trouble with every single discussion involving OOP or alternative techniques: nobody is clear about the definition, everyone is talking about something else and thus no consensus can be reached. Seems like a waste of time to me.
Its a programming paradigm.. Designed to make it easier for us mere mortals to break down a problem into smaller, workable pieces..
If you dont find it useful.. Don't use it, don't pay for training and be happy.
I on the other hand do find it useful, so I will :)
Relative to straight procedural programming, the first fundamental tenet of OOP is the notion of information hiding and encapsulation. This idea leads to the notion of the class that seperates the interface from implementation. These are hugely important concepts and the basis for putting a framework in place to think about program design in a different way and better (I think) way. You can't really argue against those properties - there is no trade-off made and it is always a cleaner way to modulize things.
Other aspects of OOP including inheritance and polymorphism are important too, but as others have alluded to, those are commonly over used. ie: Sometimes people use inheritance and/or polymorphism because they can, not because they should have. They are powerful concepts and very useful, but need to be used wisely and are not automatic winning advantages of OOP.
Relative to re-use. I agree re-use is over sold for OOP. It is a possible side effect of well defined objects, typically of more primitive/generic classes and is a direct result of the encapsulation and information hiding concepts. It is potentially easier to be re-used because the interfaces of well defined classes are just simply clearer and somewhat self documenting.
The problem with OOP is that it was oversold.
As Alan Kay originally conceived it, it was a great alternative to the prior practice of having raw data and all-global routines.
Then some management-consultant types latched onto it and sold it as the messiah of software, and lemming-like, academia and industry tumbled along after it.
Now they are lemming-like tumbling after other good ideas being oversold, such as functional programming.
So what would I do differently? Plenty, and I wrote a book on this. (It's out of print - I don't get a cent, but you can still get copies.)Amazon
My constructive answer is to look at programming not as a way of modeling things in the real world, but as a way of encoding requirements.
That is very different, and is based on information theory (at a level that anyone can understand). It says that programming can be looked at as a process of defining languages, and skill in doing so is essential for good programming.
It elevates the concept of domain-specific-languages (DSLs). It agrees emphatically with DRY (don't repeat yourself). It gives a big thumbs-up to code generation. It results in software with massively less data structure than is typical for modern applications.
It seeks to re-invigorate the idea that the way forward lies in inventiveness, and that even well-accepted ideas should be questioned.
HANDLEs (and the rest of the WinAPI) is OOP!
Are they, though? They're not inheritable, they're certainly not substitutable, they lack well-defined classes... I think they fall a long way short of "OOP".
Have you ever created a window using WinAPI? Then you should know that you define a class (RegisterClass), create an instance of it (CreateWindow), call virtual methods (WndProc) and base-class methods (DefWindowProc) and so on. WinAPI even takes the nomenclature from SmallTalk OOP, calling the methods “messages” (Window Messages).
Handles may not be inheritable but then, there's final in Java. They don't lack a class, they are a placeholder for the class: That's what the word “handle” means. Looking at architectures like MFC or .NET WinForms it's immediately obvious that except for the syntax, nothing much is different from the WinAPI.
Yes OOP did not solve all our problems, sorry about that. We are, however working on SOA which will solve all those problems.
OOP lends itself well to programming internal computer structures like GUI "widgets", where for example SelectList and TextBox may be subtypes of Item, which has common methods such as "move" and "resize".
The trouble is, 90% of us work in the world of business where we are working with business concepts such as Invoice, Employee, Job, Order. These do not lend themselves so well to OOP because the "objects" are more nebulous, subject to change according to business re-engineering and so on.
The worst case is where OO is enthusiastically applied to databases, including the egregious OO "enhancements" to SQL databases - which are rightly ignored except by database noobs who assume they must be the right way to do things because they are newer.
In my experience of reviewing code and design of projects I have been through, the value of OOP is not fully realised because alot of developers have not properly conceptualised the object-oriented model in their minds. Thus they do not program with OO design, very often continuing to write top-down procedural code making the classes a pretty flat design. (if you can even call that "design" in the first place)
It is pretty scary to observe how little colleagues know about what an abstract class or interface are, let alone properly design an inheritance hierarchy to suit the business needs.
However, when good OO design is present, it is just sheer joy reading the code and seeing the code naturally fall into place into intuitive components/classes. I have always perceived system architecture and design like designing the various departments and staff jobs in a company - all are there to accomplish a certain piece of work in the grand scheme of things, emitting the synergy required to propel the organisation/system forward.
That, of course, is quite rare unfortunately. Like the ratio of beautifully-designed versus horrendously-designed physical objects in the world, the same can pretty much be said about software engineering and design. Having the good tools at one's disposal does not necessarily confer good practices and results.
Maybe a bonnet, lap or a tree is not a chair but they all are ISittable.
I think those real world things are objects
You do?
What methods does an invoice have? Oh, wait. It can't pay itself, it can't send itself, it can't compare itself with the items that the vendor actually delivered. It doesn't have any methods at all; it's totally inert and non-functional. It's a record type (a struct, if you prefer), not an object.
Likewise the other things you mention.
Just because something is real does not make it an object in the OO sense of the word. OO objects are a peculiar coupling of state and behaviour that can act of their own accord. That isn't something that's abundant in the real world.
I have been writing OO code for the last 9 years or so. Other than using messaging, it's hard for me to imagine other approach. The main benefit I see totally in line with what CodingTheWheel said: modularisation. OO naturally leads me to construct my applications from modular components that have clean interfaces and clear responsibilities (i.e. loosely coupled, highly cohesive code with a clear separation of concerns).
I think where OO breaks down is when people create deeply nested class heirarchies. This can lead to complexity. However, factoring out common finctionality into a base class, then reusing that in other descendant classes is a deeply elegant thing, IMHO!
In the first place, the observations are somewhat sloppy. I don't have any figures on software productivity, and have no good reason to believe it's not going up. Further, since there are many people who abuse OO, good use of OO would not necessarily cause a productivity improvement even if OO was the greatest thing since peanut butter. After all, an incompetent brain surgeon is likely to be worse than none at all, but a competent one can be invaluable.
That being said, OO is a different way of arranging things, attaching procedural code to data rather than having procedural code operate on data. This should be at least a small win by itself, since there are cases where the OO approach is more natural. There's nothing stopping anybody from writing a procedural API in C++, after all, and so the option of providing objects instead makes the language more versatile.
Further, there's something OO does very well: it allows old code to call new code automatically, with no changes. If I have code that manages things procedurally, and I add a new sort of thing that's similar but not identical to an earlier one, I have to change the procedural code. In an OO system, I inherit the functionality, change what I like, and the new code is automatically used due to polymorphism. This increases the locality of changes, and that is a Good Thing.
The downside is that good OO isn't free: it requires time and effort to learn it properly. Since it's a major buzzword, there's lots of people and products who do it badly, just for the sake of doing it. It's not easier to design a good class interface than a good procedural API, and there's all sorts of easy-to-make errors (like deep class hierarchies).
Think of it as a different sort of tool, not necessarily generally better. A hammer in addition to a screwdriver, say. Perhaps we will eventually get out of the practice of software engineering as knowing which wrench to use to hammer the screw in.
#Sean
However, factoring out common finctionality into a base class, then reusing that in other descendant classes is a deeply elegant thing, IMHO!
But "procedural" developers have been doing that for decades anyway. The syntax and terminology might differ, but the effect is identical. There is more to OOP than "reusing common functionality in a base class", and I might even go so far as to say that that is hard to describe as OOP at all; calling the same function from different bits of code is a technique as old as the subprocedure itself.
#Konrad
OOP may be flawed and it certainly is no silver bullet but it makes large-scale applications much simpler because it's a great way to reduce dependencies
That is the dogma. I am not seeing what makes OOP significantly better in this regard than procedural programming of old. Whenever I make a procedure call I am isolating myself from the specifics of the implementation.
To me, there is a lot of value in the OOP syntax itself. Using objects that attempt to represent real things or data structures is often much more useful than trying to use a bunch of different flat (or "floating") functions to do the same thing with the same data. There is a certain natural "flow" to things with good OOP that just makes more sense to read, write, and maintain long term.
It doesn't necessarily matter that an Invoice isn't really an "object" with functions that it can perform itself - the object instance can exist just to perform functions on the data without having to know what type of data is actually there. The function "invoice.toJson()" can be called successfully without having to know what kind of data "invoice" is - the result will be Json, no matter it if comes from a database, XML, CSV, or even another JSON object. With procedural functions, you all the sudden have to know more about your data, and end up with functions like "xmlToJson()", "csvToJson()", "dbToJson()", etc. It eventually becomes a complete mess and a HUGE headache if you ever change the underlying data type.
The point of OOP is to hide the actual implementation by abstracting it away. To achieve that goal, you must create a public interface. To make your job easier while creating that public interface and keep things DRY, you must use concepts like abstract classes, inheritance, polymorphism, and design patterns.
So to me, the real overriding goal of OOP is to make future code maintenance and changes easier. But even beyond that, it can really simplify things a lot when done correctly in ways that procedural code never could. It doesn't matter if it doesn't match the "real world" - programming with code is not interacting with real world objects anyways. OOP is just a tool that makes my job easier and faster - I'll go for that any day.
#CodingTheWheel
But to the extent that OOP has been a waste of time, I'd say it's because of lack of programmer training, compounded by the steep learning curve of learning a language specific OOP mapping. Some people "get" OOP and others never will.
I dunno if that's really surprising, though. I think that technically sound approaches (LSP being the obvious thing) make hard to use, but if we don't use such approaches it makes the code brittle and inextensible anyway (because we can no longer reason about it). And I think the counterintuitive results that OOP leads us to makes it unsurprising that people don't pick it up.
More significantly, since software is already fundamentally too hard for normal humans to write reliably and accurately, should we really be extolling a technique that is consistently taught poorly and appears hard to learn? If the benefits were clear-cut then it might be worth persevering in spite of the difficulty, but that doesn't seem to be the case.
#Jeff
Relative to straight procedural programming, the first fundamental tenet of OOP is the notion of information hiding and encapsulation. This idea leads to the notion of the class that seperates the interface from implementation.
Which has the more hidden implementation: C++'s iostreams, or C's FILE*s?
I think the use of opaque context objects (HANDLEs in Win32, FILE*s in C, to name two well-known examples--hell, HANDLEs live on the other side of the kernel-mode barrier, and it really doesn't get much more encapsulated than that) is found in procedural code too; I'm struggling to see how this is something particular to OOP.
I suppose that may be a part of why I'm struggling to see the benefits: the parts that are obviously good are not specific to OOP, whereas the parts that are specific to OOP are not obviously good! (this is not to say that they are necessarily bad, but rather that I have not seen the evidence that they are widely-applicable and consistently beneficial).
In the only dev blog I read, by that Joel-On-Software-Founder-of-SO guy, I read a long time ago that OO does not lead to productivity increases. Automatic memory management does. Cool. Who can deny the data?
I still believe that OO is to non-OO what programming with functions is to programming everything inline. (And I should know, as I started with GWBasic.) When you refactor code to use functions, variable2654 becomes variable3 of the method you're in. Or, better yet, it's got a name that you can understand, and if the function is short, it's called value and that's sufficient for full comprehension.
When code with no functions becomes code with methods, you get to delete miles of code.
When you refactor code to be truly OO, b, c, q, and Z become this, this, this and this. And since I don't believe in using the this keyword, you get to delete miles of code. Actually, you get to do that even if you use this.
I do not think OO is natural metaphor. I don't think language is a natural metaphor either, nor do I think that Fowler's "smells" are better than saying "this code tastes bad." That said, I think that OO is not about natural metaphors and people who think the objects just pop out at you are basically missing the point. You define the object universe, and better object universes result in code that is shorter, easier to understand, works better, or all of these (and some criteria I am forgetting). I think that people who use the customers/domain's natural objects as programming objects are missing the power to redefine the universe.
For instance, when you do an airline reservation system, what you call a reservation might not correspond to a legal/business reservation at all.
Some of the basic concepts are really cool tools I think that most people exaggerate with that whole "when you have a hammer, they're all nails" thing. I think that the other side of the coin/mirror is just as true: when you have a gadget like polymorphism/inheritance, you begin to find uses where it fits like a glove/sock/contact-lens. The tools of OO are very powerful. Single-inheritance is, I think, absolutely necessary for people not to get carried away, my own multi-inheritance software not withstanding.
What's the point of OOP? I think it's a great way to handle an absolutely massive code base. I think it lets you organize and reorganize you code and gives you a language to do that in (beyond the programming language you're working in), and modularizes code in a pretty natural and easy-to-understand way.
OOP is destined to be misunderstood by the majority of developers This is because it's an eye-opening process like life: you understand OO more and more with experience, and start avoiding certain patterns and employing others as you get wiser. One of the best examples is that you stop using inheritance for classes that you do not control, and prefer the Facade pattern instead.
Regarding your mini-essay/question
I did want to mention that you're right. Reusability is a pipe-dream, for the most part. Here's a quote from Anders Hejilsberg about that topic (brilliant) from here:
If you ask beginning programmers to
write a calendar control, they often
think to themselves, "Oh, I'm going to
write the world's best calendar
control! It's going to be polymorphic
with respect to the kind of calendar.
It will have displayers, and mungers,
and this, that, and the other." They
need to ship a calendar application in
two months. They put all this
infrastructure into place in the
control, and then spend two days
writing a crappy calendar application
on top of it. They'll think, "In the
next version of the application, I'm
going to do so much more."
Once they start thinking about how
they're actually going to implement
all of these other concretizations of
their abstract design, however, it
turns out that their design is
completely wrong. And now they've
painted themself into a corner, and
they have to throw the whole thing
out. I have seen that over and over.
I'm a strong believer in being
minimalistic. Unless you actually are
going to solve the general problem,
don't try and put in place a framework
for solving a specific one, because
you don't know what that framework
should look like.
Have you ever created a window using WinAPI?
More times than I care to remember.
Then you should know that you define a class (RegisterClass), create an instance of it (CreateWindow), call virtual methods (WndProc) and base-class methods (DefWindowProc) and so on. WinAPI even takes the nomenclature from SmallTalk OOP, calling the methods “messages” (Window Messages).
Then you'll also know that it does no message dispatch of its own, which is a big gaping void. It also has crappy subclassing.
Handles may not be inheritable but then, there's final in Java. They don't lack a class, they are a placeholder for the class: That's what the word “handle” means. Looking at architectures like MFC or .NET WinForms it's immediately obvious that except for the syntax, nothing much is different from the WinAPI.
They're not inheritable either in interface or implementation, minimally substitutable, and they're not substantially different from what procedural coders have been doing since forever.
Is this really it? The best bits of OOP are just... traditional procedural code? That's the big deal?
I agree completely with InSciTek Jeff's answer, I'll just add the following refinements:
Information hiding and encapsulation: Critical for any maintainable code. Can be done by being careful in any programming language, doesn't require OO features, but doing it will make your code slightly OO-like.
Inheritance: There is one important application domain for which all those OO is-a-kind-of and contains-a relationships are a perfect fit: Graphical User Interfaces. If you try to build GUIs without OO language support, you will end up building OO-like features anyway, and it's harder and more error-prone without language support. Glade (recently) and X11 Xt (historically) for example.
Using OO features (especially deeply nested abstract hierarchies), when there is no point, is pointless. But for some application domains, there really is a point.
I believe the most beneficial quality of OOP is data hiding/managing. However, there are a LOT of examples where OOP is misused and I think this is where the confusion comes in.
Just because you can make something into an object does not mean you should. However, if doing so will make your code more organized/easier to read then you definitely should.
A great practical example where OOP is very helpful is with a "product" class and objects that I use on our website. Since every page is a product, and every product has references to other products, it can get very confusing as to which product the data you have refers to. Is this "strURL" variable the link to the current page, or to the home page, or to the statistics page? Sure you could make all kinds of different variable that refer to the same information, but proCurrentPage->strURL, is much easier to understand (for a developer).
In addition, attaching functions to those pages is much cleaner. I can do proCurrentPage->CleanCache(); Followed by proDisplayItem->RenderPromo(); If I just called those functions and had it assume the current data was available, who knows what kind of evil would occur. Also, if I had to pass the correct variables into those functions, I am back to the problem of having all kinds of variables for the different products laying around.
Instead, using objects, all my product data and functions are nice and clean and easy to understand.
However. The big problem with OOP is when somebody believes that EVERYTHING should be OOP. This creates a lot of problems. I have 88 tables in my database. I only have about 6 classes, and maybe I should have about 10. I definitely don't need 88 classes. Most of the time directly accessing those tables is perfectly understandable in the circumstances I use it, and OOP would actually make it more difficult/tedious to get to the core functionality of what is occurring.
I believe a hybrid model of objects where useful and procedural where practical is the most effective method of coding. It's a shame we have all these religious wars where people advocate using one method at the expense of the others. They are both good, and they both have their place. Most of the time, there are uses for both methods in every larger project (In some smaller projects, a single object, or a few procedures may be all that you need).
I don't care for reuse as much as I do for readability. The latter means your code is easier to change. That alone is worth in gold in the craft of building software.
And OO is a pretty damn effective way to make your programs readable. Reuse or no reuse.
"The real world isn't "OO","
Really? My world is full of objects. I'm using one now. I think that having software "objects" model the real objects might not be such a bad thing.
OO designs for conceptual things (like Windows, not real world windows, but the display panels on my computer monitor) often leave a lot to be desired. But for real world things like invoices, shipping orders, insurance claims and what-not, I think those real world things are objects. I have a stack on my desk, so they must be real.
The point of OOP is to give the programmer another means for describing and communicating a solution to a problem in code to machines and people. The most important part of that is the communication to people. OOP allows the programmer to declare what they mean in the code through rules that are enforced in the OO language.
Contrary to many arguments on this topic, OOP and OO concepts are pervasive throughout all code including code in non-OOP languages such as C. Many advanced non-OO programmers will approximate the features of objects even in non-OO languages.
Having OO built into the language merely gives the programmer another means of expression.
The biggest part to writing code is not communication with the machine, that part is easy, the biggest part is communication with human programmers.