How to identify what is not a Class? - language-agnostic

I know the rule of thumb is that a noun used by the user is potentially a class. Similarly, a verb may be made into an action class e.g. predicate
Given a description from the user, how do you -
identify what is not not to be made into a class

The only real answer is experience. However, some things fairly obviously (to me, anyway) cannot be modelled in your design. For example if the use case says:
"and then the parcel is put on the UPS van"
There is no need to model the van. You can make decisions of this kind by considering the system boundaries - you don't and can't control the van. However,
"we make a request to UPS for pickup"
might well result in a UPSPickup object.

The rules are simple.
Everything is an object.
All objects belong to classes.
In rare (very rare circumstances) you have some specialized class/object confusion.
A "library" of all static methods. This is an implementation choice, and no user can see this.
A Singleton where there can only be one object of the given class. This does happen sometimes.

In an OO language it is not a question of what to be made into a class, but rather 'what class does this data/functionality go into?'
Like other software architecture aspects there are rules, but ultimately it is an art that requires experience. There are lots of books on software design, but a simple reference is Coupling and Cohesion.
Cohesion of a single module/component is the degree to which its responsibilities form a meaningful unit; higher cohesion is better.
Coupling between modules/components is their degree of mutual interdependence; lower coupling is better.

Related

Has the word Abstraction other interpretations in computing?

I am quite new in programming and what haunts me about it is not really the coding (well at least not until the present moment!) itself, but some words/concepts that are really important to understand. My doubt is with the word "ABSTRACTION". I have already searched dictionaries and saw some videos of people giving very clear explanations of the word. So, I know that abstraction is when you take into consideration only the things that are important and leave out everything else (putting in very simple and direct language), like for instance, if you are going to change a light bulb, you do not need to know the manufacturer of the light bulb or the light socket. You also do not need to know the materials used to manufacture the light bulb. However, the problem is when you read some texts or listen to people using the word and it does not seem to fit the meaning and then you start to wonder if they misused the word (which I think is very unlikely) or it is because there is another obscure meaning that I have not found yet or maybe it is just because I am too dumb to understand it. Below I put excerpts from articles I was reading and bolded and capitalized the part where the word appears so you guys have a context and understand where my problem is. Thank you.
"A paradigm programming provides and determines the view that the programmer has on the structuring and execution of the programme. For example, in object-oriented programming, programmers MAY ABSTRACT A PROGRAMME AS A COLLECTION OF OBJECTS that interact with each other, while in functional programming, programmers ABSTRACT THE PROGRAMME as a sequence of functions executed in a stacked fashion."
"A tuple space has the function of creating a SHARED MEMORY ABSTRACTION over a distributed system, where everyone can read and write to it."
It's easy to understand if you replace abstract/abstraction with one of its synonyms conceptualize/conceptualization. In your first two examples "abstract a programme" means "think of a programme as"... or "conceptualize a programme as"... When we make an abstraction we forget about some details, and think about that thing in other terms.
Side advice from a fellow beginner:
As someone who started learning computer science independently less than a year ago, I can tell you right now there will be lots of tricky terms like this. Try not to get too caught up in them. Often times if you just keep learning, you'll experience first hand what these terms mean without even realizing it. Bits and pieces will add up. The takeaway from this being, don't let what you don't know slow you down. Sometimes it's ok to keep going and just not know for a while.
These seem to fit the definition you put up earlier. For object oriented programming, the mindset is to consider "objects" as the essential (important) aspect of a program and abstract all other considerations away. Same thing for functional programming where "functions" are the defining aspect abstracting other considerations as secondary.
The tuple space may be a little trickier but if you consider that variations in memory storage models are abstracted away in favour of a higher level concept focusing on a collection of values, then you see what the abstraction relates to.
Abstract
adjective
existing in thought or as an idea but not having a physical or concrete existence.
relating to or denoting art that does not attempt to represent external reality, but rather seeks to achieve its effect using shapes, colours, and textures.
verb
consider something theoretically or separately from (something else).
extract or remove (something).
noun
a summary of the contents of a book, article, or speech.
an abstract work of art.
There you have your answer. Ask 100 people what an abstract painting is, you will get at least 100 answers. Why should programmers behave differently?
Lets see what Oracle has to say about abstract classes:
Abstract classes are similar to interfaces. You cannot instantiate them, and they may contain a mix of methods declared with or without an implementation. However, with abstract classes, you can declare fields that are not static and final, and define public, protected, and private concrete methods.
Consider using abstract classes if any of these statements apply to your situation:
You want to share code among several closely related classes.
You expect that classes that extend your abstract class have many common methods or fields, or require access modifiers other than public (such as protected and private).
You want to declare non-static or non-final fields. This enables you to define methods that can access and modify the state of the object to which they belong.
Compare that with the definition of abstract in the above section. I think you get a pretty good idea of abstractness in computer programming.

open closed principle - refactoring to create base class based on new features

So when original code was written there was only a need for say LabTest class. But now say we have new requirements to add say RadiologyTest, EKGTest etc.
These classes have a lot in common hence it makes sense to have a base class.
But that will mean LabTest class will have to be modified, lets say its interface will remain same as before, in other words consumers of LabTest class will not need change.
Is this violation of open closed principle principle? (LabTest is being modified).
I think you can look at it from two perspectives: existing requirements and new requirements.
If the existing requirements didn't cover the need for these kinds of changes then I'd say, based on those requirements, LabTest did not violate OCP.
With the new requirements, you need to add functionality that does not fit with the LabTest implementation. Adding it to OCP would violate SRP. The requirements now create a new change vector that will force you to refactor LabTest to keep it OCP. If you fail to refactor LabTest it will violate SRP and OCP. When you refactor, keep in mind the new change vector in any classes you create or modify.
These classes have a lot in common hence it makes sense to have a base class.
I think you may be violating SRP. After all, if each class does one task, how can two or more be so similar? If there's a task they both do identically, then that is a separate task and should be done by another class.
So I would say, first refactor LabTest into it's constituent parts (hope you've got unit tests!). Then when you come to write RadiologyTest, EKGTest they can reuse the parts that make sense for them. This is also known as composition over inheritance.
But whatever you do, do use interfaces to these classes in the client. Don't force those who follow to use your base classes to add extensions.
I may get burnt for this answer, but going on a limb anyways.
In my opinion(IMO), OCP cannot be followed in the purist sense like other principles such as SRP, DIP or ISP.
If requirements change in such a way that you have to change the responsibility of a class to be true to their representation of the domain model, then we have to change that class.
IMO, OCP stops us from re-factoring code to follow the evolution of the system.
Please correct me if I am wrong.
Update:
After further research, this is what I am thinking:
Lets say, I have automated test both on unit level and integration level, then IMO we should redesign the complete system to fit the new model, OCP is out the door here.
IMO, the goal of a system evolution is always to avoid hacks(not changing LabTest class and the corresponding DB table so as to not break old code[not violate OCP], and using LabTest to store EKGTest's common data and using LabTest inside of EKGTest or EKGTest inheriting from LabTest will be a hack, IMO) will be and to make the system represent its model as accurate as possible.
I think the Open-Closed Principle, (as outlined by Uncle Bob anyway, vs Bertrand Meyers) isn't about never modifying classes (if software was never going to change it might as well be hardware).
& in your own case, I don't think you're violating OCP as you've mentioned all of your uses of your class are depending on the abstraction of LabTest rather than the implementation of RadiologyTest.
From Uncle Bob's introductory paper, he has an example of a DrawAllShapes class that if designed to OCP, shouldn't need to change each time a new subclass of Shape is added to the system. Regarding to what level you apply it, Uncle Bob says that —
It should be clear that no significant program can be 100% closed. For
example, consider what would happen to the DrawAllShapes function from
Listing 2 if we decided that all Circles should be drawn before any
Squares. The DrawAllShapes function is not closed against a change
like this. In general, no matter how “closed” a module is, there will
always be some kind of change against which it is not closed.
Since closure cannot be complete, it must be strategic. That is, the
designer must choose the kinds of changes against which to close his
design.
I wouldn't read "closed for modification" as "don't refactor", more that you should design you classes in such a way that other classes can't make modifications which will affect you — e.g. applying the basic OO stuff — encapsulation via getters/setters & private member variables.

Creative Terminology

I seem to use bland words such as node, property, children (etc) too often, and I fear that someone else would have difficulty understanding my code simply because the parts' names are vague, common words.
How do you find creative names for classes and components to make them more memorable?
I am particularly having trouble with generic tools which have no real description except their rather generic functional purpose. I would like to know if others have found creative ways to name things rather than simply naming them by their utility, such as AnonymousFunctionWrapperCallerExecutorFactory.
It's hard to answer. I find them just because they seem to 'fit'.
What I do know, however, is that I find it basically impossible to move on writing code unless something is named correctly, and it 'feels' good. If it isn't named right, I find it hard to use, and the code is generally confusing.
I'm not too concerned about something being 'memorable', only 'accurate'.
I have been known to sit around thinking out loud about what to name something. Take your time, and make sure you are really happy with the name. don't be afraid of using common/simple words.
I don't really have an answer, but three things for you to think about.
The late Phil Karlton famously said: "There are only two hard problems in computer science. Cache Invalidation and Naming Things." So, the fact that you are having trouble coming up with good names is entirely normal and even expected.
OTOH, having trouble naming things can also be a sign of bad design. (And yes, I am perfectly aware, that #1 and #2 contradict each other. Or maybe one should think of it more like balancing each other.) E.g., if a thing has too many responsibilities, it is pretty much impossible to come up with a good name. (Witness all the "Service", "Util", "Model" and "Manager" classes in bad OO designs. Here's an example Google Code Search for "ManagerFactoryFactory".)
Also, your names should map to the domain jargon used by subject matter experts. If you can't find a subject matter expert, that's a sign that you are currently worrying about code that you're not supposed to worry about. (Basically, code that implements your core business domain should be implemented and designed well, code in ancillary domains should be implemented and designed so-so, and all other code should not be implemented or designed at all, but bought from a vendor, where what you are buying is their core business domain. [Please interpret "buy" and "vendor" liberally. Community-developed Free Software is just fine.])
Regarding #3 above, you mentioned in another comment that you are currently working on implementing a tree data structure. Unless your company is in the business of selling tree data structures, that is not a part of your core domain. And the reason that you have trouble finding good names could be that you are working outside your core domain. Now, "selling tree data structures" may sound stupid, but there are actually companies that do that. For example, the BCL team inside Microsoft's developer division: they actually sell (well, for certain definitions of "sell", anyway) the .NET framework's Base Class Libraries, which include, among others, tree data structures. But note that for example Microsoft's C++ compiler team actually (literally) buys their STL from a third-party vendor – they figure that their core domain is writing compilers, and they leave the writing of libraries to a company who considers writing STLs their core domain. (And indeed, AFAIK, that company does nothing but write and sell STL implementations. That's their sole product.)
If, however, selling tree data structures is your core domain, then the names you listed are just fine. They are the names that subject matter experts (programmers, in this case) use when talking about the domain of tree data structures.
Using 'metaphors' is a common theme in agile (and pattern) literature.
'Children' (in your question) is an example of a metaphor that is extensively used and for good reasons.
So, I'd encourage the use of metaphors, provided they are applicable and not a stretch of the imagination.
Metaphors are everywhere in computing. From files to bugs to pointers to streams... you can't avoid them.
I believe that for the purpose of standardization and communication, it's good to use a common vocab, like in the same case for design patterns. I have a problem with a programmer who keeps 'inventing' his own terms and I have trouble understanding him. (He kept using the term 'events orchestrating' instead of 'scripting' or 'FCFS process'. Kudos for creativity though!)
Those common vocab describe stuff we are used to. A node is a point, somewhere in a graph, in a tree, or what-not. One way is to be specific to the domain. If we are doing a mapping problem, instead of 'node', we can use 'location'. That helps in a sense, at least for me. So I find there is a need to balance being able to communicate with other programmers, and at the same time keeping the descriptor specific enough to help me remember what it does.
I think node, children, and property are great names. I can already guess the following about your classes, just by their "bland" names:
Node - this class is part of a graph of objects
children - this variable holds a list of nodes belonging to the containing node.
I don't think "node" is either vague or common, and if you're coding a generic data structure, it's probably ok to have generic names! (With that being said, if you are coding up a tree, you could use something like TreeNode to emphasize that the node is part of a tree.) One way you can make the life of developers who will use your API easier is to follow the naming conventions of your platform's built in libraries. If everyone calls a node a node, and an iterator an iterator, it makes life easy.
Names that reflect the purpose of the class, method or property are more memorable than creative ones. Modern IDEs make it easier to use longer names so feel fee to be descriptive. Getting creative won't help as much as getting accurate.
I recommend to pick nouns from a specific application domain. E.g. if you are putting cars in a tree, call the node class Car - the fact that it is also a node should be apparent from the API. Also, don't try to be too generic in your implementation - don't put all attributes of the car into a hashtable named properties, but create separate attributes for make, color, etc.
A lot of languages and coding styles like to use all sorts of descriptive prefixes. In PHP there are no clear types, so this may help greatly. Instead of doing
$isAvailable = true;
try
$bool_isAvailable = true;
It is admittedly a pain, but usually well worth the time.
I also like to use long names to describe things. It may seem strange, but is usually easier to remember, especially when I go back to refactor my code
$leftNode->properties < $leftTreeNode->arrayOfNodeProperties;
And if all else fails. Why not fall back on a solid star wars themed program.
$luke->lightsaber($darth[$ewoks]);
And lastly, in college I named my classes after my professor, and then my class methods all the things I wanted to do to that jerk.
$Kube->canEat($myShorts, $withKetchup);

How often do you need to create a real class hierarchy in your day to day programming?

I create business applications with heavy database use. Most of the programming work is just to connect components to the database and modifying components to adapt to general interface behaviour. I mostly use Delphi with its rich VCL library, and generally buy components needed. I keep most of the business logic in the database. I rarely get the chance to build a nice class hierarchy from the bottom up as there really is no need. Anyone else have this experience?
For me, occasionally a problem is clearer or easier with subclassing, but not often.
This also changes quite a bit in a given design as it's refactored.
My biggest problem is that programming courses and texts give so much weight to inheritance, hierarchies, and polymorphism through base classes (vs. interfaces or dynamic typing). This helps create legions of programmers that subclass everything and their mother.
The answer to this question is not totally language-agnostic;
Some languages like Java have a fairly limited set of language features available, meaning that subclassing is fairly often used because it's a convenient method for re-use, technical inheritance.
Closures and lambdas of C# make inheritance for technical reasons much less relevant. So normally inheritance is used for semantic reasons (like cat extends animal).
The last C# project I worked on, we more or less made all of the class hierarchies within a few weeks. After that it was more or less over.
On my current java project we create new class hierarchies all of the time.
Other languages will have other features that similarly affect this composition (mixins come to mind)
I put on my architecting/class design hat probably once or twice a month. It's probably the best hat I have and is the most fun to wear.
Depends what stage of the lifecycle your project is in though.
When your tackling problem domains you are well familiar with and already have a common code base to work from, you often have no need to create a new class hierarchy. It's when you stumble upon problems you have no ready solutions for, that you start building your own.
It's also very dependant on the type of applications you develop. If your domain already has well accepted conventions and libraries to work from, there probably isn't any need to reinvent the wheel (other than personal / academic interests). Some areas have inherently less available resources to work with, and in those you'll find yourself building everything from scratch most of the time.
A majority of applications, especially business applications, contains at least some kind of business logic in it. I would contend that business should not be in the database, but should rather be in the application. You can put referential integrity in the database as I think this is a good choice, but business logic should be only in the application.
By class hierarchy, I suppose you mean do you always have to end up with some inheritance in your object model, then the answer is no. But chances are you can often find some common code, factor it out and create a base class to contain the common code.
If you agree with me on the point that business logic should not be in the database, but should be in the application, then I recommend you look into the MVC Design Pattern to guide your design. You will find your design contain classes or objects. Your VCLs will represent your View, and you can have your Model classes map directly to the database table, i.e. each member in the class in the model corresponds to a field in a database table (again, this is the norm but there will be exception, where this simplicity fails to apply). Then you'll need a layer to handle the CRUD (Create, Read, Update, Delete) of the Model classes to the database tables. You will end up with an "layered" application that is easier to maintain and enhance.
It depends on what you mean by hierarchy - inheritance or layering?
When object oriented languages first came out, inheritance was overused. Complicated hierarchies were common. Now, interfaces (as in Java and C#) provide a simpler way to get the benefit of polymorphism without the complications of inheritance. I rarely use inheritance anymore.
Layering, however, is vital when creating a large application. Layering prevents general low-level classes (like lists) from directly referencing specific high-level classes (like web browser windows). As far as I know, there isn't a formal way to describe layering, but there are general guidelines (model-view-controller (MVC), separate GUI logic from business logic, separate data from presentation, etc.).
It really depends on the types/phases of the projects you're working on. I happen to do that everyday because I'm working on database internals for a new database, creating related libraries/frameworks. I'd imagine doing that a lot less if I'm working within a mature framework using other people's libraries.
I'm doing Infrastructure for our companys' product, so I'm writing a lot of code that will be used later by guys in other teams. So I end up writing lots of abstract classes, interfaces, hierarchies and so on. Mostly it's just a pattern of "default behaviour in an abstract/virtual class, which other programmers may override".
Very challenging, I must say.
The time that I find class hierarchies most beneficial is when the relationship between objects actually does match a true "is-a" relationship in the domain.
However if I can avoid large hierarchies I will due to the fact that they are often a little more tricky to map to relational databases and can really complicate your database designs. Since you say most of your applications make heavy use of databases this would be something to take into consideration.

Object Normalization

In the same line as Database Normalization - is there an approach to object normalization, not design pattern, but the same mathematical like approach to normalizing object creation. For example: first normal form: no repeating fields....
here's some links to DB Normalization:
http://en.wikipedia.org/wiki/Database_normalization
http://databases.about.com/od/specificproducts/a/normalization.htm
Would this make object creation and self-documentation better?
Here's a link to a book about class normalization (guess we're really talking about classes)
http://www.agiledata.org/essays/classNormalization.html
Normalization has a mathematical foundation in predicate logic, and a clear and specific goal that the same piece of information never be represented twice in a single model; the purpose of this goal is to eliminate the possibility of inconsistent information in a data model. It can be shown via mathematical proof that if a data model has certain specific properties (that it passes tests for 1st Normal Form (1NF), 2NF, 3NF, etc.) that it is free from redundant data representation, i.e. it is Normalized.
Object orientation has no such underlying mathematical basis, and indeed, no clear and specific goal. It is simply a design idea for introducing more abstraction. The DRY principle, Command-Query Separation, Liskov Substitution Principle, Open-Closed Principle, Tell-Don't-Ask, Dependency Inversion Principle, and other heuristics for improving quality of code (many of which apply to code in general, not just object oriented programs) are not absolute in nature; they are guidelines that programmers have found useful in improving understandability, maintainability, and testability of their code.
With a relational data model, you can say with absolute certainty whether it is "normalized" or not, because it must pass ALL the tests for normal form, and they are quite specific. With an object model, on the other hand, because the goal of "understandable, maintainable, testable, etc" is rather vague, you cannot say with any certainty whether you have met that goal. With many of the design heuristics, you cannot even say for sure whether you have followed them. Have you followed the DRY principle if you're applying patterns to your design? Surely repeated use of a pattern isn't DRY? Furthermore, some of these heuristics or principles aren't always even necessarily good advice all the time. I do try to follow Command-Query Separation, but such useful things as a Stack or a Queue violate that concept in order to give us a rather elegant and useful result.
I guess the Single Responsible Principle is at least related to this. Or at least, violation of the SRP is similar to a lack of normalization in some ways.
(It's possible I'm talking rubbish. I'm pretty tired.)
Interesting.
You may also be interested in looking at the Law of Demeter.
Another thing you may be interested in is c2's FearOfAddingClasses, as, arguably, the same reasoning that lead programmers to denormalise databases also leads to god classes and other code smells. For both OO and DB normalisation, we want to decompose everything. For databases this means more tables, for OO, more classes.
Now, it is worth bearing in mind the object relational impedance mismatch, that is, probably not everything will translate cleanly.
Object relational models or 'persistence layers', usually have 1-to-1 mappings between object attributes and database fields. So, can we normalise? Say we have department object with employee1, employee2 ... etc. attributes. Obviously that should be replaced with a list of employees. So we can say 1NF works.
With that in mind, let's go straight for the kill and look at 6NF database design, a good example is Anchor Modeling, (ignore the naming convention). Anchor Modeling/6NF provides highly decomposed and flexible database schemas; how does this translate to OO 'normalisation'?
Anchor Modeling has these kinds of relationships:
Anchors - unique object IDs.
Attributes, which translate to object attributes: (Anchor, value, metadata).
Ties - relationships between two or more objects (themselves anchors): (Anchor, Anchor... , metadata)
Knots, attributed Ties.
Attribute metadata can be anything - who changed an attribute, when, why, etc.
The OO translation of this is looks extremely flexible:
Anchors suggest attribute-less placeholders, like a proxy which knows how to deal with the attribute composition.
Attributes suggest classes representing attributes and what they belong to. This suggests applying reuse to how attributes are looked up and dealt with, e.g automatic constraint checking, etc. From this we have a basis to generically implement the GOF-style Structural patterns.
Ties and Knots suggest classes representing relationships between objects. A basis for generic implementation of the Behavioural design patterns?
Interesting and desirable properties of Anchor Modeling that also translate across are:
All this requires replacing inheritance with composition (good) in the exposed objects.
Attribute have owners rather than owners having attributes. Although this make attribute lookup more complex, it neatly solves certain aliasing problems, as there can only ever be one owner.
No need for NULL. This translates to clearer NULL handling. Empty-case attribute classes could provide methods for handling the lack of a particular attribute, instead of performing NULL-checking everywhere.
Attribute metadata. Attribute-level full historisation and blaming: 'play' objects back in time, see what changed, when and why, etc. (if required - metadata is entirely optional)
There would probably be a lot of very simple classes (which is good), and a very declarative programming style (also good).
Thanks for such a thought provoking question, I hope this is useful for you.
Perhaps you're taking this from a relational point-of-view, but I would posit that the principles of interfaces and inheritance correspond to normalization in the world of OOP.
For example, a Person abstract class containing FirstName, LastName, Gender and BirthDate can be used by classes such as Employee, User, Member etc. as a valid base class, without a need to repeat the definitions of those attributes in such subclasses.
The principle of DRY, (a core principle of Andy Hunt and Dave Thomas's book The Pragmatic Programmer), and the constant emphasis of object-oriented programming on re-use, also correspond to the efficiencies offered by Normalization in relational databases.
At first glance, I'd say that the objectives of Code Refactoring are similar in an abstract way to the objectives of normalization. But that's pretty abstract.
Update: I almost wrote earlier that "we need to get Jon Skeet in on this one." I posted my answer and who beat me? You guessed it...
Object Role Modeling (not to be confused with Object Relational Mapping) is the closest thing I know of to normalization for objects. It doesn't have as mathematical a foundation as normalization, but it's a start.
In a fairly ad-hoc and untutored fashion, that will probably cause purists to scoff, and perhaps rightly, I think of a database table as being a set of objects of a particular type, and vice versa. Then I take my thoughts from there. Viewed this way, it doesn't seem to me like there's anything particularly special you have to do to use normal form in your everyday programming. Each object's identity will do for starters as its primary key, and references (pointers, etc.) will do by way of foreign keys. Then just follow the same rules.
(My objects usually end up in 3NF, or some approximation thereof. I treat this all more as guidelines, and, like I said, "untutored".)
If the rules are followed properly, each bit of information then ends up in one place, the interrelationships are clear, and everything is structured such that introducing inconsistencies takes some work. One could say that this approach produces good results on this basis, and I would agree.
One downside is that the end result can feel a bit like a tangle of spaghetti, particularly after some time away, and it's hard to shake the constant lingering sensation, even though it's usually false, that surely a few of all these links could be removed...
Object oriented design is rational but it does not have the same mathematically well-defined basis as the Relational Model. There is nothing exactly equivalent to the well-defined normal forms of database design.
Whether this is a strength or a weakness of Object oriented design is a matter of interpretation.
I second the SRP. The Open Closed Principle applies as well to "normalization" although I might stretch the meaning of the word, in that it should be possible to extend the system by adding new implementations, without modifying the existing code. objectmentor about OCP
good question, sorry i can't answer in depth
I've been working on object normalization off and on for over 20 years. It's deep and complicated and beautiful, and is the subject of my second planned book, Object Mechanics II. ONF = Object Normal Form, you heard it here first! ;-)
since potentially patentable technology lurks within, I am not at liberty to say more, except that normalizing the data is the really easy part ;-)
ADDENDUM: change of plans - see https://softwareengineering.stackexchange.com/questions/84598/object-oriented-normalization