Metrics for measuring successful refactoring [closed] - language-agnostic

Are there objective metrics for measuring code refactoring?
Would running findbugs, CRAP or checkstyle before and after a refactoring be a useful way of checking whether the code was actually improved rather than just changed?
I'm looking for metrics that can be determined and tested to help improve the code review process.

Number of failed unit tests must be less than or equal to zero :)

Depending on your specific goals, metrics like cyclomatic complexity can provide an indicator of success. In the end, every metric can be subverted, since metrics cannot capture intelligence or common sense.
A healthy code review process might do wonders though.
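To make the cyclomatic complexity point concrete, here is a small, hypothetical sketch (the function names and rates are invented for illustration, and exact complexity counts vary slightly by tool). The first version has a cyclomatic complexity of 4 because of its nested conditionals; after extracting a helper and using an early return, no single function is above 3, and each piece is easier to test in isolation.

/* Before: nested conditionals, cyclomatic complexity 4 */
int shipping_cost_before(int weight_kg, int is_express, int is_international)
{
    int cost = 0;
    if (weight_kg > 0) {
        if (is_international) {
            cost = weight_kg * 10;
        } else {
            cost = weight_kg * 4;
        }
        if (is_express) {
            cost += 15;
        }
    }
    return cost;
}

/* After: early return plus a small helper; no function above complexity 3 */
static int base_rate(int is_international)
{
    return is_international ? 10 : 4;
}

int shipping_cost_after(int weight_kg, int is_express, int is_international)
{
    if (weight_kg <= 0)
        return 0;
    return weight_kg * base_rate(is_international) + (is_express ? 15 : 0);
}

The behaviour is unchanged; the metric simply gives you a before/after number to check that the refactoring moved in the intended direction.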

Would running findbugs, CRAP or checkstyle before and after a refactoring be a useful way of checking if the code was actually improved rather than just changed?
Actually, as I have detailed in the question "What is the fascination with code metrics?", the trend of any metric (findbugs, CRAP, whatever) is the true added value of metrics.
The evolution of metrics allows you to prioritize the main fixing actions you really need to make to your code (as opposed to blindly trying to satisfy every metric out there).
A tool like Sonar can be very useful in this domain (monitoring of metrics).
Sal adds in the comments:
The real issue is checking which code changes add value, rather than just add change
For that, test coverage is very important, because only tests (unit tests, but also larger "functional tests") will give you a valid answer.
But refactoring should not be done without a clear objective anyway. To do it only because the result would be "more elegant" or even "easier to maintain" may not in itself be a good enough reason to change the code.
There should be other measures, like bugs that will be fixed in the process, or new functions that will be implemented much faster as a result of the "refactored" code.
In short, the added value of a refactoring is not solely measured with metrics, but should also be evaluated against objectives and/or milestones.

Code size. Anything that reduces it without breaking functionality is an improvement in my book (removing comments and shortening identifiers would not count, of course)

No matter what you do just make sure this metric thing is not used for evaluating programmer performance, deciding promotion or anything like that.

I would stay away from metrics for measuring refactoring success (aside from #unit test failures == 0). Instead, I'd go with code reviews.
It doesn't take much work to find obvious targets for refactoring: "Haven't I seen that exact same code before?" For the rest, you should create certain guidelines around what not to do, and make sure your developers know about them. Then they'll be able to find places where the other developer didn't follow the standards.
For higher-level refactorings, the more senior developers and architects will need to look at code in terms of where they see the code base moving. For instance, it may be perfectly reasonable for the code to have a static structure today; but if they know or suspect that a more dynamic structure will be required, they may suggest using a factory method instead of using new, or extracting an interface from a class because they know there will be another implementation in the next release.
None of these things would benefit from metrics.

Yes, several measures of code quality can tell you if a refactoring improves the quality of your code.
Duplication. In general, less duplication is better. However, duplication finders that I've used sometimes identify duplicated blocks that are merely structurally similar but have nothing to do with one another semantically and so should not be deduplicated. Be prepared to suppress or ignore those false positives.
Code coverage. This is by far my favorite metric in general, but it's only indirectly related to refactoring. You can and should raise low coverage by writing more tests, but that's not refactoring. However, you should monitor code coverage while refactoring (as with any other change to the code) to be sure it doesn't go down. Refactoring can improve code coverage by removing untested copies of duplicated code.
Size metrics such as lines of code, total and per class, method, function, etc. A Jeff Atwood post lists a few more. If a refactoring reduces lines of code while maintaining clarity, quality has increased. Unusually long classes, methods, etc. are likely to be good targets for refactoring. Be prepared to use judgement in deciding when a class, method, etc. really does need to be longer than usual to get its job done.
Complexity metrics such as cyclomatic complexity. Refactoring should try to decrease complexity and not increase it without a well thought out reason. Methods/functions with high complexity are good refactoring targets.
Robert C. Martin's package-design metrics: Abstractness, Instability and Distance from the abstractness-instability main sequence. He described them in his article on Stability in C++ Report and his book Agile Software Development, Principles, Patterns, and Practices. JDepend is one tool that measures them. Refactoring that improves package design should minimize D.
I have used and continue to use all of these to monitor the quality of my software projects.
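As a concrete illustration of the last of those metrics (not code from JDepend or any particular tool): Martin defines abstractness A as the ratio of abstract types to total types in a package, instability I as Ce / (Ca + Ce) (efferent couplings over total couplings), and the normalized distance from the main sequence as D = |A + I - 1|. A value near 0 is what a package-level refactoring should aim for. The numbers below are hypothetical.

#include <math.h>
#include <stdio.h>

/* Normalized distance from the main sequence, D = |A + I - 1|.
   A = abstract_types / total_types, I = Ce / (Ca + Ce).        */
double distance_from_main_sequence(int abstract_types, int total_types,
                                   int ca /* afferent couplings */,
                                   int ce /* efferent couplings */)
{
    double a = total_types ? (double)abstract_types / total_types : 0.0;
    double i = (ca + ce) ? (double)ce / (ca + ce) : 0.0;
    return fabs(a + i - 1.0);
}

int main(void)
{
    /* Hypothetical package: 2 abstract types out of 10, Ca = 3, Ce = 7 -> D = 0.10 */
    printf("D = %.2f\n", distance_from_main_sequence(2, 10, 3, 7));
    return 0;
}

Tracking D before and after a restructuring gives you a simple, objective check that the package moved toward the main sequence rather than away from it.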

There are two outcomes you want from refactoring. You want the team to maintain a sustainable pace, and you want zero defects in production.
Refactoring takes place on the code and the unit test build during Test Driven Development (TDD). Refactoring can be small and completed on a piece of code necessary to finish a story card. Or refactoring can be large and require a technical story card to address technical debt. That story card can be placed on the product backlog and prioritized with the business partner.
Furthermore, as you write unit tests while doing TDD, you will continue to refactor the tests as the code is developed.
Remember, in agile, the management practices as defined in Scrum will provide you with collaboration and ensure that you understand the needs of the business partner and that the code you have developed meets the business need. However, without proper engineering practices (as defined by Extreme Programming), your project will lose its sustainable pace. Many agile projects that did not employ engineering practices ended up in need of rescue. On the other hand, teams that were disciplined and employed both the management and the engineering agile practices were able to sustain delivery indefinitely.
So, if your code is released with many defects or your team loses velocity, then refactoring and the other engineering practices (TDD, pairing, automated testing, simple evolutionary design, etc.) are not being properly employed.

I see the question from the smell point of view. Smells can be treated as indicators of quality problems, and hence the volume of identified smell instances can indicate the quality of the code.
Smells can be classified based on their granularity and their potential impact. For instance, there could be implementation smells, design smells, and architectural smells. You need to identify smells at all granularity levels before and after to show the gain from a refactoring exercise. In fact, refactoring could be guided by identified smells.
Examples:
Implementation smells: Long method, Complex conditional, Missing default case, Complex method, Long statement, and Magic numbers.
Design smells: Multifaceted abstraction, Missing abstraction, Deficient encapsulation, Unexploited encapsulation, Hub-like modularization, Cyclically-dependent modularization, Wide hierarchy, and Broken hierarchy. More information about design smells can be found in this book.
Architecture smells: Missing layer, Cyclical dependency in packages, Violated layer, Ambiguous Interfaces, and Scattered Parasitic Functionality. Find more information about architecture smells here.
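As a small, made-up illustration of two of the implementation smells above (Magic numbers and Complex conditional) and the kind of refactoring they typically guide, consider this sketch (the rules and constants are invented for the example):

/* Smelly: magic numbers and a complex conditional */
int can_rent_car_smelly(int age, int points)
{
    if ((age >= 21 && age <= 75 && points < 6) || (age > 75 && points == 0))
        return 1;
    return 0;
}

/* After refactoring: named constants and intention-revealing helpers */
enum { MIN_AGE = 21, SENIOR_AGE = 75, MAX_POINTS = 6 };

static int is_standard_driver(int age, int points)
{
    return age >= MIN_AGE && age <= SENIOR_AGE && points < MAX_POINTS;
}

static int is_clean_senior(int age, int points)
{
    return age > SENIOR_AGE && points == 0;
}

int can_rent_car(int age, int points)
{
    return is_standard_driver(age, points) || is_clean_senior(age, points);
}

Counting such smell instances before and after the change gives a rough but objective signal that the refactoring addressed a real quality problem rather than just shuffling code around.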

Related

The Implications of Modern Day Software Development Abstractions [closed]

I am currently doing a dissertation about the implications or dangers that today's software development practices or teachings may have for the long-term future of programming.
Just to make it clear: I am not attacking the use of abstractions in programming. Every programmer knows that abstractions are the basis for modularity.
What I want to investigate with this dissertation are the positive and negative effects abstractions can have in software development. As regards the positive, I am sure that I can find many sources to confirm this. But what about the negative effects of abstractions? Do you have any stories to share about when certain abstractions failed on you?
The main concern is that many programmers today are programming against abstractions without having the faintest idea of what the abstraction is doing under the covers. This may very well lead to bugs and bad design. So, in your opinion, how important is it that programmers actually know what is going on below the abstractions?
Taking a simple example from Joel's Back to Basics, C's strcat:
void strcat( char* dest, char* src )
{
    while (*dest) dest++;         /* walk to the end of dest          */
    while (*dest++ = *src++);     /* copy src, including the '\0'     */
}
The above function has the problem that, on every concatenation, it starts from the beginning of the dest pointer to find the null terminator. If you instead write the function as follows, it returns a pointer to the end of the concatenated string, which in turn lets you pass that new pointer back in as the *dest parameter of the next concatenation:
char* mystrcat( char* dest, char* src )
{
    while (*dest) dest++;         /* walk to the end of dest          */
    while (*dest++ = *src++);     /* copy src, including the '\0'     */
    return --dest;                /* point at the new null terminator */
}
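A short usage sketch (assuming mystrcat as defined above; the buffer size and strings are purely illustrative) shows why the returned pointer matters: each call continues from where the previous one left off instead of rescanning the whole destination string.

#include <stdio.h>

/* mystrcat as defined above */
char* mystrcat( char* dest, char* src );

int main(void)
{
    char buf[64] = "";
    char *p = buf;
    /* Each call resumes at the terminator returned by the previous call,
       so the destination is never rescanned from the start. */
    p = mystrcat( p, "Hello" );
    p = mystrcat( p, ", " );
    p = mystrcat( p, "world" );
    printf( "%s\n", buf );   /* prints: Hello, world */
    return 0;
}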
Now this is obviously a very simple example as regards abstractions, but it is the same concept I shall be investigating.
Finally, what do you think about the issue that schools prefer to teach Java instead of C and Lisp?
Can you please give your opinions and your thoughts on this subject?
Thank you for your time and I appreciate every comment.
First of all, abstractions are inevitable because they help us to deal with the mind-blowing complexity of things.
Abstractions are also inevitable because it is more and more required of an individual to undertake more tasks or even complete projects. To address the problem, one uses libraries which wrap lower-level concepts and expose more complex behavior.
Naturally, a developer has less and less time to know the intrinsics of things. The latest concern I heard about on SO pages is people starting to learn JavaScript with the jQuery library, ignoring raw JavaScript altogether.
The issue is about the balance between:
Knowing the tiniest details of some technology and being a master of it, but at the same time being unable to work with anything else.
Superficial knowledge of a wide variety of technologies and tools, which nevertheless proves sufficient for common everyday tasks and allows an individual to perform in multiple areas, possibly covering all sides of some (moderately big) project.
Take your pick.
Some work requires the one, another position requires the other.
So, in your opinion, how important is it that programmers actually know what is going on below the abstractions?
It would be nice if people knew what is happening behind the scenes. This knowledge comes with time and practice, up to a certain degree. Depends on what kind of tasks you have. You certainly shouldn't blame people for not knowing everything. If you wish a person to be able to perform in a variety of fields, it is inevitable he won't have time to cover each up to the last bit.
What is essential, is the knowledge of the basic building blocks. Data structures, algorithms, complexity. That should provide a basis for everything else.
Knowing the tiniest details of some particular technology is good, but not essential. Anyway, you can't learn them all. There are too many, and they keep coming.
Finally, what do you think about the issue that schools prefer to teach Java instead of C and Lisp?
Schools shouldn't be teaching programming languages at all. Their job is to teach the basics of theoretical and practical CS, social skills, communication, and team work, and to cover a vast variety of topics and problems so as to give their graduates a wide-angle view. That will help them find their way. Whatever they need to know in detail, they'll learn on their own.
An example where abstraction has failed:
In this case, a piece of software was needed to communicate to many different third party data processors. The communication was done through various messaging protocols; the transport method/protocol is not important in this case. Just assume everyone communicated through messaging.
The idea was to abstract the features of each of these third parties into a single, unified message format. It seemed relatively straightforward because each of the third parties performed a similar service. The problem was that some third parties used different terms to explain similar features. It was also found that some third parties had additional features that other third parties did not have.
The designers of the abstraction did not see past the differences in third-party terminology, nor did they think it was reasonable to limit the scope of the unified features to only the features the third parties had in common. Instead, a single, monolithic message schema was developed to support any and all features of the third parties considered at the time. In what was probably considered a future-proofing move, they added a means of also passing an arbitrary number of name/value pairs along with the monolithic message, in case there were future data elements that the monolithic message could not handle.
Early on, it became clear that changing the monolithic message was going to be difficult due to so many people using it in mission critical systems. The use of the name/value pairs increased. Each name that could be used was documented inside a large spreadsheet, and developers were required to consult the spreadsheet to avoid duplication of name value function. The list got so large, however, that it was found that there were frequently collisions in purposes of name values.
The majority of the monolithic message's fields now have no purpose and are kept mainly for backwards compatibility. There are name values that can be used to replace fields in the monolithic message. The majority of the interfacing is now done through the name/value pairs. In cases where the client is intending to communicate with more than one third party, each client needs to reconcile the name values available for each third party. It would be almost simpler to interface directly to the third party themselves.
I believe this illustrates that, from the perspective of a consumer of the monolithic message, it is important that developers of the consuming code not need to know what is happening under the covers. If the designers had considered that the consumers of the monolithic message should not have to understand the abstraction in great detail, the monolithic message and its associated name/value pairs might never have happened. Documenting the abstraction with assertions regarding input and expected output would make life so much simpler.
As for colleges not teaching C and Lisp... they are cheating the students. You get a better understanding of what is going on with the machine and the OS with C. You get a somewhat different perspective on processing data and approaching problems with Lisp. I have used some of the ideas I learned using Lisp in programs written in C, C++, .NET, and Java. Learning Java after knowing even just C is not very difficult. The OO part is really not language specific, so perhaps using Java for that is acceptable.
An understanding of fundamentals of algorithms (e.g. time complexity) and some knowledge about the metal is essential to designing/writing smells-good code.
I would suggest, though, that just as important is education in modern abstractions and profiling. I feel that modern abstractions make me so much more productive than I would be without them that they are at least as important as good fundamentals, if not more so.
An important element that lacked in my education was the use of profilers. When used routinely and correctly, profilers can help mitigate problems with poor fundamentals.
Since you quote Joel Spolsky, I take it you're aware of his "Law of Leaky Abstractions"? I'll mention it for future readers. http://www.joelonsoftware.com/articles/LeakyAbstractions.html
Green & Blackwell's Ironies of Abstractions talks a bit about the effort of learning the abstraction. http://homepage.ntlworld.com/greenery/workStuff/Papers/index.html
The term "astronaut architecture" is a reaction to over-abstraction.
I know I certainly curse abstraction when I haven't touched Java or C# in a while and I want to write to a file, but have to instantiate a Stream...Writer...Adaptor....Handler....
Also, Patterns, as in Gang Of Four. Seemed great when I first read about them in the mid-90's, but can never remember factory, facade, interface, helper, worker, flyweight....

What are some advanced software development topics every developer should know? [closed]

Let's say your company has given you the time & money to acquire training on as many advanced programming topics as you can eat in a year, carte blanche. What would those topics be and how would you prefer to acquire them?
Assumptions:
You still have deliverables to bring into existence, but you're allowed one week per month for the year for this training.
The training can come from anywhere. IE: Classroom, on-site instructor, books, subscriptions, podcasts, etc.
Subject matter can cover any platform, technology, language, DBMS, toolset, etc.
Concurrent/parallel programming and multi-threading, especially with respect to memory models and memory coherency. I think every programmer should be aware of the considerations in this arena as we move into a world of multi-core/multi-CPU hardware.
For this I would probably use Internet research most heavily, but an on-campus primer at a good university could be a good way to start off.
Security!
Far too many programmers just build something and think they can add security as an afterthought after finishing the "main" part of the program. You could always benefit from knowing more about how to secure your app, how to design software to be secure from the get-go, how to do intrusion detection, etc.
Advanced Database Development
Things like data warehousing (MDX, OLAP queries, star schemas, fact tables, etc), advanced performance tuning, advanced schema and query patterns, and the like are always useful.
Here are the three that I'm always finding myself explaining to junior developers who didn't get enough CS training. All that other stuff is generally more hype than substance, or can be fairly easily picked up. But if you don't know these three, you can do a great deal of damage:
Algorithm analysis, including Big O notation.
The various levels of cohesion and coupling.
Amdahl's Law, and how it pertains to optimizations (a quick worked example follows this list).
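To give one quick, hedged illustration of the Amdahl's Law point (the numbers are made up): if 90% of a task can be parallelized and that part is sped up 8x, the whole task only gets about 4.7x faster, because the serial 10% dominates.

#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction p of the work
   is accelerated by a factor s and the rest stays serial.      */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void)
{
    /* 90% parallelizable, parallel part sped up 8x -> roughly 4.7x overall */
    printf("speedup = %.2f\n", amdahl_speedup(0.9, 8.0));
    return 0;
}

This is exactly the kind of back-of-the-envelope check that stops a junior developer from spending a week optimizing a part of the program that cannot move the overall number much.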
Internationalization issues, especially since it sounds like it would not be an advanced topic. But it is.
Accessibility
It's ignored by so many organizations but the simple fact of the matter is that there are a huge number of people with low or no vision, color blindness, or other differences that can make navigating the web a very frustrating experience. If everybody had at least a little bit of training in it, we might get some web based UIs that are a little more inclusive.
Object oriented design patterns.
I guess "advanced" is different for everyone, but I'd suggest the following as being things that most decent developers (i.e. ones that don't need to be told about NP-completeness or design patterns) could gain from:
Multithreading techniques that go beyond "lock" and when to apply them.
In-depth training to learn and habitualize themselves with clever features in their toolchain (IDE/text editor, debugger, profiler, shell).
Some cryptography theory and hands-on experience with different common flaws in security schemes that people create.
If they program against a database, learn the internals of their database and advanced query composition and tuning techniques.
Developers should know the basics of SQL development and how their decisions impact database performance. It is one thing to write a query; it is another to write a query, understand the explain plan, and make design decisions based on that output. I think a good course on PL/SQL development and database performance would be very beneficial.
Unfortunately communication skills seem to fall under the "advanced topics" section for most developers (present company excluded, of course).
Best way to acquire this skill: practice.
Take off the headphones, and talk to someone instead of IM'ing or emailing the guy at the next desk.
Pick up the phone and talk to a client instead of lobbing an email over the fence.
Ask questions at a conference instead of sitting behind your laptop screen twittering.
Actively participate in a non-technical meeting at work.
Present something in public.
Most projects do not fail because of technical reasons. They fail because they could not create a team. Communication is vital to team dynamics.
It will not harm your career either.
One of the best courses I took was a technical writing course. It has served me well in my career.
Additionally: it probably does not matter WHAT the topic is - the fact that the organization is interested in it and is paying for it and the developers want to go and do go, is a better indicator of success/improvement than any one particular topic.
I also don't think it matters that much what the topic is. Dev organizations deal with so many things during a project that training and then on the job implementation/trial and error will always get you some better perspective - even if the attempts to try out/use the new stuff fail. That experience will probably help more on the subsequent projects.
I'm a book person, so I wouldn't really bother with instruction.
Not necessarily in this order, and depending on what you know already
OO Programming
Functional Programming
Data structures and algorithms
Parallel processing
Set based logic (essentially the theory behind sql and how to apply it)
Building parsers (I only put this, because it actually came up where I work)
Software development methodologies
NP Completeness. Specifically, how to detect if a problem is NP-Complete, and how to build an approximate solution to the problem.
I see this as important because you don't want a developer to try and solve an NP-complete problem by getting the optimum solution, unless the problem's search space is very small, in which case brute force is acceptable. However, as the search space increases, the time required to solve the problem increases exponentially.
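As a small, hypothetical illustration of the "brute force is acceptable for small search spaces" point: an exact subset-sum check that tries all 2^n subsets is perfectly fine for n around 20, but becomes hopeless long before n reaches 100, which is exactly when you reach for an approximation or a heuristic instead.

#include <stddef.h>

/* Exact subset-sum by brute force: tries all 2^n subsets.
   Fine for small n (say n <= 20); exponential blow-up after that. */
int subset_sum_exists(const int *vals, size_t n, long target)
{
    for (unsigned long mask = 0; mask < (1UL << n); mask++) {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            if (mask & (1UL << i))
                sum += vals[i];
        if (sum == target)
            return 1;
    }
    return 0;
}

The training value is in recognizing when a problem has this shape, so that the exact search is only attempted where it can actually finish.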
I'd cover new technologies and trends. Some of the new technologies I'm researching/enhancing my skills with include:
Microsoft .NET Framework v3.0/v3.5/v4.0
Cloud Computing Frameworks (Amazon EC2, Windows Azure Services, GoGrid, etc.)
Design Patterns
I am from the MS-based developer world, so here is my take on this:
More about new concepts in cloud computing (the various APIs, etc.), as the industry has been betting on it for some time.
More about LINQ for the .NET Framework.
Distributed databases
Refactoring techniques (which implies also learning to write a good set of unit/functional tests).
Knowing how to refactor is the best way to keep code clean -- it is rare when you get it right the first time (especially in new designs).
A number of refactorings, however, require a decent set of tests to check that the refactoring did not add unexpected behavior.
Parallel computing- the easiest and best way to learn it
Debugging
Debugging by David J. Agans is a good book on the topic. Debugging can be very complex when you deal with multi-threaded programs, crashes, algorithms that don't work, etc. Everybody would be better off being good at debugging.
I'd vote for real-world battle stories. Have developers from other organizations present their successes and failures. Don't limit the presentations to technologies you're using. With a significantly complex project, this is bound to cut into 'advanced' topics you haven't even considered. Real-world successes (and failures) have a lot to teach.
Go to the Stack Overflow DevDays
and the ACCU conferences
Read
Agile Software Development, Principles, Patterns, and Practices (Robert C. Martin)
Clean Code (Robert C. Martin)
The Pragmatic Programmer (Andrew Hunt & David Thomas)
Well if you're here I would hope by now you have the basics down:
OOP Best practices
Design patterns
Application Security
Database Security/Queries/Schemas
Most notably developers should strive to learn multiple programming languages and disciplines, in order for their skill set to be expanded in more than one direction. They don't need to become experts in these other skills but at least have a very acute understanding of integration with their central discipline. This will make them much better developers in the long run, and also let them gain the ability to use all tools at their disposal to create applications that can transcend the limitations of a singular language.
Outside of programming specific topics, you should also learn how to work under Agile, XP, or other team based methodologies in order to be more successful while working in a team environment.
I think an advanced programmer should know how to get your employer to give you the time & money to acquire training on as many advanced programming topics that you can eat in a year. I'm not advanced yet. :)
I'd suggest an Artificial Intelligence class at a college/university. Most of the stuff is fun, easy to grasp (the basics at least), and the solutions to problems are usually creative.
Hitchhikers Guide to the Galaxy.
How would I prefer to acquire the training? I'd love to have a substantial amount of company time dedicated to self-training.
I totally agree on accessibility. I was asked to look into it for the website at work and there is a real lack of good knowledge on the subject, not to mention a lack of CSS standards to aid the likes of screen readers.
However, my answer goes to GUI design - it's quite a difficult thing to get right. There are too many awful applications out there that could be prevented just by taking the time to follow HCI (Human Computer Interaction) advice/designs. Take Google/Apple for inspiration when making a GUI - not your typical hundreds-of-buttons/labels combo that too often gets pushed out.
Automated testing: Unit testing, functional integration testing, non-functional testing
Compiler details (more relevant on some platforms than others): How does the compiler implement certain common constructs in language X? On a byte-code interpreted platform, how does JIT compilation work? What can be JIT-compiled (for example, can virtual calls be JIT compiled?)?
Basic web security
Common design idioms from other problem domains than the one you're working in at the moment.
I'd recommend learning about Refactoring, Test Driven Development, and various unit testing frameworks (NUnit, Visual Test, CppUnit, etc.) I'd also learn how to incorporate automated unit testing into your continuous integration builds.
Ultimately if you can prove your code does what it claims it can do, you don't have to be there to answer questions as to why or how. If a maintainer comes along and tries to "fix" your code, they'll know instantly if they broke it. Tests written around the requirements (use cases) explain to the maintainer what your users wanted it to do, and provide a little working example of how to call it. Think of unit tests as functional documentation.
Test Driven Development (TDD) is a more novel design approach that begins with the requirements, where you start by writing a test before you write the code. You then write exactly enough code required to pass the test. You have to stop before you write extra code (that you may never need), because you will refactor it later if you find that you really needed it.
What makes TDD cool is that a bad interface (such as one with lots of dependencies) is also very hard to write tests for. It's so hard that a coder would rather refactor the interface to make it easier to test. And that refactoring simplifies the code, removing inappropriate dependencies, or grouping related tests together to make it easier to test, thus improving cohesion. By making it immediately apparent to the developer when he's writing a badly interfaced module, the developer sticks to the architecture and gravitates to the principles of tight cohesion and loose coupling. Good interfaces are the natural result. And as a bonus, once you pass all your tests, you know you're done.
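A minimal sketch of that test-first rhythm, here in plain C with assert rather than one of the frameworks mentioned above (the function name and the 20% rate are invented for the example): the test is written first, it cannot even compile until the function exists, and then you write just enough code to make the assertions pass.

#include <assert.h>

/* Step 2: just enough production code to pass the test below. */
int add_vat(int net_pence)
{
    return net_pence + net_pence / 5;   /* 20% VAT, in integer pence */
}

/* Step 1: the test, written before add_vat existed. */
void test_add_vat(void)
{
    assert(add_vat(100) == 120);
    assert(add_vat(0) == 0);
}

int main(void)
{
    test_add_vat();
    return 0;
}

The point is not the arithmetic; it is that the test pins down the requirement before the implementation, and stays behind as a safety net for every later refactoring.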
On the surface this seems like an easy question to answer, just enter your favorite pet peeve about what other developers can't do correctly. But when I read through the answers and gave it some thought, I realized that every "advanced topic" brought up was covered in my undergraduate computer science curriculum--20 years ago. And I doubt that OO, security, functional programming, etc. concepts have changed in that time. Sure the tools have, but I argue that tools are different than topics.
So what is an "advanced topic" in computer science? Who is the Turing, Knuth, Yourdon of the 21st century?
I don't have a clear answer to this question, though I'd like to see more work on theories for parallel programming that will enable tools to abstract that messy stuff for developers.
Quite funny that no one has mentioned:
debugging,
the tools & IDE you work with,
and the platform you are developing for.
Everyday development is much more fun if you know your tools really well, and you accomplish more and make your life easier if you know how to debug someone else's code with ease.
Source Control

What is "over-engineering" as applied to software? [closed]

I wonder what would be a good definition of the term "over-engineering" as applied to software development. The expression seems to be used a lot during software design discussions, often in conjunction with "excessive future-proofing", and it would be nice to nail down a more precise definition.
Contrary to most answers, I do not believe that "presently unneeded functionality" is over-engineering; or it is the least problematic form.
Like you said, the worst kind of over-engineering is usually committed in the name of future-proofing and extensibility - and achieves the exact opposite:
Empty layers of abstraction that are at best unnecessary and at worst restrict you to a narrow, inefficient use of the underlying API.
Code littered with designated "extension points" such as protected methods or components acquired via abstract factories - which all turn out to be not quite what you actually need when you do have to extend the functionality.
Making everything configurable to "avoid hard-coding", with the effect that there's more (complex, failure-prone) application logic in configuration files than in source code.
Over-genericizing: instead of implementing the (technically uninteresting) functional spec, the developer builds a (technically interesting) "business rule engine" that "executes" the specs themselves as supplied by business users. The net result is an interpreter for a proprietary (scripting or domain-specific) language that is usually horribly designed, has no tool support and is so hard to use that no business user could ever work with it.
The truth is that the design that is most easily adapted to new and changing requirements (and is thus the most future-proof and extensible) is the design that is as simple as possible.
Contrary to popular belief, over-engineering is really a phenomenon that appears when engineers get "hubris" and think they understand the user.
I made a simple diagram to illustrate this.
In the cases where we've considered things over-engineered, it's always been describing software that has been designed to be so generic that it loses sight of the main task it was initially designed to perform, and has therefore become not only hard to use but fundamentally unintelligent.
To me, over-engineering is including anything that you don't need and that you don't know you're going to need. If you catch yourself saying that a feature might be nice if the requirements change in a certain way, then you might be over-engineering. Basically, over-engineering is violating YAGNI.
The agile answer to this question is: every piece of code that does not contribute to the requested functionality.
There is this discussion at Joel on Software that starts with:
creating extensive class hierarchies for an imagined future problem that does not yet exist is a kind of over-engineering, and is therefore bad.
And it gets into a discussion with examples.
If you spend so much time thinking about the possible ramifications of a problem that you end up interfering with the solving of the problem itself, you may be over-engineering.
There's a fine balance between "best engineering practices" and "real world applicability". At some point you have to decide that even though a particular solution may not be as "pure" from an engineering standpoint as it could be, it will do the job.
For example:
If you are designing a user management system for one-time use at a high school reunion, you probably don't need to add support for incredibly long names, or funky character sets. Setting a reasonable maximum length and doing some basic sanitizing should be sufficient. On the other hand, if you're creating a system that will be deployed for hundreds of similar events, you might want to spend some more time on the problem.
It's all about the appropriate level of effort for the task at hand.
I'm afraid that a precise definition is probably not possible as it's highly dependent on the context. For example, it's much easier to over-engineer a web site that displays glittering ponies than it is a nuclear power plant control system. Redundancies, excessive error checking, highly instrumented logging facilities are all over-engineering for a glittering ponies app, but not for a nuclear power plant control system. I think the best you can do is have a feeling about when you are applying too much overhead to your features for the purpose of the application.
Note that I would distinguish between gold-plating and over-engineering. In my mind, gold-plating is creating features that weren't asked for and will never be used. Over-engineering is more about how much "safety" you build into the application either by coding checks around the code or using excessive design for a simple task.
This relates to my controversial programming opinion of "The simplest approach is always the best approach".
Quoting from here: "...Implement things when you actually need them, never when you just foresee that you need them."
To me it is anything that adds more fat to the code. Meat is any code that does the job according to the spec, and fat is any code that bloats the codebase in a way that just adds more complexity. The programmer might have been expecting a future expansion of the functionality, but it is still fat!
My rough definition would be 'providing functionality that isn't needed to meet the requirements spec'.
I think they are the same as Gold plating and being hit by the Golden hammer :)
Basically, you can sit down and spend too much time trying to make a perfect design, without ever writing some code to check how it works. Any agile method will tell you not to do all your design up front, and to just create chunks of design, implement them, iterate over them, re-design, go again, etc...
Over-engineering means architecting and designing the application with more components than it really should have according to the requirements list.
There is a big difference between over-engineering and creating an extensible application that can be upgraded as requirements change. If I can think of an example I'll edit the post.
Over-engineering is simply creating a product with greater functionality, quality, generality, extensibility, documentation, or any other aspect than is required.
Of course, you may have requirements outside a specific project. For example, if you foresee doing similar applications in the future, then you might have additional requirements for extensibility, dependent on cost, that you add on to the project-specific requirements.
When your design actually makes things more complex instead of simplifying things, you’re overengineering.
More on this at:
http://www.codesimplicity.com/post/what-is-overengineering/
Disclaimer #1: I am a big-picture BA. I know no code. I read this site all the time. This is my first post.
Funny, I was just told by my boss that I over-engineered a new software product we're planning for mentoring (target market: HR people). So I came here to look up the term.
They want to get something in place to sell now, re-purposing existing tools. I can't help but sit back and think: fewer signups and lower retention if it doesn't allow some of the flexibility we talked about. And mainly, it should have a highly visual UI that a monkey could use.
He said we could plan future phases to improve the product, especially the UI. We have current customers waiting on "future improvements" that we still aren't doing. They need it though, truly need it.
I am in the process of resigning so I didn't push back.
But my definition would be: making sure it does only as little as possible, for as cheap as possible, while still being passable as the thing you say it is. Beyond that is over-engineering.
Disclaimer #2: This site helped me land my next job implementing a more configurable software.
I think the best answers to your question can be found in this other question.
The beauty of agile programming is that it's hard to over-engineer if you do it right.

How do you plan an application's architecture before writing any code? [closed]

One thing I struggle with is planning an application's architecture before writing any code.
I don't mean gathering requirements to narrow in on what the application needs to do, but rather effectively thinking about a good way to lay out the overall class, data and flow structures, and iterating on those thoughts so that I have a credible plan of action in mind before even opening the IDE. At the moment it is all too easy to just open the IDE, create a blank project, start writing bits and bobs and let the design 'grow out' from there.
I gather UML is one way to do this but I have no experience with it so it seems kind of nebulous.
How do you plan an application's architecture before writing any code? If UML is the way to go, can you recommend a concise and practical introduction for a developer of smallish applications?
I appreciate your input.
I consider the following:
what the system is supposed to do, that is, what is the problem that the system is trying to solve
who is the customer and what are their wishes
what the system has to integrate with
are there any legacy aspects that need to be considered
what are the user interactions
etc...
Then I start looking at the system as a black box and:
what are the interactions that need to happen with that black box
what are the behaviours that need to happen inside the black box, i.e. what needs to happen to those interactions for the black box to exhibit the desired behaviour at a higher level, e.g. receive and process incoming messages from a reservation system, update a database etc.
Then this will start to give you a view of the system that consists of various internal black boxes, each of which can be broken down further in the same manner.
UML is very good to represent such behaviour. You can describe most systems just using two of the many components of UML, namely:
class diagrams, and
sequence diagrams.
You may need activity diagrams as well if there is any parallelism in the behaviour that needs to be described.
A good resource for learning UML is Martin Fowler's excellent book "UML Distilled" (Amazon link - sanitised for the script kiddie link nazis out there (-: ). This book gives you a quick look at the essential parts of each of the components of UML.
Oh, and what I've described is pretty much Ivar Jacobson's approach. Jacobson is one of the Three Amigos of OO. In fact, UML was initially developed by the other two people who form the Three Amigos, Grady Booch and Jim Rumbaugh.
I really find that a first-off of writing on paper or whiteboard is really crucial. Then move to UML if you want, but nothing beats the flexibility of just drawing it by hand at first.
You should definitely take a look at Steve McConnell's Code Complete, and especially at his giveaway chapter on "Design in Construction".
You can download it from his website:
http://cc2e.com/File.ashx?cid=336
If you're developing for .NET, Microsoft have just published (as a free e-book!) the Application Architecture Guide 2.0b1. It provides loads of really good information about planning your architecture before writing any code.
If you were desperate I expect you could use large chunks of it for non-.NET-based architectures.
I'll preface this by saying that I do mostly web development where much of the architecture is already decided in advance (WebForms, now MVC) and most of my projects are reasonably small, one-person efforts that take less than a year. I also know going in that I'll have an ORM and DAL to handle my business object and data interaction, respectively. Recently, I've switched to using LINQ for this, so much of the "design" becomes database design and mapping via the DBML designer.
Typically, I work in a TDD (test driven development) manner. I don't spend a lot of time up front working on architectural or design details. I do gather the overall interaction of the user with the application via stories. I use the stories to work out the interaction design and discover the major components of the application. I do a lot of whiteboarding during this process with the customer -- sometimes capturing details with a digital camera if they seem important enough to keep in diagram form. Mainly my stories get captured in story form in a wiki. Eventually, the stories get organized into releases and iterations.
By this time I usually have a pretty good idea of the architecture. If it's complicated or there are unusual bits -- things that differ from my normal practices -- or I'm working with someone else (not typical), I'll diagram things (again on a whiteboard). The same is true of complicated interactions -- I may design the page layout and flow on a whiteboard, keeping it (or capturing via camera) until I'm done with that section. Once I have a general idea of where I'm going and what needs to be done first, I'll start writing tests for the first stories. Usually, this goes like: "Okay, to do that I'll need these classes. I'll start with this one and it needs to do this." Then I start merrily TDDing along and the architecture/design grows from the needs of the application.
Periodically, I'll find myself wanting to write some bits of code over again or think "this really smells" and I'll refactor my design to remove duplication or replace the smelly bits with something more elegant. Mostly, I'm concerned with getting the functionality down while following good design principles. I find that using known patterns and paying attention to good principles as you go along works out pretty well.
http://dn.codegear.com/article/31863
I use UML, and find that guide pretty useful and easy to read. Let me know if you need something different.
UML is a notation. It is a way of recording your design, but not (in my opinion) of doing a design. If you need to write things down, I would recommend UML, though, not because it's the "best" but because it is a standard which others probably already know how to read, and it beats inventing your own "standard".
I think the best introduction to UML is still UML Distilled, by Martin Fowler, because it's concise, gives practical guidance on where to use it, and makes it clear you don't have to buy into the whole UML/RUP story for it to be useful.
Doing design is hard. It can't really be captured in one Stack Overflow answer. Unfortunately, my design skills, such as they are, have evolved over the years, and so I don't have one source I can refer you to.
However, one model I have found useful is robustness analysis (google for it, but there's an intro here). If you have use-cases for what the system should do and a domain model of what things are involved, then I've found robustness analysis a useful tool in connecting the two and working out what the key components of the system need to be.
But the best advice is read widely, think hard, and practice. It's not a purely teachable skill, you've got to actually do it.
I'm not smart enough to plan ahead more than a little. When I do plan ahead, my plans always come out wrong, but now I've spent n days on bad plans. My limit seems to be about 15 minutes on the whiteboard.
Basically, I do as little work as I can to find out whether I'm headed in the right direction.
I look at my design for critical questions: when A does B to C, will it be fast enough for D? If not, we need a different design. Each of these questions can be answered with a spike. If the spikes look good, then we have the design and it's time to expand on it.
I code in the direction of getting some real customer value as soon as possible, so a customer can tell me where I should be going.
Because I always get things wrong, I rely on refactoring to help me get them right. Refactoring is risky, so I have to write unit tests as I go. Writing unit tests after the fact is hard because of coupling, so I write my tests first. Staying disciplined about this stuff is hard, and a different brain sees things differently, so I like to have a buddy coding with me. My coding buddy has a nose, so I shower regularly.
Let's call it "Extreme Programming".
"White boards, sketches and Post-it notes are excellent design
tools. Complicated modeling tools have a tendency to be more
distracting than illuminating." From Practices of an Agile Developer
by Venkat Subramaniam and Andy Hunt.
I'm not convinced anything can be planned in advance before implementation. I've got 10 years of experience, but that's only been at 4 companies (including 2 sites at one company that were almost polar opposites), and almost all of my experience has been in terms of watching colossal cluster********s occur. I'm starting to think that stuff like refactoring is really the best way to do things, but at the same time I realize that my experience is limited, and I might just be reacting to what I've seen. What I'd really like to know is how to gain the best experience so I'm able to arrive at proper conclusions, but it seems like there's no shortcut and it just involves a lot of time seeing people doing things wrong :(. I'd really like to have a go at working at a company where people do things right (as evidenced by successful product deployments), to know whether I'm just a contrarian bastard, or if I'm really as smart as I think I am.
I beg to differ: UML can be used for application architecture, but is more often used for technical architecture (frameworks, class or sequence diagrams, ...), because this is where those diagrams can most easily been kept in sync with the development.
Application Architecture occurs when you take some functional specifications (which describe the nature and flows of operations without making any assumptions about a future implementation), and you transform them into technical specifications.
Those specifications represent the applications you need for implementing some business and functional needs.
So if you need to process several large financial portfolios (functional specification), you may determine that you need to divide that large specification into:
a dispatcher to assign those heavy calculations to different servers
a launcher to make sure all calculation servers are up and running before starting to process those portfolios.
a GUI to be able to show what is going on.
a "common" component to develop the specific portfolio algorithms, independently of the rest of the application architecture, in order to facilitate unit testing, but also some functional and regression testing.
So basically, to think about application architecture is to decide what "group of files" you need to develop in a coherent way (you can not develop in the same group of files a launcher, a GUI, a dispatcher, ...: they would not be able to evolve at the same pace)
When an application architecture is well defined, each of its components is usually a good candidate for a configuration component, that is, a group of files which can be versioned as a whole in a VCS (Version Control System), meaning all its files will be labeled together every time you need to record a snapshot of that application (again, it would be hard to label your whole system, since each of its applications cannot be in a stable state at the same time).
I have been doing architecture for a while. I use BPML to first refine the business process and then use UML to capture various details! The third step is generally an ERD! By the time you are done with the BPML and UML, your ERD will be fairly stable! No plan is perfect and no abstraction is going to be 100%. Plan on refactoring; the goal is to minimize refactoring as much as possible!
I try to break my thinking down into two areas: a representation of the things I'm trying to manipulate, and what I intend to do with them.
When I'm trying to model the stuff I'm trying to manipulate, I come up with a series of discrete item definitions: an e-commerce site will have a SKU, a product, a customer, and so forth. I'll also have some non-material things that I'm working with: an order, or a category. Once I have all of the "nouns" in the system, I'll make a domain model that shows how these objects are related to each other: an order has a customer and multiple SKUs, many SKUs are grouped into a product, and so on.
These domain models can be represented as UML domain models, class diagrams, and SQL ERDs.
Once I have the nouns of the system figured out, I move on to the verbs- for instance, the operations that each of these items go through to commit an order. These usually map pretty well to use cases from my functional requirements- the easiest way to express these that I've found is UML sequence, activity, or collaboration diagrams or swimlane diagrams.
It's important to think of this as an iterative process; I'll do a little corner of the domain, and then work on the actions, and then go back. Ideally I'll have time to write code to try stuff out as I'm going along- you never want the design to get too far ahead of the application. This process is usually terrible if you think that you are building the complete and final architecture for everything; really, all you're trying to do is establish the basic foundations that the team will be sharing in common as they move through development. You're mostly creating a shared vocabulary for team members to use as they describe the system, not laying down the law for how it's gotta be done.
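A rough sketch of what the "nouns first" step can boil down to in code (the type and field names here are hypothetical, not a recommendation): once the domain model exists, it maps almost mechanically to type definitions, and the "verbs" from the use cases become the functions that operate on them.

#include <stddef.h>

/* Nouns from the hypothetical e-commerce domain model */
typedef struct { char sku_code[16]; int price_pence; } Sku;
typedef struct { char name[64]; Sku *skus; size_t sku_count; } Product;
typedef struct { int id; char email[128]; } Customer;
typedef struct { Customer *customer; Sku **lines; size_t line_count; } Order;

/* A "verb" from the use cases: total an order */
long order_total(const Order *order)
{
    long total = 0;
    for (size_t i = 0; i < order->line_count; i++)
        total += order->lines[i]->price_pence;
    return total;
}

The sketch stays deliberately thin; the point of the exercise is the shared vocabulary, and the details are expected to change as the iterations described above proceed.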
I find myself having trouble fully thinking a system out before coding it. It's just too easy to only bring a cursory glance to some components which you only later realize are much more complicated than you thought they were.
One solution is to just try really hard. Write UML everywhere. Go through every class. Think how it will interact with your other classes. This is difficult to do.
What I like doing is to make a general overview at first. I don't like UML, but I do like drawing diagrams which get the point across. Then I begin to implement it. Even while I'm just writing out the class structure with empty methods, I often see things that I missed earlier, so then I update my design. As I'm coding, I'll realize I need to do something differently, so I'll update my design. It's an iterative process. The concept of "design everything first, and then implement it all" is known as the waterfall model, and I think others have shown it's a bad way of doing software.
Try Archimate.

Ratio of real code to supporting code

I'm finding that only about 30% of my code actually solves problems; the rest is taken up by logging, tests, parameter checking, exceptions, error handling and so on. Do you find that in your code, and is there an IDE/editor that allows you to hide code that's not interesting?
OTOH, are there languages which make the support code more manageable and smaller in size?
Edit - I think we're all aware of the difference between business logic and other code. I'm not saying that the logging etc. is not important. The thing is, when I'm coding I'm either implementing business logic, or I'm making sure things don't break. For me those are two different ways of thinking; do others develop like that, and is there an IDE that supports that way of developing?
Supporting code is just as important as the "real code". The quality of your product is determined as much by supporting code as anything else.
Consider an automobile. In terms of just getting from point A to point B, that requires nothing more than a go-cart: a frame, a seat, an engine, a few tires. But modern cars have a lot more than just the basics. Highly efficient engines using electronic engine timing. Automatic transmissions. Bucket seats. Heating and A/C. Rack and pinion steering. Power brakes. Anti-lock brakes. Quiet, comfortable cabins protected from the weather. Air bags. Crumple zones and other advanced safety features. Etc. Etc.
Details and execution are important, even in software. If you find that your "supporting code" tends to look more like kludges and hacks, then it's time to rethink your fundamental approach. But ultimately the fit and finish determines quality of the end product as much as anything else.
Edit: The questions you should ask yourself:
Is your "supporting code":
An umbrella duct taped to a pole or a metal and glass cabin frame?
A piece of pipe tied to the front of the car or an energy absorbing bumper integrated into a crumple zone?
A grappling hook on a rope tied to the frame or 4-wheel anti-lock power brakes?
A pair of goggles and a thick coat or a windshield and a heating system?
Answers to these questions will probably affect how much you care about your "supporting code".
Edit: Response to Dave Turvey's comment:
I'd encourage rereading the original question, one of the examples of "support code" listed is "error handling". Consider this for a moment. Imagine it in the context of, say, an automobile, a microwave oven, or even an operating system. Should error handling be relegated to second class citizenship because it serves a "support" function in some abstract sense? In an automobile the safety features are part of the fundamental design of the vehicle and comprise a substantial part of the value of the car. The safety features and "error handling" of a microwave oven (indeed, of the microwave oven's embedded software as well) are an important part of its value as well. A microwave oven that was improperly shielded could cook food just fine, under the right circumstances, but it would pose a hazard to the operator.
The implicit featureset of every tool (software or otherwise) includes this list:
Robustness
Usability
Performance
Everything anyone has ever built or used has had these features. Failure to understand this will translate to failure to execute well on these features which will make for a poor quality product of low value and low commercial interest. There is no such thing as "support code", there is only a misunderstanding of the nature of what it means for a feature to be complete. A "feature" that works in the abstract only under laboratory conditions is an experiment, not a part of a product.
The idea of pure, pristine features floating on a bog of dirty, ugly support code is the wrong image of software development. Instead, think of elegant, superbly-integrated machinery that is well-built, intuitive to use, and powerful.
The supporting code is important, but you don't want to be distracted by it when you're focused on something else. There are two technologies that can help.
A language with first-class functions will help you modularize your code so that logging, timing, and so on can be implemented once and then combined with many other modules. It will also help you write unit tests. Some good ways to learn the techniques are to read the paper Why Functional Programming Matters and to learn about the QuickCheck tool. (No, I am not a shill for John Hughes, but he does do wonderful work.)
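Java's lambdas can express a cut-down version of the idea. In this minimal sketch (the timed helper and the example computation are my own invention, not from the paper), the cross-cutting timing concern is written once and reused at any call site:

    import java.util.function.Supplier;
    import java.util.stream.IntStream;

    public class Instrumented {
        // Higher-order helper: wraps any computation with timing output,
        // so the "support code" is written once instead of at every call site.
        static <T> T timed(String label, Supplier<T> body) {
            long start = System.nanoTime();
            try {
                return body.get();
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println(label + " took " + elapsedMs + " ms");
            }
        }

        public static void main(String[] args) {
            // The call site keeps only the interesting logic.
            int sum = timed("sum 1..1000", () -> IntStream.rangeClosed(1, 1000).sum());
            System.out.println("sum = " + sum);
        }
    }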
If you cannot use a programming language with powerful capabilities for modularization and reuse, or if you don't want to, Don Knuth's Literate Programming technique will help you organize your code so that you can split up parts the way you want and pay attention only to what you want, when you want. The Noweb literate-programming tool supports any language that can be written in ASCII, and also combinations of those languages.
If my IDE could hide "not interesting code" I would definitely turn the feature off. You wouldn't want that happening, I bet :)
There are certainly languages that minimise the amount of supporting code, but I don't think you could switch from Java to, let's say, JavaScript simply because in JavaScript you wouldn't have to declare every exception... I think it's quite necessary to have your supporting code where it is.
Oh, and you could have your program formally specified and mathematically proven, then you wouldn't need to support your code too much ;D
The real code you are referring to is usually called "Business Logic".
In a good unit testing system, your unit tests should be in their own classes (and probably their own assemblies) so that shouldn't be an issue.
The rest is language based for the most part. The more advanced a language, the better its ability to avoid support code to some degree. Also, a well-targeted development system can help you avoid writing a lot of code (Visual Basic's screen builder, Ruby on Rails, ...), but these abstractions can break down and cause you to write just as much code as anything else if you use them to develop targets outside their intended types of apps. (VB & Ruby don't help all that much if you're calculating prime numbers.)
Beyond the language/platform, you have refactoring--the art of eliminating all the supporting code that you can (as well as redundancies in your business logic) to keep your code-base clean and small.
When practicing advanced refactoring, you'll probably end up writing tools for yourself.
Sometimes abstracting data out of your code and into a structured file of some sort can eliminate huge piles of support code and move the rest into "Business logic" because now parsing that data and setting it up is part of the "business" your program does.
This is a good trade-off because this type of business logic tends to be more readable and easier to factor. The other advantage of this kind of abstraction is that all your "Configuration" is now done in data, which tends to make it somebody else's problem.
As an example of this type of tooling: Rails itself! It takes a lot of the boilerplate of web development and factors it out of the code and into libraries driven by data and simplistic code (Ruby blurs the line between code and data--their data files actually loop back to being specified in Ruby code!)
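A cut-down Java version of the same idea (the file name and tier names below are invented for illustration): discount tiers live in a properties file rather than in a chain of if/else statements, so changing a rule is a data edit, not a code change.

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Properties;

    public class PricingRules {
        private final Properties rules = new Properties();

        // pricing.properties might contain:
        //   gold=0.20
        //   silver=0.10
        //   default=0.0
        public PricingRules(String path) throws IOException {
            try (FileReader reader = new FileReader(path)) {
                rules.load(reader);
            }
        }

        public double discountFor(String customerTier) {
            String rate = rules.getProperty(customerTier, rules.getProperty("default", "0"));
            return Double.parseDouble(rate);
        }
    }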
It's like you want to take a trip to the top of Pike's Peak. You can take the Winnebago, you can take your SUV, or a motorcycle, or ride up on your bike.
Some ways are more or less expensive, faster, etc. Sometimes you end up taking along a lot of stuff that isn't there strictly for accomplishing vertical progress; it's nice to have a beer in the cooler. But it pays to remember that you're responsible for everything that goes with you to the top.
Aspect Oriented Programming partly addresses this. It allows you to inject code into existing source or bytecode. This way you can make a task such as logging live in its own module instead of being woven into the business logic.
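For example, with AspectJ's annotation style (the package name and log output here are made up, and a real project would also need AspectJ or Spring AOP weaving configured), an aspect like this keeps logging entirely out of the business classes:

    import org.aspectj.lang.ProceedingJoinPoint;
    import org.aspectj.lang.annotation.Around;
    import org.aspectj.lang.annotation.Aspect;

    @Aspect
    public class LoggingAspect {
        // Logs entry and exit of every public method in the (hypothetical)
        // com.example.service package without touching the business code itself.
        @Around("execution(public * com.example.service..*.*(..))")
        public Object logAround(ProceedingJoinPoint pjp) throws Throwable {
            System.out.println("entering " + pjp.getSignature());
            try {
                return pjp.proceed();
            } finally {
                System.out.println("leaving " + pjp.getSignature());
            }
        }
    }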
Work expands to fill its container. This really sounds like an economics question: optimizing valuable outputs (features for users and features for the developer) against expensive inputs (time spent writing features, time spent writing plumbing code).
You have to include user-visible features or you don't have a viable product or job. Once that is partly done, your remaining budget of time will be split between activities with a visible return on effort and activities with an invisible (but positive!) return on effort, like exception logging, memory management, etc.
Whatever language makes it cheaper to implement features will probably increase your feature-to-plumbing-code ratio. Likewise, whatever language makes it cheaper to implement plumbing code will probably also increase that ratio, because you'll have freed up more time to write features.
Like all optimization problems, you'd have two effects: an increase in the size of the support code (because, say, you're using cheap code generation) and an increase in the size of feature-related code (because you have more time left over to write features), so the final ratio might be hard to predict.
I do not begrudge the 90% of my code that is data access plumbing, because it is all testable, code-generated, and very cheap, compared to the 10% of handwritten, domain-specific code.
I don't try to make all routines foolproof, only those exposed to the outside world.
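A small sketch of that boundary rule (the class and method names are invented): the public method does the defensive checks, while the private helper trusts its caller.

    public class OrderParser {
        // Public entry point, exposed to the outside world: defensive checks live here.
        public int parseQuantity(String text) {
            if (text == null || text.trim().isEmpty()) {
                throw new IllegalArgumentException("quantity must be non-empty");
            }
            return parseTrusted(text.trim());
        }

        // Internal helper: trusts its caller, so no duplicated checking.
        private int parseTrusted(String text) {
            return Integer.parseInt(text);
        }
    }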
http://en.wikipedia.org/wiki/Folding_editor
Higher-level and more dynamic languages are usually less verbose. Dynamic typing also saves a lot of code. Of course there are trade-offs.
I use the #region directive in Visual Studio to collapse blocks of code that are not the primary focus, e.g. unit tests. With log4net, logging statements are only ever one line. I haven't found any approaches to reduce the parameter-checking code, although it sounds like C# 4 has some kind of contract framework that will help there.
I have some coworkers who once, while being chewed out by a client for an overdue and bug-ridden project, bragged to the customer that they had written 5 times as much test code as operational code.
The client was not happy, and by "not happy" I mean their skin turned green, they grew to 5 times their normal size, and their clothes popped off.
You could just make a static class in a utilities assembly that checks your parameters and things. For instance, the Spring Framework (which is where I got the idea) has an Assert class that makes it really fast to make sure that string params aren't empty or that object params aren't null. It cleans up validation code and reduces duplication, which is a win-win.
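For reference, Spring's org.springframework.util.Assert is used roughly like this (the AccountService class below is just a made-up example):

    import org.springframework.util.Assert;

    public class AccountService {
        public void transfer(String fromId, String toId, double amount) {
            // Each check is one readable line instead of an if/throw block.
            Assert.hasText(fromId, "fromId must not be empty");
            Assert.hasText(toId, "toId must not be empty");
            Assert.isTrue(amount > 0, "amount must be positive");
            // ... business logic ...
        }
    }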