Related
I have a big mess of code. Admittedly, I wrote it myself - a year ago. It's not well commented but it's not very complicated either, so I can understand it -- just not well enough to know where to start as far as refactoring it.
I violated every rule that I have read about over the past year. There are classes with multiple responsibilities, there are indirect accesses (I forget the term - something like foo.bar.doSomething()), and like I said it is not well commented. On top of that, it's the beginnings of a game, so the graphics is coupled with the data, or the places where I tried to decouple graphics and data, I made the data public in order for the graphics to be able to access the data it needs...
It's a huge mess! Where do I start? How would you start on something like this?
My current approach is to take variables and switch them to private and then refactor the pieces that break, but that doesn't seem to be enough. Please suggest other strategies for wading through this mess and turning it into something clean so that I can continue where I left off!
Update two days later: I have been drawing out UML-like diagrams of my classes, and catching some of the "Low Hanging Fruit" along the way. I've even found some bits of code that were the beginnings of new features, but as I'm trying to slim everything down, I've been able to delete those bits and make the project feel cleaner. I'm probably going to refactor as much as possible before rigging my test cases (but only the things that are 100% certain not to impact the functionality, of course!), so that I won't have to refactor test cases as I change functionality. (do you think I'm doing it right or would it, in your opinion, be easier for me to suck it up and write the tests first?)
Please vote for the best answer so that I can mark it fairly! Feel free to add your own answer to the bunch as well, there's still room for you! I'll give it another day or so and then probably mark the highest-voted answer as accepted.
Thanks to everyone who has responded so far!
June 25, 2010: I discovered a blog post which directly answers this question from someone who seems to have a pretty good grasp of programming: (or maybe not, if you read his article :) )
To that end, I do four things when I
need to refactor code:
Determine what the purpose of the code was
Draw UML and action diagrams of the classes involved
Shop around for the right design patterns
Determine clearer names for the current classes and methods
Pick yourself up a copy of Martin Fowler's Refactoring. It has some good advice on ways to break down your refactoring problem. About 75% of the book is little cookbook-style refactoring steps you can do. It also advocates automated unit tests that you can run after each step to prove your code still works.
As for a place to start, I would sit down and draw out a high-level architecture of your program. You don't have to get fancy with detailed UML models, but some basic UML is not a bad idea. You need a big picture idea of how the major pieces fit together so you can visually see where your decoupling is going to happen. Just a page or two of some basic block diagrams will help with the overwhelming feeling you have right now.
Without some sort of high level spec or design, you just risk getting lost again and ending up with another unmaintainable mess.
If you need to start from scratch, remember that you never truly start from scratch. You have some code and the knowledge you gained from your first time. But sometimes it does help to start with a blank project and pull things in as you go, rather than put out fires in a messy code base. Just remember not to completely throw out the old, use it for its good parts and pull them in as you go.
What was most important for me on different occasions were unit tests: I took a few days to write tests for the old code and then I was free to refactor with confidence. How exactly is a different question, but having the tests made it possible for me to make real, substantial changes to the code.
I'll second everyone's recommendations for Fowler's Refactoring, but in your specific case you may want to look at Michael Feathers' Working Effectively with Legacy Code, which is really perfect for your situation.
Feathers talks about Characterization Tests, which are unit tests not to assert known behaviour of the system but to explore and define the existing (unclear) behaviour -- in the case where you've written your own legacy code, and fixing it yourself, this may not be so important, but if your design is sloppy then it's quite possible there are parts of the code that work by 'magic' and their behaviour isn't clear, even to you -- in that case, characterization tests will help.
One great part of the book is the discussion about finding (or creating) seams in your codebase -- seams are natural 'fault lines', if you like, where you can break into the existing system to start testing it, and pulling it towards a better design. Hard to explain but well worth a read.
There's a brief paper where Feathers fleshes out some of the concepts from the book, but it really is well worth hunting down the whole thing. It's one of my favourites.
Just an additional refactoring that is more important than you think: Name things correctly!
This goes for any variable name and method name. If the name does not accurately reflect what the thing is used for, then rename it to something more accurate. This might require several iterations. If you cannot find a short, and entirely accurate name, then that item does too much and you have an excellent candidate for a code snippet that needs to be split. The names also clearly indicate where the cuts are to be made.
Also, document your stuff. Whenever the answer to WHY? is not clearly conveyed by the answer to HOW? (being the code) you will need to add some documentation. Capturing design decisions is probably the most important task as it is very hard to do in code.
You could always start from "scratch". That doesn't mean scrap it and start from nothing, but try to rethink high-level things from the beginning, since you seem to have learned a lot since the last time you worked on it.
Start from a higher level, and as you build the scaffolding of your new and improved structure, take all the code you can reuse, which will probably be more than you think if you're willing to read through it and make some small changes.
When you're making the changes, be sure to be strict with yourself about following all the good practices you now know, because you will really thank yourself later.
It can be surprisingly refreshing to properly re-make program to do exactly what it did before, only more "cleanly". ;)
As others have mentioned as well, unit-tests are your best friend! They help you ensure that your refactoring works, and if you're starting from "scratch", it's the perfect time to write them.
You're in a much better position than many people facing this problem in that you understand what the code is supposed to do.
Taking variables out of a shared scope, as you're doing, is a great start, in that you're partitioning responsibilities. Ultimately you want each class to express a single responsibility. A few other things you might look at:
Easy targets for refactoring are code that's duplicated in lots of places and long methods.
If you're managing application state through statically initialized singletons or worse, a global state that everything is talking to, consider moving it to a managed initialization system (i.e. a dependency injection framework like spring or guice) or at least make sure that the initialization isn't entangled with the rest of the code.
Centralize and standardize how you're accessing outside resources, especially if you've got things like file locations or urls hardcoded.
Buy an IDE that has good refactoring support. I think IntelliJ is the best, but Eclipse has it now, too.
The unit test idea is key as well. You will want to have a suite of large, overall transactions that will give you the overall behavior of the code.
Once you have those, start creating unit tests for classes and smaller packages. Write the tests to demonstrate proper behavior, make your changes, and re-run the tests to demonstrate that you haven't broken everything.
Track code coverage as you go. You'll want to work it up to 70% or better. For the classes you change, you'll want those to be 70% or better before you make your changes.
Build up that safety net over time and you'll be able to refactor with some confidence.
very slowly :D
No seriously... take it one step at a time. For instance, refactor something only if it affects or helps you write the current bug/feature that you are working on right now and do no more than that. And before you refactor make darn sure that you have some kind of automated test in place that gets run on each build that will actually test what you are writing/refactoring. Even if you don't have unit tests, it is never too late to start adding them for all new and modified code that is being written. Over time, your code base will get better in small increments daily or weekly instead of worse - all without you making monumental heaps of changes.
In my personal opinion and experience, it's not worth it to just refactor a (legacy) codebase en masse for the sake of refactoring. In those cases, it's best to just start from scratch and do it right all over again (and very rarely are you afforded the opportunity to do such a thing). Hence, just refactoring incremental is the way to go.
For Java code, my favorite first step is to run Findbugs and then remove all the dead stores, un-used fields, unreachable catch blocks, unused private methods and likely bugs.
Next I run CPD to look for evidence of cut-copy-paste code.
It isn't unusual to be able to reduce the code base by 5% by doing this. It also saves you from refactoring code that is never used.
I think you should use Eclipse as a IDE because it is having many plugins and free of cost.You should now follow the MVC pattern and yes must write test cases using JUnit.Eclipse also have plugin for JUnit and it is providing code refactoring facility too so that will reduce your some work.And always remember that writing a code is not important the main thing is to write clean code.So now give comments everywhere so that not only you but any other person read the code then while reading the code he must feel that he is reading an essay.
Refactor the low-hanging fruit. Nibble away at the easy bits, and as you do that, the harder bits will begin to be a little easier. When there aren't any bits left to refactor, you're done.
The refactorings you'll probably find most useful are Rename Method (and even more trivial Renamings like Field, Variable, and Parameter), Extract Method, and Extract Class. For each refactoring you perform, write the necessary unit tests to make the refactoring safe, and run the entire suite of unit tests after each refactoring. It's tempting - and, let's be honest, pretty safe - to rely on the automated refactorings of your IDE, without the tests - but it's good practice and will be good to have the tests into the future as you add functionality to your project.
You might want to look at Martin Fowler's book Refactoring. This is the book that popularized the term and technique (my thought when taking his course: "I've been doing a lot of this all along, I didn't know it had a name"). A quote from the link:
Refactoring is a controlled technique
for improving the design of an
existing code base. Its essence is
applying a series of small
behavior-preserving transformations,
each of which "too small to be worth
doing". However the cumulative effect
of each of these transformations is
quite significant. By doing them in
small steps you reduce the risk of
introducing errors. You also avoid
having the system broken while you are
carrying out the restructuring - which
allows you to gradually refactor a
system over an extended period of
time.
As others have pointed out, unit tests will allow you to refactor with confidence. And start by reducing code duplication. The book will give you lots of other insights.
Here is a catalog of refactorings.
The correct definition of messy code, is code that hard to maintain and change.
To use more mathematical definition, you can check your code by code metrics tools.
This way, you will keep the code that already good enough, and find very fast, the wrong code.
My experience say, that is very powerful way to improve the quality of your code. (if your tool can show you the result on each build or on realtime)
Throw it away, build it new.
A few years ago, we needed a C++ IPC library for making function calls over TCP. We chose one and used it in our application. After a while, it became clear it didn't provide all functionality we needed. In the next version of our software, we threw the third party IPC library out and replaced it by one we wrote ourselves. From then on, I sometimes doubt whether this was a good decision, because it has proven to be quite a lot of work and it obviously felt like reinventing the wheel. So my question is: are there disadvantages to code reuse that justify this reinvention?
I can suggest a few
The bugs get replicated - If you reuse a buggy code :)
Sometimes it may add an additional overhead. As an example if you just need to do a simple thing it is not advisable to use a complex BIG library that implements the required feature.
You might face with some licensing concerns.
You may need to spend some time to learn\configure the external library. This may not be effective if the re-development takes a much lower time.
Reusing a poorly documented library may get more time than expected/estimated
P.S. The reasons for writing our own library were:
Evaluating external libraries is often very difficult and it takes a lot of time. Also, some problems only become visible after a thorough evaluation.
It made it possible to introduce some features that are specific for our project.
It is easier to do maintenance and to write extensions, as you know the library through and through.
It's pretty much always case by case. You have to look at the suitability and quality of what you're trying to reuse.
The number one issue is: you can only successfully reuse code if that code is GOOD code. If it was designed poorly, has bugs, or is very fragile then you'll run into the same issues you already did run into -- you have to go do it yourself anyway because it's so hard to modify the existing code.
However, if it's a third-party library that you are considering using that you don't have the source code for, it's a little different. You can try and get the source if it's that kind of library. Some commercial library vendors are open to suggestions and feature requests.
The Golden Wisdom :: It Has To Be Usable Before It Can Be Reusable.
The biggest disadvantage (you mention it yourself) by reusing third party libraries, is that you are strongly coupled and dependent to how that library works and how it's supposed to be used, unless you manage to create a middle interface layer that can take care of it.
But it's hard to create a generic interface, since replacing an existing library with another one, more or less requires that the new functionality works in similar ways. However, you can always rewrite the code using it, but that might be very hard and take a long time.
Another aspect is that if you reinvent the wheel, you have complete control over what's happening and you can do modifications as you see fit. This can be completely impossible if you are depending on a third part library being alive and constantly providing you with updates and bug fixes. On the other hand, reusing code this way enables you to focus on other things in your software, which sometimes might be the thing to do.
There's always a trade off.
If your code relies on external resources and those go away, you may be crippling portions of many applications.
Since most reused code comes from the internet, you run into all the issues with the Bathroom Wall of Code Atwood talks about. You can run into issues with insecure or unreliable borrowed code, and the more black boxed it is, the worse.
Disadvantages of code reuse:
Debugging takes a whole lot longer since it's not your code and it's likely that it's somewhat bloated code.
Any specific requirements will also take more work since you are constrained by the code you're re-using and have to work around it's limitations.
Constant code reuse will result in the long run in a bloated and disorganized applications with hard to chase bugs - programming hell.
Re-using code can (dependently on the case) reduce the challenge and satisfaction factor for the programmer, and also waste an opportunity to develop new skills.
It depends on the case, the language and the code you want to re-use or re-write. In general I believe that the higher-level the language is, the more I tend towards code reuse. Bugs in higher-level language can have a bigger impact, and they're easier to rewrite. High level code must stay readable, neat and flexible. Of course that could be said of all code, but, somehow, rewriting a C library sounds less of a good idea than rewriting (or rather re-factoring) PHP model code.
So anyway, these are some of the arguments I'd use to promote "reinventing the wheel".
Sometimes it's just faster, more fun, and better in the long run to rewrite from scratch than having to work around bugs and limitation of a current codebase.
Wondering what you are using to keep this library you reinvented?
Initial time for create a reusable code is more expensive and time cost
When master branch has an update you need to sync it and deploy again
The bugs get replicated - If you reuse a buggy code
Reusing a poorly documented code may get more time than expected/estimated
I have heard many developers refer to code as "legacy". Most of the time it is code that has been written by someone who no longer works on the project. What is it that makes code, legacy code?
Update in response to:
"Something handed down from an ancestor or a predecessor or from the past" http://www.thefreedictionary.com/legacy. Clearly you wanted to know something else. Could you clarify or expand your question? S.Lott
I am looking for the symptoms of legacy code that make it unusable or a nightmare to work with. When is it better to throw it away? It is my opinion that code should be thrown away more often and that reinventing the wheel is valuable part of development. The academic ideal of not reinventing the wheel is a nice one but it is not very practical.
On the other hand there is obviously legacy code worth keeping.
By using hardware, software, APIs, languages, technologies or features that are either no longer supported or have been superceded, typically combined with little to no possibility of ever replacing that code, instead using it til it or the system dies.
What is it that makes code, legacy code?
As with plain legacy, when the author is dead or missing, you as a heir get all or some of his code.
You shed some tears and try to figure out what to do with all this rubbish.
Michael Feathers has an interesting definition in his book Working Effectively with Legacy Code. According to him legacy code is code without automated tests.
It is a very general (and oft abused term) but any of the following would be legitimate reasons to call an app legacy:
The code base is based on a language/platform which is entirely unsupported by the manufacturer of the original product (often said manufacturer has gone out of business).
(really 1a) The code base or platform on which it is built is so old that getting qualified or experienced developers for the system is both hard and expensive.
The application supports some aspect of the business which is no longer actively grown and for which alterations are extremely rare, normally to fix it if something entirely unexpected changes around it (the canonical example being the Y2K issue) or if some regulation/external pressure forces it. Since both reasons are pressing and normally unavoidable but no significant development has occurred on the project it is likely that those people assigned to deal with this will be unfamiliar with the system (and it's accumulated behaviours and intricacies). In these cases this would often be reason to increase the perceived and planned for risk associated with the project.
The system has/or is being replaced with another. As such the system may be used for much less than originally intended, or perhaps only as a means of viewing historical data.
Legacy generally refers to code that is no longer being developed - meaning that if you use it, you have to use it on its original terms - you cannot just edit it to support the way the world looks today. For example, legacy code has to run on hardware that may not exist today - or is no longer supported.
According to Michael Feathers, the author of the excellent Working Effectively with Legacy Code, legacy code is a code which has no tests. When there is no way to know what breaks when this code changes.
The main thing that distinguishes
legacy code from non-legacy code is
tests, or rather a lack of tests. We
can get a sense of this with a little
thought experiment: how easy would it
be to modify your code base if it
could bite back, if it could tell you
when you made a mistake? It would be
pretty easy, wouldn't it? Most of the
fear involved in making changes to
large code bases is fear of
introducing subtle bugs; fear of
changing things inadvertently. With
tests, you can make things better with
impunity. To me, the difference is so
critical, it overwhelms any other
distinction. With tests, you can make
things better. Without them, you just
don’t know whether things are getting
better or worse.
Nobody is gonna read this, but I feel the other answers don't get it quite right:
It has value, if it wasn't useful it would've been thrown away long ago
Its hard to reason about because either of
Lack of documentation,
Original author cannot be found or forgot (yes 2 months later your code can be legacy code too!!),
Lack of tests or typesystem
Doesn't follow modern practices (ie no context to hold on too)
There is a requirement to change or extend it.
If there isn't a requirement to change it, it isn't legacy code
since nobody cares about it. It does its thing and there is nobody
around to call it legacy code.
A colleague once told me that legacy code was any code that you hadn't written yourself.
Arguably, it's just a pejorative term for code that we don't like any more for whatever reason (typically because it's not cool or fashionable but it works).
The TDD brigade might suggest that any code without tests is legacy code.
Legacy code is source code that relates to a no-longer supported or manufactured operating system or other computer technology.
http://en.wikipedia.org/wiki/Legacy_code
"Legacy code is source code that relates to a no-longer supported or manufactured "
Any code with support (or documentation) missing. Be it:
inline comments
technical documentation
spoken documentation (the person who wrote it)
unit tests documenting the workings of the code
For me legacy code is code that was written prior to some paradigm shift.
It may still be very much in use but it is in the process of being refactored to bring it into line.
e.g. Old procedural code hanging around in an otherwise OO system.
Code (or anything else, really) becomes "legacy" when it has been replaced by something newer/better, and yet despite this it's still used and kept alive "in the wild".
Preserving legacy code is not so much an academic ideal as it is keeping code that works, no matter how poorly. In many conservative enterprise situations, that would be considered more practical than throwing it away and starting again from scratch. Better the devil you know...
Legacy code is code that is painful/expensive to keep current with changing requirements.
There are two ways that this can happen:
The code is unsuitable for change
The semantics of the code have been swapped out to silicon
1) is the easier of the two to recognize. It is software that has fundamental limits making it unable to keep up with the ecosystem around it. For example, a system built around O(n^2) algorithm won't scale beyond a certain point and must be re-written if requirements move in that direction. Another example is code using libraries that are not supported on the latest OS versions.
2) Is harder to recognize, but all code of this kind shares the characteristic that people are afraid to change it. This could be because it was badly written/documented to begin with, because it is untested, or because it is non-trivial and the original authors who understood it left the team.
The ASCII/Unicode chars that comprise living code have semantic meaning, the "why's", "what's" and to some degree the "how's", in the minds of people associated with it. Legacy code is either un-owned or the owners do not have meaning associated with large portions of it. Once this happens (and it could happen the next day with really poorly-written code), to change this code, someone must learn it and understand it. This process is a significant fraction of the time it takes to write it in the first place.
The day you're afraid to refactor your code is the day when your code has become legacy.
I consider code "legacy" if any or all of the following conditions apply:
It was written using a language or methodology that is a generation behind current standards
The code is a complete mess with no planning or design behind it
It is written in outdated languages and in an outdated, non object-oriented style
It is difficult to find developers who know the language because it is so old
Unlike some of the other opinions here, I've seen plenty of modern applications that work decently without unit tests. Unit testing still has not caught on with everyone. Perhaps ten years from now the next generation of programmers will look at our current applications and consider them "legacy" for not containing unit tests, just as I consider non object-oriented applications to be legacy.
If few changes need to be made to a legacy codebase, it's better to simply leave it as-is and go with the flow. If the application needs drastic functionality changes, a GUI overhaul, and/or you can't find anyone who knows the programming language, it's time to throw away and start over. A word of warning, however: rewriting from scratch can be very time-consuming, and it's difficult to know if you've replicated all functionality. You'll probably want to have test cases and unit tests written for the legacy application and the new application.
Quite honestly legacy code is any code, framework, api, of other software construct thta's not "cool" anymore. For example COBOL is unanimously regarded as legacy while APL is not. Now one can also make the case that COBOL is consideed legacy and APL not because it has about 1m times the install base as APL. However, if you say that you need to work on APL code the reply would not be "oh no, that legacy stuff" but rather "oh my god, guess you won't be doing anything for the next century" see the difference?
This is a general term thrown around quite often (and quite generically) in the software ecosystem.
Well, I like to think of legacy code as inherited code. This is simply code that was written in the past. In most cases, legacy code do not follow new/current practices and is often considered archaic.
Legacy code is anything written more than a month ago :-)
It's often any code that isn't written in the trendy scripting language du jour, and I'm only half joking.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
This is definitely subjective, but I'd like to try to avoid it becoming argumentative. I think it could be an interesting question if people treat it appropriately.
The idea for this question came from the comment thread from my answer to the "What are five things you hate about your favorite language?" question. I contended that classes in C# should be sealed by default - I won't put my reasoning in the question, but I might write a fuller explanation as an answer to this question. I was surprised at the heat of the discussion in the comments (25 comments currently).
So, what contentious opinions do you hold? I'd rather avoid the kind of thing which ends up being pretty religious with relatively little basis (e.g. brace placing) but examples might include things like "unit testing isn't actually terribly helpful" or "public fields are okay really". The important thing (to me, anyway) is that you've got reasons behind your opinions.
Please present your opinion and reasoning - I would encourage people to vote for opinions which are well-argued and interesting, whether or not you happen to agree with them.
Programmers who don't code in their spare time for fun will never become as good as those that do.
I think even the smartest and most talented people will never become truly good programmers unless they treat it as more than a job. Meaning that they do little projects on the side, or just mess with lots of different languages and ideas in their spare time.
(Note: I'm not saying good programmers do nothing else than programming, but they do more than program from 9 to 5)
The only "best practice" you should be using all the time is "Use Your Brain".
Too many people jumping on too many bandwagons and trying to force methods, patterns, frameworks etc onto things that don't warrant them. Just because something is new, or because someone respected has an opinion, doesn't mean it fits all :)
EDIT:
Just to clarify - I don't think people should ignore best practices, valued opinions etc. Just that people shouldn't just blindly jump on something without thinking about WHY this "thing" is so great, IS it applicable to what I'm doing, and WHAT benefits/drawbacks does it bring?
"Googling it" is okay!
Yes, I know it offends some people out there that their years of intense memorization and/or glorious stacks of programming books are starting to fall by the wayside to a resource that anyone can access within seconds, but you shouldn't hold that against people that use it.
Too often I hear googling answers to problems the result of criticism, and it really is without sense. First of all, it must be conceded that everyone needs materials to reference. You don't know everything and you will need to look things up. Conceding that, does it really matter where you got the information? Does it matter if you looked it up in a book, looked it up on Google, or heard it from a talking frog that you hallucinated? No. A right answer is a right answer.
What is important is that you understand the material, use it as the means to an end of a successful programming solution, and the client/your employer is happy with the results.
(although if you are getting answers from hallucinatory talking frogs, you should probably get some help all the same)
Most comments in code are in fact a pernicious form of code duplication.
We spend most of our time maintaining code written by others (or ourselves) and poor, incorrect, outdated, misleading comments must be near the top of the list of most annoying artifacts in code.
I think eventually many people just blank them out, especially those flowerbox monstrosities.
Much better to concentrate on making the code readable, refactoring as necessary, and minimising idioms and quirkiness.
On the other hand, many courses teach that comments are very nearly more important than the code itself, leading to the this next line adds one to invoiceTotal style of commenting.
XML is highly overrated
I think too many jump onto the XML bandwagon before using their brains...
XML for web stuff is great, as it's designed for it. Otherwise I think some problem definition and design thoughts should preempt any decision to use it.
My 5 cents
Not all programmers are created equal
Quite often managers think that DeveloperA == DeveloperB simply because they have same level of experience and so on. In actual fact, the performance of one developer can be 10x or even 100x that of another.
It's politically risky to talk about it, but sometimes I feel like pointing out that, even though several team members may appear to be of equal skill, it's not always the case. I have even seen cases where lead developers were 'beyond hope' and junior devs did all the actual work - I made sure they got the credit, though. :)
I fail to understand why people think that Java is absolutely the best "first" programming language to be taught in universities.
For one, I believe that first programming language should be such that it highlights the need to learn control flow and variables, not objects and syntax
For another, I believe that people who have not had experience in debugging memory leaks in C / C++ cannot fully appreciate what Java brings to the table.
Also the natural progression should be from "how can I do this" to "how can I find the library which does that" and not the other way round.
If you only know one language, no matter how well you know it, you're not a great programmer.
There seems to be an attitude that says once you're really good at C# or Java or whatever other language you started out learning then that's all you need. I don't believe it- every language I have ever learned has taught me something new about programming that I have been able to bring back into my work with all the others. I think that anyone who restricts themselves to one language will never be as good as they could be.
It also indicates to me a certain lack of inquistiveness and willingness to experiment that doesn't necessarily tally with the qualities I would expect to find in a really good programmer.
Performance does matter.
Print statements are a valid way to debug code
I believe it is perfectly fine to debug your code by littering it with System.out.println (or whatever print statement works for your language). Often, this can be quicker than debugging, and you can compare printed outputs against other runs of the app.
Just make sure to remove the print statements when you go to production (or better, turn them into logging statements)
Your job is to put yourself out of work.
When you're writing software for your employer, any software that you create is to be written in such a way that it can be picked up by any developer and understood with a minimal amount of effort. It is well designed, clearly and consistently written, formatted cleanly, documented where it needs to be, builds daily as expected, checked into the repository, and appropriately versioned.
If you get hit by a bus, laid off, fired, or walk off the job, your employer should be able to replace you on a moment's notice, and the next guy could step into your role, pick up your code and be up and running within a week tops. If he or she can't do that, then you've failed miserably.
Interestingly, I've found that having that goal has made me more valuable to my employers. The more I strive to be disposable, the more valuable I become to them.
1) The Business Apps farce:
I think that the whole "Enterprise" frameworks thing is smoke and mirrors. J2EE, .NET, the majority of the Apache frameworks and most abstractions to manage such things create far more complexity than they solve.
Take any regular Java or .NET ORM, or any supposedly modern MVC framework for either which does "magic" to solve tedious, simple tasks. You end up writing huge amounts of ugly XML boilerplate that is difficult to validate and write quickly. You have massive APIs where half of those are just to integrate the work of the other APIs, interfaces that are impossible to recycle, and abstract classes that are needed only to overcome the inflexibility of Java and C#. We simply don't need most of that.
How about all the different application servers with their own darned descriptor syntax, the overly complex database and groupware products?
The point of this is not that complexity==bad, it's that unnecessary complexity==bad. I've worked in massive enterprise installations where some of it was necessary, but even in most cases a few home-grown scripts and a simple web frontend is all that's needed to solve most use cases.
I'd try to replace all of these enterprisey apps with simple web frameworks, open source DBs, and trivial programming constructs.
2) The n-years-of-experience-required:
Unless you need a consultant or a technician to handle a specific issue related to an application, API or framework, then you don't really need someone with 5 years of experience in that application. What you need is a developer/admin who can read documentation, who has domain knowledge in whatever it is you're doing, and who can learn quickly. If you need to develop in some kind of language, a decent developer will pick it up in less than 2 months. If you need an administrator for X web server, in two days he should have read the man pages and newsgroups and be up to speed. Anything less and that person is not worth what he is paid.
3) The common "computer science" degree curriculum:
The majority of computer science and software engineering degrees are bull. If your first programming language is Java or C#, then you're doing something wrong. If you don't get several courses full of algebra and math, it's wrong. If you don't delve into functional programming, it's incomplete. If you can't apply loop invariants to a trivial for loop, you're not worth your salt as a supposed computer scientist. If you come out with experience in x and y languages and object orientation, it's full of s***. A real computer scientist sees a language in terms of the concepts and syntaxes it uses, and sees programming methodologies as one among many, and has such a good understanding of the underlying philosophies of both that picking new languages, design methods, or specification languages should be trivial.
Getters and Setters are Highly Overused
I've seen millions of people claiming that public fields are evil, so they make them private and provide getters and setters for all of them. I believe this is almost identical to making the fields public, maybe a bit different if you're using threads (but generally is not the case) or if your accessors have business/presentation logic (something 'strange' at least).
I'm not in favor of public fields, but against making a getter/setter (or Property) for everyone of them, and then claiming that doing that is encapsulation or information hiding... ha!
UPDATE:
This answer has raised some controversy in it's comments, so I'll try to clarify it a bit (I'll leave the original untouched since that is what many people upvoted).
First of all: anyone who uses public fields deserves jail time
Now, creating private fields and then using the IDE to automatically generate getters and setters for every one of them is nearly as bad as using public fields.
Many people think:
private fields + public accessors == encapsulation
I say (automatic or not) generation of getter/setter pair for your fields effectively goes against the so called encapsulation you are trying to achieve.
Lastly, let me quote Uncle Bob in this topic (taken from chapter 6 of "Clean Code"):
There is a reason that we keep our
variables private. We don't want
anyone else to depend on them. We want
the freedom to change their type or
implementation on a whim or an
impulse. Why, then, do so many
programmers automatically add getters
and setters to their objects, exposing
their private fields as if they were
public?
UML diagrams are highly overrated
Of course there are useful diagrams e.g. class diagram for the Composite Pattern, but many UML diagrams have absolutely no value.
Opinion: SQL is code. Treat it as such
That is, just like your C#, Java, or other favorite object/procedure language, develop a formatting style that is readable and maintainable.
I hate when I see sloppy free-formatted SQL code. If you scream when you see both styles of curly braces on a page, why or why don't you scream when you see free formatted SQL or SQL that obscures or obfuscates the JOIN condition?
Readability is the most important aspect of your code.
Even more so than correctness. If it's readable, it's easy to fix. It's also easy to optimize, easy to change, easy to understand. And hopefully other developers can learn something from it too.
If you're a developer, you should be able to write code
I did quite a bit of interviewing last year, and for my part of the interview I was supposed to test the way people thought, and how they implemented simple-to-moderate algorithms on a white board. I'd initially started out with questions like:
Given that Pi can be estimated using the function 4 * (1 - 1/3 + 1/5 - 1/7 + ...) with more terms giving greater accuracy, write a function that calculates Pi to an accuracy of 5 decimal places.
It's a problem that should make you think, but shouldn't be out of reach to a seasoned developer (it can be answered in about 10 lines of C#). However, many of our (supposedly pre-screened by the agency) candidates couldn't even begin to answer it, or even explain how they might go about answering it. So after a while I started asking simpler questions like:
Given the area of a circle is given by Pi times the radius squared, write a function to calculate the area of a circle.
Amazingly, more than half the candidates couldn't write this function in any language (I can read most popular languages so I let them use any language of their choice, including pseudo-code). We had "C# developers" who could not write this function in C#.
I was surprised by this. I had always thought that developers should be able to write code. It seems that, nowadays, this is a controversial opinion. Certainly it is amongst interview candidates!
Edit:
There's a lot of discussion in the comments about whether the first question is a good or bad one, and whether you should ask questions as complex as this in an interview. I'm not going to delve into this here (that's a whole new question) apart from to say you're largely missing the point of the post.
Yes, I said people couldn't make any headway with this, but the second question is trivial and many people couldn't make any headway with that one either! Anybody who calls themselves a developer should be able to write the answer to the second one in a few seconds without even thinking. And many can't.
The use of hungarian notation should be punished with death.
That should be controversial enough ;)
Design patterns are hurting good design more than they're helping it.
IMO software design, especially good software design is far too varied to be meaningfully captured in patterns, especially in the small number of patterns people can actually remember - and they're far too abstract for people to really remember more than a handful. So they're not helping much.
And on the other hand, far too many people become enamoured with the concept and try to apply patterns everywhere - usually, in the resulting code you can't find the actual design between all the (completely meaningless) Singletons and Abstract Factories.
Less code is better than more!
If the users say "that's it?", and your work remains invisible, it's done right. Glory can be found elsewhere.
PHP sucks ;-)
The proof is in the pudding.
Unit Testing won't help you write good code
The only reason to have Unit tests is to make sure that code that already works doesn't break. Writing tests first, or writing code to the tests is ridiculous. If you write to the tests before the code, you won't even know what the edge cases are. You could have code that passes the tests but still fails in unforeseen circumstances.
And furthermore, good developers will keep cohesion low, which will make the addition of new code unlikely to cause problems with existing stuff.
In fact, I'll generalize that even further,
Most "Best Practices" in Software Engineering are there to keep bad programmers from doing too much damage.
They're there to hand-hold bad developers and keep them from making dumbass mistakes. Of course, since most developers are bad, this is a good thing, but good developers should get a pass.
Write small methods. It seems that programmers love to write loooong methods where they do multiple different things.
I think that a method should be created wherever you can name one.
It's ok to write garbage code once in a while
Sometimes a quick and dirty piece of garbage code is all that is needed to fulfill a particular task. Patterns, ORMs, SRP, whatever... Throw up a Console or Web App, write some inline sql ( feels good ), and blast out the requirement.
Code == Design
I'm no fan of sophisticated UML diagrams and endless code documentation. In a high level language, your code should be readable and understandable as is. Complex documentation and diagrams aren't really any more user friendly.
Here's an article on the topic of Code as Design.
Software development is just a job
Don't get me wrong, I enjoy software development a lot. I've written a blog for the last few years on the subject. I've spent enough time on here to have >5000 reputation points. And I work in a start-up doing typically 60 hour weeks for much less money than I could get as a contractor because the team is fantastic and the work is interesting.
But in the grand scheme of things, it is just a job.
It ranks in importance below many things such as family, my girlfriend, friends, happiness etc., and below other things I'd rather be doing if I had an unlimited supply of cash such as riding motorbikes, sailing yachts, or snowboarding.
I think sometimes a lot of developers forget that developing is just something that allows us to have the more important things in life (and to have them by doing something we enjoy) rather than being the end goal in itself.
I also think there's nothing wrong with having binaries in source control.. if there is a good reason for it. If I have an assembly I don't have the source for, and might not necessarily be in the same place on each devs machine, then I will usually stick it in a "binaries" directory and reference it in a project using a relative path.
Quite a lot of people seem to think I should be burned at the stake for even mentioning "source control" and "binary" in the same sentence. I even know of places that have strict rules saying you can't add them.
Every developer should be familiar with the basic architecture of modern computers. This also applies to developers who target a virtual machine (maybe even more so, because they have been told time and time again that they don't need to worry themselves with memory management etc.)
Software Architects/Designers are Overrated
As a developer, I hate the idea of Software Architects. They are basically people that no longer code full time, read magazines and articles, and then tell you how to design software. Only people that actually write software full time for a living should be doing that. I don't care if you were the worlds best coder 5 years ago before you became an Architect, your opinion is useless to me.
How's that for controversial?
Edit (to clarify): I think most Software Architects make great Business Analysts (talking with customers, writing requirements, tests, etc), I simply think they have no place in designing software, high level or otherwise.
There is no "one size fits all" approach to development
I'm surprised that this is a controversial opinion, because it seems to me like common sense. However, there are many entries on popular blogs promoting the "one size fits all" approach to development so I think I may actually be in the minority.
Things I've seen being touted as the correct approach for any project - before any information is known about it - are things like the use of Test Driven Development (TDD), Domain Driven Design (DDD), Object-Relational Mapping (ORM), Agile (capital A), Object Orientation (OO), etc. etc. encompassing everything from methodologies to architectures to components. All with nice marketable acronyms, of course.
People even seem to go as far as putting badges on their blogs such as "I'm Test Driven" or similar, as if their strict adherence to a single approach whatever the details of the project project is actually a good thing.
It isn't.
Choosing the correct methodologies and architectures and components, etc., is something that should be done on a per-project basis, and depends not only on the type of project you're working on and its unique requirements, but also the size and ability of the team you're working with.
I'm finding only about 30% of my code actually solves problems, the rest is taken up by logging, tests, parameter checking, exceptions, error handling and so on. Do you find that in your code, and is there an IDE/Editor that allows you to hide code that's not interesting?
OTOH are there languages which make the support code more manageable and smaller in size?
Edit - I think we're all aware of the difference between business logic and other code. I'm not saying that the logging etc is not important. The things is, when I'm coding I'm either implementing business logic, or I'm making sure things don't break. For me that's two different ways of thinking, do others develop like that, and is there an IDE that supports that way of developing?
Supporting code is just as important as the "real code". The quality of your product is determined as much by supporting code as anything else.
Consider an automobile. In terms of just getting from point A to point B, that requires nothing more than a go-cart: a frame, a seat, an engine, a few tires. But modern cars have a lot more than just the basics. Highly efficient engines using electronic engine timing. Automatic transmissions. Bucket seats. Heating and A/C. Rack and pinion steering. Power brakes. Anti-lock brakes. Quiet, comfortable cabins protected from the weather. Air bags. Crumple zones and other advanced safety features. Etc. Etc.
Details and execution are important, even in software. If you find that your "supporting code" tends to look more like kludges and hacks, then it's time to rethink your fundamental approach. But ultimately the fit and finish determines quality of the end product as much as anything else.
Edit: The questions you should ask yourself:
Is your "supporting code":
An umbrella duct taped to a pole or a metal and glass cabin frame?
A piece of pipe tied to the front of the car or an energy absorbing bumper integrated into a crumple zone?
A grappling hook on a rope tied to the frame or 4-wheel anti-lock power brakes?
A pair of goggles and a thick coat or a windshield and a heating system?
Answers to these questions will probably affect how much you care about your "supporting code".
Edit: Response to Dave Turvey's comment:
I'd encourage rereading the original question, one of the examples of "support code" listed is "error handling". Consider this for a moment. Imagine it in the context of, say, an automobile, a microwave oven, or even an operating system. Should error handling be relegated to second class citizenship because it serves a "support" function in some abstract sense? In an automobile the safety features are part of the fundamental design of the vehicle and comprise a substantial part of the value of the car. The safety features and "error handling" of a microwave oven (indeed, of the microwave oven's embedded software as well) are an important part of its value as well. A microwave oven that was improperly shielded could cook food just fine, under the right circumstances, but it would pose a hazard to the operator.
The implicit featureset of every tool (software or otherwise) includes this list:
Robustness
Usability
Performance
Everything anyone has ever built or used has had these features. Failure to understand this will translate to failure to execute well on these features which will make for a poor quality product of low value and low commercial interest. There is no such thing as "support code", there is only a misunderstanding of the nature of what it means for a feature to be complete. A "feature" that works in the abstract only under laboratory conditions is an experiment, not a part of a product.
The idea of pure, pristine features floating on a bog of dirty, ugly support code is the wrong image of software development. Instead, think of elegant, superbly-integrated machinery that is well-built, intuitive to use, and powerful.
The supporting code is important, but you want not to be distracted by it when you don't want to. There are two technologies that can help.
A language with first-class functions will help you modularize your code so that logging, timing, and so on can be implemented once and then combined with many other modules. It will also help you write unit tests. Some good ways to learn the techniques are to read the paper Why Functional Programming Matters and to learn about the QuickCheck tool. (No, I am not a shill for John Hughes, but he does do wonderful work.)
If you cannot use a programming language with powerful capabilities for modularization and reuse, or if you don't want to, Don Knuth's Literate Programming technique will help you organize your code so that you can split up parts the way you want and pay attention only to what you want, when you want. The Noweb literate-programming tool supports any language that can be written in ASCII, and also combinations of those languages.
If my IDE could hide "not interesting code" I would definitely turn the feature off. You wouldn't want that happening, I bet :)
There are certainly languages that minimise the amount of supporting code, but I don't think you could switch from Java to lets say JavaScript simply because in JavaScript you wouldn't have to declare every exception... I think it's quite necessary to have your supporting code where it is.
Oh, and you could have your program formally specified and mathematically proven, then you wouldn't need to support your code too much ;D
The real code you are referring to is usually called "Business Logic".
In a good unit testing system, your unit tests should be in their own classes (and probably their own assemblies) so that shouldn't be an issue.
The rest is language based for the most part. The more advanced a language, the better it's ability to avoid writing support code to some degree. Also, a well-targetted development system can help you avoid writing a lot of code (Visual Basic's screen builder, Ruby on Rails, ...) but these abstractions can break down and cause you to write just as much code as anything else if you use it to develop targets outside it's intended types of apps. (VB & Ruby don't help all that much if you're calculating prime numbers)
Beyond the language/platform, you have refactoring--the art of eliminating all the supporting code that you can (as well as redundancies in your business logic) to keep your code-base clean and small.
When practicing advanced refactoring, you'll probably end up writing tools for yourself.
Sometimes abstracting data out of your code and into a structured file of some sort can eliminate huge piles of support code and move the rest into "Business logic" because now parsing that data and setting it up is part of the "business" your program does.
This is a good trade-off because this type of business logic tends to be more readable and easier to factor. The other advantage of this kind of abstraction is that all your "Configuration" is now done in data which tends to make it somebody elses' problem.
As an example of this type of tooling: Rails itself! It takes a lot of the boilerplate of web development and factors it out of the code and into libraries driven by data and simplistic code (Ruby blurs the line between code and data--their data files actually loop back to being specified in Ruby code!)
It's like you want to take a trip to the top of Pike's Peak. You can take the Winnebago, you can take your SUV, or a motorcycle, or ride up on your bike.
Some ways are a more or less expensive, faster, etc. Sometimes you end up taking along a lot of stuff the isn't there strictly for accomplishing vertical progress; it's nice to have a beer in the cooler. But it pays to remember that you're responsible for everything that goes with you to the top.
Aspect Oriented Programming partly addresses this. It allows you to inject code into existing source/bytecode. This way you can make a task such as logging appear in its own module instead of woven into the business logic.
Work expands to fill it's container. This really sounds like an economics question. (ie. optimizing your outputs- features for users and features for the developer) with expensive inputs (time spent writing features, time spent writing plumbing code.)
You have to include user visible features or you don't have a viable product or job. Once that is done partly done, your remaining budget of time will be split between activities with a visible return on effort and an invisible (but positive!) return on effort, like exception logging, memory management, etc.
What ever language makes it cheaper to implement features will probably increase your features/to plumbing code ratio. Likewise, whatever language makes it cheaper to implement plumbing code will probably increase the feature to plumbing code because you'll have freed up more time to write features.
Like all optimization problems you'd have 2 effects-- the increase in the size of the support code (because say, you're using cheap code generation) and the increase in the size of feature related code (because you have more time left over to write features), so the final ratio might be hard to predict.
I do not begrudge the 90% of my code that is data access plumbing, because it is all testable, code generated and very cheap, compared to the 10% of handwritten of domain specific code.
I don't try to make all routines foolproof, only those exposed to the outside world.
http://en.wikipedia.org/wiki/Folding_editor
Higher and more dynamic languages are usually less verbose. Weak typing also saves a lot of code. Of course there are trade-offs.
I use the #region directive in Visual Studio to collapse blocks of code that are not the primary focus, e.g. unit tests. With log4net logging statements are only ever one line. I haven't found any approaches to reduce the parameter checking code although it sounds like C# 4 has some kind of contract framework that will help there.
I have some coworkers who once, while being chewed out by a client for an overdue and bug-ridden project, bragged to the customer that they had written 5 times as much test code as operational code.
The client was not happy, and by "not happy" I mean their skin turned green, they grew to 5 times their normal size, and their clothes popped off.
You could just make a static class in a utilities assembly that checks your parameters and things. For instance in the Spring Framework (which is where I got the idea) it has an Assert class and it makes it really fast to make sure that string params aren't empty or that object params aren't null. It cleans up validation code and reduces duplicate code which is a win win.