Related
I have a big mess of code. Admittedly, I wrote it myself - a year ago. It's not well commented but it's not very complicated either, so I can understand it -- just not well enough to know where to start as far as refactoring it.
I violated every rule that I have read about over the past year. There are classes with multiple responsibilities, there are indirect accesses (I forget the term - something like foo.bar.doSomething()), and like I said it is not well commented. On top of that, it's the beginnings of a game, so the graphics is coupled with the data, or the places where I tried to decouple graphics and data, I made the data public in order for the graphics to be able to access the data it needs...
It's a huge mess! Where do I start? How would you start on something like this?
My current approach is to take variables and switch them to private and then refactor the pieces that break, but that doesn't seem to be enough. Please suggest other strategies for wading through this mess and turning it into something clean so that I can continue where I left off!
Update two days later: I have been drawing out UML-like diagrams of my classes, and catching some of the "Low Hanging Fruit" along the way. I've even found some bits of code that were the beginnings of new features, but as I'm trying to slim everything down, I've been able to delete those bits and make the project feel cleaner. I'm probably going to refactor as much as possible before rigging my test cases (but only the things that are 100% certain not to impact the functionality, of course!), so that I won't have to refactor test cases as I change functionality. (do you think I'm doing it right or would it, in your opinion, be easier for me to suck it up and write the tests first?)
Please vote for the best answer so that I can mark it fairly! Feel free to add your own answer to the bunch as well, there's still room for you! I'll give it another day or so and then probably mark the highest-voted answer as accepted.
Thanks to everyone who has responded so far!
June 25, 2010: I discovered a blog post which directly answers this question from someone who seems to have a pretty good grasp of programming: (or maybe not, if you read his article :) )
To that end, I do four things when I
need to refactor code:
Determine what the purpose of the code was
Draw UML and action diagrams of the classes involved
Shop around for the right design patterns
Determine clearer names for the current classes and methods
Pick yourself up a copy of Martin Fowler's Refactoring. It has some good advice on ways to break down your refactoring problem. About 75% of the book is little cookbook-style refactoring steps you can do. It also advocates automated unit tests that you can run after each step to prove your code still works.
As for a place to start, I would sit down and draw out a high-level architecture of your program. You don't have to get fancy with detailed UML models, but some basic UML is not a bad idea. You need a big picture idea of how the major pieces fit together so you can visually see where your decoupling is going to happen. Just a page or two of some basic block diagrams will help with the overwhelming feeling you have right now.
Without some sort of high level spec or design, you just risk getting lost again and ending up with another unmaintainable mess.
If you need to start from scratch, remember that you never truly start from scratch. You have some code and the knowledge you gained from your first time. But sometimes it does help to start with a blank project and pull things in as you go, rather than put out fires in a messy code base. Just remember not to completely throw out the old, use it for its good parts and pull them in as you go.
What was most important for me on different occasions were unit tests: I took a few days to write tests for the old code and then I was free to refactor with confidence. How exactly is a different question, but having the tests made it possible for me to make real, substantial changes to the code.
I'll second everyone's recommendations for Fowler's Refactoring, but in your specific case you may want to look at Michael Feathers' Working Effectively with Legacy Code, which is really perfect for your situation.
Feathers talks about Characterization Tests, which are unit tests not to assert known behaviour of the system but to explore and define the existing (unclear) behaviour -- in the case where you've written your own legacy code, and fixing it yourself, this may not be so important, but if your design is sloppy then it's quite possible there are parts of the code that work by 'magic' and their behaviour isn't clear, even to you -- in that case, characterization tests will help.
One great part of the book is the discussion about finding (or creating) seams in your codebase -- seams are natural 'fault lines', if you like, where you can break into the existing system to start testing it, and pulling it towards a better design. Hard to explain but well worth a read.
There's a brief paper where Feathers fleshes out some of the concepts from the book, but it really is well worth hunting down the whole thing. It's one of my favourites.
Just an additional refactoring that is more important than you think: Name things correctly!
This goes for any variable name and method name. If the name does not accurately reflect what the thing is used for, then rename it to something more accurate. This might require several iterations. If you cannot find a short, and entirely accurate name, then that item does too much and you have an excellent candidate for a code snippet that needs to be split. The names also clearly indicate where the cuts are to be made.
Also, document your stuff. Whenever the answer to WHY? is not clearly conveyed by the answer to HOW? (being the code) you will need to add some documentation. Capturing design decisions is probably the most important task as it is very hard to do in code.
You could always start from "scratch". That doesn't mean scrap it and start from nothing, but try to rethink high-level things from the beginning, since you seem to have learned a lot since the last time you worked on it.
Start from a higher level, and as you build the scaffolding of your new and improved structure, take all the code you can reuse, which will probably be more than you think if you're willing to read through it and make some small changes.
When you're making the changes, be sure to be strict with yourself about following all the good practices you now know, because you will really thank yourself later.
It can be surprisingly refreshing to properly re-make program to do exactly what it did before, only more "cleanly". ;)
As others have mentioned as well, unit-tests are your best friend! They help you ensure that your refactoring works, and if you're starting from "scratch", it's the perfect time to write them.
You're in a much better position than many people facing this problem in that you understand what the code is supposed to do.
Taking variables out of a shared scope, as you're doing, is a great start, in that you're partitioning responsibilities. Ultimately you want each class to express a single responsibility. A few other things you might look at:
Easy targets for refactoring are code that's duplicated in lots of places and long methods.
If you're managing application state through statically initialized singletons or worse, a global state that everything is talking to, consider moving it to a managed initialization system (i.e. a dependency injection framework like spring or guice) or at least make sure that the initialization isn't entangled with the rest of the code.
Centralize and standardize how you're accessing outside resources, especially if you've got things like file locations or urls hardcoded.
Buy an IDE that has good refactoring support. I think IntelliJ is the best, but Eclipse has it now, too.
The unit test idea is key as well. You will want to have a suite of large, overall transactions that will give you the overall behavior of the code.
Once you have those, start creating unit tests for classes and smaller packages. Write the tests to demonstrate proper behavior, make your changes, and re-run the tests to demonstrate that you haven't broken everything.
Track code coverage as you go. You'll want to work it up to 70% or better. For the classes you change, you'll want those to be 70% or better before you make your changes.
Build up that safety net over time and you'll be able to refactor with some confidence.
very slowly :D
No seriously... take it one step at a time. For instance, refactor something only if it affects or helps you write the current bug/feature that you are working on right now and do no more than that. And before you refactor make darn sure that you have some kind of automated test in place that gets run on each build that will actually test what you are writing/refactoring. Even if you don't have unit tests, it is never too late to start adding them for all new and modified code that is being written. Over time, your code base will get better in small increments daily or weekly instead of worse - all without you making monumental heaps of changes.
In my personal opinion and experience, it's not worth it to just refactor a (legacy) codebase en masse for the sake of refactoring. In those cases, it's best to just start from scratch and do it right all over again (and very rarely are you afforded the opportunity to do such a thing). Hence, just refactoring incremental is the way to go.
For Java code, my favorite first step is to run Findbugs and then remove all the dead stores, un-used fields, unreachable catch blocks, unused private methods and likely bugs.
Next I run CPD to look for evidence of cut-copy-paste code.
It isn't unusual to be able to reduce the code base by 5% by doing this. It also saves you from refactoring code that is never used.
I think you should use Eclipse as a IDE because it is having many plugins and free of cost.You should now follow the MVC pattern and yes must write test cases using JUnit.Eclipse also have plugin for JUnit and it is providing code refactoring facility too so that will reduce your some work.And always remember that writing a code is not important the main thing is to write clean code.So now give comments everywhere so that not only you but any other person read the code then while reading the code he must feel that he is reading an essay.
Refactor the low-hanging fruit. Nibble away at the easy bits, and as you do that, the harder bits will begin to be a little easier. When there aren't any bits left to refactor, you're done.
The refactorings you'll probably find most useful are Rename Method (and even more trivial Renamings like Field, Variable, and Parameter), Extract Method, and Extract Class. For each refactoring you perform, write the necessary unit tests to make the refactoring safe, and run the entire suite of unit tests after each refactoring. It's tempting - and, let's be honest, pretty safe - to rely on the automated refactorings of your IDE, without the tests - but it's good practice and will be good to have the tests into the future as you add functionality to your project.
You might want to look at Martin Fowler's book Refactoring. This is the book that popularized the term and technique (my thought when taking his course: "I've been doing a lot of this all along, I didn't know it had a name"). A quote from the link:
Refactoring is a controlled technique
for improving the design of an
existing code base. Its essence is
applying a series of small
behavior-preserving transformations,
each of which "too small to be worth
doing". However the cumulative effect
of each of these transformations is
quite significant. By doing them in
small steps you reduce the risk of
introducing errors. You also avoid
having the system broken while you are
carrying out the restructuring - which
allows you to gradually refactor a
system over an extended period of
time.
As others have pointed out, unit tests will allow you to refactor with confidence. And start by reducing code duplication. The book will give you lots of other insights.
Here is a catalog of refactorings.
The correct definition of messy code, is code that hard to maintain and change.
To use more mathematical definition, you can check your code by code metrics tools.
This way, you will keep the code that already good enough, and find very fast, the wrong code.
My experience say, that is very powerful way to improve the quality of your code. (if your tool can show you the result on each build or on realtime)
Throw it away, build it new.
Here's what I'm wondering. Every night that our 3 months old baby lets us sleep, I jump to my computer and start coding my hobby projects. I have about 20 different projects that I'm working on: different types of projects, from C++ games to web apps along with some contribution to open source projects. It's truly a passion and has been for a lot of years.
Yet, when I look back, I see that I haven't been able to fully complete one of my hobby projects. I've always done the prototypes and setup the most important features, but with time instead of finishing my project I end up switching to another project that seems "so much cooler" at the moment. Hence I usually end up with buggy and incomplete games that have no end nor story, 3D engines that have the fastest PolygonDraw routine ever, yet lack to implement anything else, etc... The list is long. I think I must have written unfinished Pong over a hundred times different!
I've been told that the remedy is to write specs for my hobby projects.
On one hand, I write a lot of specs at work. I know how crucial they are for defining a product's roadmap and staying within schedule. On the other hand, specs and hobby project just quite don't seem to go along! It seems to me that the learning curve to building a game is actually what makes it fun; not the game itself. Hence the fun of losing time restructuring an entire engine, the fun of creating the most useless features, and so on...
So here comes the question: Do you ever write specifications for your hobby projects? How are they different then the ones from work? How do you manage to complete your hobby projects?
I'd be glad to know while I work on my new project: a piano sonata generator :)
I don't think writing specs is the solution to your problem. Clearly, your "hobby projects" are things that you find fun. You write the fun parts but then avoid the not fun parts that would be necessary to complete something.
If you're just "programming for fun" then good, you're succeeding. I don't think writing specs is fun.
If you really want to "finish" something, the best way isn't to write a spec, it's to not jump to another project when the fun factor dips.
It is all about 'Self project management' ... even for fun.
I feel for you ... I used to have many repos that tended to all get stuck at around revision 200 or so.
Here is what used to happen, because I didn't do enough planning, after around 200 commits, things get messy and need a rewrite ... then interest disappears because it seems like too much hassle.
I learned to write my own specs for personal use
to
Give me focus to get the job done, and not go off into feature creep lane
Remind me what I am working towards
To have great ideas before I get coding
Keep thing more fun for a longer time
For me, writing my own specs is vital to getting anything done!
You wouldn't start a business without a plan would you?
For personal projects I have tons of moleskine books filled with rough specs and ideas. When they mature, they migrate from the note books into real documents and the coding begins.
BIG EDIT: On a drive for personal efficiency and, to get projects finished. I read "Getting Things Done" ... Despite all the hippy crap about 'psyche' and various levels of mind (which im sure is not based in any science) the tips are very good.
I don't get too complicated, but listing out all of the features and requirements that you want included in your application really does help. As with most hobby projects you often don't just sit down and code them straight through for 2 months and finish them. It's an hour here, two hours there, etc. Basically it's very common to forget what you were working on last and what the original purpose of this super great idea for an application was.
If you spend a few hours writing down specs and requirements it will be very valuable to you 6 months down the road when you get some free time or your ADD switches to that project and you try to remember what it is this was suppose to do.
I just found out recently that writing specs is really the thing I need to get my projects done.
I've been a bit like you, tons of projects, jumping from one to the other and never getting things finished. Until about 6 months ago, when I started to actually write specs and have a kind of roadmap for my projects.
All that I can say is that, it actually works, because you break your projects into smaller steps, just like a race with checkpoints, and when you start to mark the checkpoints as done, it feels good, addictive and your focus will be on the finish line.
This way, you get to keep only 1 or 2 projects at the same time, but actually finish them. And of course, you have the extra and pretty valuable bonus of keeping up with the project even if you don't touch it for about a month or more. The specs will always be there to remind you of the goals and purposes of your project.
This is just my personal experience, and I believe that you should give it a try. Hopefully it will workout for you too.
I've been able to do some hobby projects and finish some of them. I try to finish them all but some i just cant muster.
The reason i think is that the amount of details that are needed to finish a projects are so many that it goes from a passion project to a chore of a project.
What helped me finish most of mine is that they stayed a passion until the finishing touches were left. So i just plowed through them.
Will a spec help, to some degree yes. They get you further into the project but almost always there's a point where the passion fades and you look for the next shiny object.
It doesn't work for me! Infact whenever I'm writing up specs I'm generally making the projects even bigger, and less likely to be finished.
Sometimes the best way to do it is to just do it.
Ze Frank explains this much better than me:
http://www.zefrank.com/theshow/archives/2006/07/071106.html (video link with swearing)
EDIT: Just to add. If you are finding you want to leave your half-finished project for a new, grand idea... do it! Don't look back!
Completion is not a requirement for your own pet projects. Nobody will blame you for not finishing stuff that barely anyone else would even bother starting.
The reason you started was because of passion. That is very important. You should not force yourself to 'slog through' in your free time. You will drain your passion which is your most vital resource.
I usually write a first set of spec when I get started.
I'm also a big fan of paper thinking, so I'll draw screens, UML, diagrams, flow charts, design elements... It's just a matter of defining the scope of your project and be able to watch what you had in mind. It really helps me think.
These documents will be my specs for the whole project. I will add others as I go, but I'm not trying to maintain the old ones as much as I would have it it was a work project: I know where I'm going and I can keep track of the changes looking at my code.
Of course, some of my hobby projects are done collaboratively. In these cases, I write down more specs in order to have a better communication with my team and I try to keep documents such as DB Diagrams up to date.
I also have several hobby projects that I have not finished. I have about 10 and have written a specification for exactly one of them, the largest in scope (also a game).
I have not finished either the ones without specifications, nor the one with. I think this is because I never publish the work or show it to anyone so it remains full of bugs and never 'finished.
I suppose that this means that regardless of whether or not you have a spec, it will not affect the success of the project as much as other factors, like having the time, motivation, help, and having confidence.
The single best thing I've ever found to help move towards completion is to have someone else working on the project with you. Find a friend (or two) who is interested in the same thing and design/code it with them. Not only do you have someone to bounce ideas off of, but you've also got someone to motivate you, not to mention progress is twice as fast so you'll hopefully finish before you give up :)
Of course, it requires source control, but you were already using that for your projects, right? :)
Do you want to finish them?
I think it's reasonable to never finish a hobby project. You can just keep working on it as long as you live. Aciddose has been working on his virtual instrument xhip for years, stubbornly never getting to 1.0, making the instrument patches people program worthless from one release to the next. Yet he and the users of his softsynth seem to be having a grand time.
Maybe if you just aim for a "release" and not being "finished" you'll be more satisfied. Betas let you keep dreaming.
Yes and no. I write notes in a notebook as I'm thinking about it, and add to it as I implement it. It is a somewhat different process from work projects where someone else may have to see the spec.
I finish about half of what I start.
I've helped with development on a range of systems from safety critical avionics to throwaway personal projects like a Sudoku solver. Obviously with the avionics systems, specifications were critical to the safe operation of the system and to prevent killing somebody, but I've never bothered with my personal projects.
I think this is because specs are generally boring to read and write. Joel wrote an interesting article about this, and how to make them better if you do write them:
Painless Functional Specifications
Unfortunately I haven't had the guts to try making my specs more fun to read at work yet.
Maybe intead of writing specs you should try working on some projects for or with other people? That could provide some external motivation. I do some web devleopment for my cousin's drive in theater, and if they need a feature they won't stop asking me about it until I finish it.
The single biggest piece of advice I could give you would be to get something out there - make the spec for your first version small enough that you actually feel you can complete it, even though it won't have nearly all the features you want.
Once you get something out there, the pressure from users of your software will be enough to hopefully keep you going on it. It also ensures that the direction you take in development is the same direction your users want you to go.
If you don't actually get any users, then don't feel so bad about dropping the project - if nobody is interested, it probably isn't worth pursuing.
If pressure from your users isn't enough to keep you focused, then open source it. If there's enough interest in it, somebody else will pick it up where you left off, and you are free to move on to bigger and better things.
Unfortunately, after writing specs for the core of the DIFL engine (don't bother looking it up, as there's no trace of it outside my home systems), I still didn't finish it up.
Short answer: developing specifications for a hobby project is neither necessary nor sufficient to guarantee completion.
That being said...
I keep an engineering notebook for all of my personal projects. I use the notebook to capture all sorts of things about the projects on which I work. This includes project motivation, valuable resources leveraged during the project, things developed over the course of the project that might potentially be reused later, key insights gained, etc. etc. It also includes, more to your question, specifications for most of the projects. I employ an agile/lean approach to creating these specifications which, for me, is compelling from a cost/benefit perspective.
btw...I have many, many personal projects that did not culminate in a complete working system. Some of these I might get around to completing 'someday maybe'. I consciously chose to stop working on some of the others because they had served their purpose (e.g. introduced me to a new technology, helped me better understand a language feature, etc.) Continuing to crank away at projects like these would have led to diminishing returns so I chose to reallocate my time to projects I felt were higher leverage.
The real question is: what is your hobby? Is it finishing a project, or tinkering. If getting the last ten yards is a chore, you have to decide if it's worth it to you. Writing detailed specs will work; so will self-flagellation if you're into that sort of self-discipline. Nothing will make it easy if it's against your make-up, so you have to decide whether the end-goal is worth anything to you.
And, just to demonstrate that there is nothing programming-specific about this point, you might really like this guy. One of the main points in his work is that conceptual artists, such as Picasso and Da Vinci never really cared about the final execution--the idea was everything, and, having asserted it, they were strangely content with someone else finishing the actual work or leaving the sketch unfinished and unpublished.
I'm not sure that writing specs is the solution to your problems (or mine which seem similar) however in the case where I want to make something more than a throwaway experiment there are a few things that help me slightly without taking the fun out of it.
Specs really are quite tight and should be technical but for a hobby approach you could write up a little bit of something similar much more loose that outlines some of the things you would like to feature and shows how they fit together in a sort of design draft. Though not as detailed or restrictive as a proper spec it might help to keep the tinkering leading in the right direction.
Secondly you could break it down and depending on your time allowances maybe add a few goals in. If you focus on building one part of the project as a time breaking it into subprojects that can be linked together at the end, it gives a feeling of progress as you move from part to part rather than feeling like you have been working on the same thing for ages and can't be bothered any more. It works if you tick it off on a list, since usually it has to happen atleast mentally anyway.
In saying this if your goal is to play with certain concepts and not actually create a final product then you probably won't because you aren't working towards it. One way might be to take the above mentioned idea of breaking it up and then find a way of adding something personally interesting into each part that bores you, maybe trying to add a challenge into it or something.
I'm not particularly experienced still learning, but this is how I keep my tinkering together(sometimes unless I hit a total block cause by inexperience) and how I've approached many multimedia and web projects on a hobby basis in past years. Though the guy who said open-source it when you get bored and let someone else pick it up, that was a good idea if you want to see your code used but have satisfied your personal goals.
I have much the same problem. One thing I've noticed that HAS helped though, is lowering my ambitions. like WAY WAY low. Writing a spec is one way to reign in the ambitions, if you have some kind of limiting rule for the spec, like "The spec can only be one page", or "the spec can be no longer than 300 words long", or "Spec only something that I can get done in one day of coding". Getting the balance right can take some practice. If you go with the last limit, you can impose the rule of MANDATORY dismissal of the project if you can't finish it in one day.
The nice thing about this, is it limits you to achievable goals. This might sound really stupid or wrong at first. Or maybe it sounds reasonable, but you just can't help it, you wanna do amazing things, not ordinary things! Not small things that you can only get done in a few hours!
but keep this in mind:
“A complex system that works is
invariably found to have evolved from
a simple system that worked. The
inverse proposition also appears to be
true: A complex system designed from
scratch never works and cannot be made
to work. You have to start over,
beginning with a working simple
system.”
—John Gall
It is SO MUCH easier to make that ambitious project, if you already have a FINISHED and WORKING project to base it on. Then the "more complex thing" CAN be a project that fits in a day. This is the ideal and philosophy I'm working towards, because I think it has the best chance of succeeding. Looking at past successful projects, the vast majority of them evolved in this way, whether it was intentional or not.
What helps me a lot is to split a new feature into small tasks that could each be done in an evening hacksession. So if I have time, I simply pick one task from the list and just finish it. This is often enough to get "in the flow" and do "just one more".
I do this only for one feature at a time so I don't get distracted by all the other cool things I could add to my application.
I constantly write specs for my projects, in work, at university and outside in my free time. The biggest weakness of a programmer is his/her memory, so I find it good to keep myself busy during my thinking time by writing down my every thought into some sort of structured document. Before you know it you've written a full database schema or have a Requirements Specification.
At the moment I'm working on improving my SQL skills, and I've been spending a lot of this free time between writing queries writing down my experienced. After a couple of tweaks I had a decent document outlining what needed to be done.
I think the core problem is not the lack of specs, but rather that finishing something (anything) is hard.
It is hard work. It may seem as if your program is 90 % done. But doing those last 10 % (rooting out all bugs, getting the application to release quality, writing documentation, etc) requires as much work as the first 90 %. And if you want to be serious about marketing your program, answering support emails, fixing other people's bugs, that's more work still. And perhaps not the kind of work you are most interested in.
It is also hard mentally. An unfinished project has unlimited potential. It is an empty canvas where you can project your unbridled ambitions, lofty ideals and revolutionary thoughts. Once it is finished and made real you have to see it for what it is. Limited. Flawed. Never as pretty as the idea that spawned it.
That said, finishing something can also be very rewarding. You learn a lot, get a reality check on your ideas, the satisfaction of having completed something and you get to see what other people think of your work.
Some advice:
Make sure that you really want to finish the project. I.e., that the rewards are worth all the hard work. (If not, then accept that fact and remain a happy tinkerer.)
Find ways of motiviating yourself through the "boring" parts. Specs, maybe, if it keeps you focused. But find whatever works for you, whether it is ticking of todo-items, rewarding yourself with a cookie or dreaming of fame and fortune.
Release early, release often. The more you save for a "big release" the bigger is the chance that that release never happens.
First release, then rewrite. When you feel the urge to do a major rewrite, do a release first, then do the rewrite (if you are still up for it). Software is never perfect. If you strive for perfection without any pressure to release your half-baked (but existing) code, then you will never be done.
Most hobby projects of mine don't really get finished either. As long as I'm working on something and learning though I don't think thats a problem. Currently I'm not writing specs, but I am practicing/training TDD. I bring it up as I write tests that are the specs. Some days I'll sit down and just create a bunch of tests outlining what the software should do. Some days I make those tests pass. Its enjoyable in that I don't have to keep the code all in my head, and at any point I can sit down and make further progress by fixing the broken tests. Things just work, its kind of surreal.
Joel's article about the Evidence Based Scheduling works for me. Though I implemented it differently.
The idea is to break the project into small tasks and give estimates, then make a forecast when your project will finish based on the time the finished tasks took to finish them.
You may think your project will take years to finish, but actually from the estimate it's just two months or less. If you work more and finish tasks quickly, you will see the finish date coming earlier.
I think the most motivating thing to proceed forward is seeing the goal coming closer you run towards.
Plus: create something you will use later. Using stuff gives you incentive to improve it later.
I have heard many developers refer to code as "legacy". Most of the time it is code that has been written by someone who no longer works on the project. What is it that makes code, legacy code?
Update in response to:
"Something handed down from an ancestor or a predecessor or from the past" http://www.thefreedictionary.com/legacy. Clearly you wanted to know something else. Could you clarify or expand your question? S.Lott
I am looking for the symptoms of legacy code that make it unusable or a nightmare to work with. When is it better to throw it away? It is my opinion that code should be thrown away more often and that reinventing the wheel is valuable part of development. The academic ideal of not reinventing the wheel is a nice one but it is not very practical.
On the other hand there is obviously legacy code worth keeping.
By using hardware, software, APIs, languages, technologies or features that are either no longer supported or have been superceded, typically combined with little to no possibility of ever replacing that code, instead using it til it or the system dies.
What is it that makes code, legacy code?
As with plain legacy, when the author is dead or missing, you as a heir get all or some of his code.
You shed some tears and try to figure out what to do with all this rubbish.
Michael Feathers has an interesting definition in his book Working Effectively with Legacy Code. According to him legacy code is code without automated tests.
It is a very general (and oft abused term) but any of the following would be legitimate reasons to call an app legacy:
The code base is based on a language/platform which is entirely unsupported by the manufacturer of the original product (often said manufacturer has gone out of business).
(really 1a) The code base or platform on which it is built is so old that getting qualified or experienced developers for the system is both hard and expensive.
The application supports some aspect of the business which is no longer actively grown and for which alterations are extremely rare, normally to fix it if something entirely unexpected changes around it (the canonical example being the Y2K issue) or if some regulation/external pressure forces it. Since both reasons are pressing and normally unavoidable but no significant development has occurred on the project it is likely that those people assigned to deal with this will be unfamiliar with the system (and it's accumulated behaviours and intricacies). In these cases this would often be reason to increase the perceived and planned for risk associated with the project.
The system has/or is being replaced with another. As such the system may be used for much less than originally intended, or perhaps only as a means of viewing historical data.
Legacy generally refers to code that is no longer being developed - meaning that if you use it, you have to use it on its original terms - you cannot just edit it to support the way the world looks today. For example, legacy code has to run on hardware that may not exist today - or is no longer supported.
According to Michael Feathers, the author of the excellent Working Effectively with Legacy Code, legacy code is a code which has no tests. When there is no way to know what breaks when this code changes.
The main thing that distinguishes
legacy code from non-legacy code is
tests, or rather a lack of tests. We
can get a sense of this with a little
thought experiment: how easy would it
be to modify your code base if it
could bite back, if it could tell you
when you made a mistake? It would be
pretty easy, wouldn't it? Most of the
fear involved in making changes to
large code bases is fear of
introducing subtle bugs; fear of
changing things inadvertently. With
tests, you can make things better with
impunity. To me, the difference is so
critical, it overwhelms any other
distinction. With tests, you can make
things better. Without them, you just
don’t know whether things are getting
better or worse.
Nobody is gonna read this, but I feel the other answers don't get it quite right:
It has value, if it wasn't useful it would've been thrown away long ago
Its hard to reason about because either of
Lack of documentation,
Original author cannot be found or forgot (yes 2 months later your code can be legacy code too!!),
Lack of tests or typesystem
Doesn't follow modern practices (ie no context to hold on too)
There is a requirement to change or extend it.
If there isn't a requirement to change it, it isn't legacy code
since nobody cares about it. It does its thing and there is nobody
around to call it legacy code.
A colleague once told me that legacy code was any code that you hadn't written yourself.
Arguably, it's just a pejorative term for code that we don't like any more for whatever reason (typically because it's not cool or fashionable but it works).
The TDD brigade might suggest that any code without tests is legacy code.
Legacy code is source code that relates to a no-longer supported or manufactured operating system or other computer technology.
http://en.wikipedia.org/wiki/Legacy_code
"Legacy code is source code that relates to a no-longer supported or manufactured "
Any code with support (or documentation) missing. Be it:
inline comments
technical documentation
spoken documentation (the person who wrote it)
unit tests documenting the workings of the code
For me legacy code is code that was written prior to some paradigm shift.
It may still be very much in use but it is in the process of being refactored to bring it into line.
e.g. Old procedural code hanging around in an otherwise OO system.
Code (or anything else, really) becomes "legacy" when it has been replaced by something newer/better, and yet despite this it's still used and kept alive "in the wild".
Preserving legacy code is not so much an academic ideal as it is keeping code that works, no matter how poorly. In many conservative enterprise situations, that would be considered more practical than throwing it away and starting again from scratch. Better the devil you know...
Legacy code is code that is painful/expensive to keep current with changing requirements.
There are two ways that this can happen:
The code is unsuitable for change
The semantics of the code have been swapped out to silicon
1) is the easier of the two to recognize. It is software that has fundamental limits making it unable to keep up with the ecosystem around it. For example, a system built around O(n^2) algorithm won't scale beyond a certain point and must be re-written if requirements move in that direction. Another example is code using libraries that are not supported on the latest OS versions.
2) Is harder to recognize, but all code of this kind shares the characteristic that people are afraid to change it. This could be because it was badly written/documented to begin with, because it is untested, or because it is non-trivial and the original authors who understood it left the team.
The ASCII/Unicode chars that comprise living code have semantic meaning, the "why's", "what's" and to some degree the "how's", in the minds of people associated with it. Legacy code is either un-owned or the owners do not have meaning associated with large portions of it. Once this happens (and it could happen the next day with really poorly-written code), to change this code, someone must learn it and understand it. This process is a significant fraction of the time it takes to write it in the first place.
The day you're afraid to refactor your code is the day when your code has become legacy.
I consider code "legacy" if any or all of the following conditions apply:
It was written using a language or methodology that is a generation behind current standards
The code is a complete mess with no planning or design behind it
It is written in outdated languages and in an outdated, non object-oriented style
It is difficult to find developers who know the language because it is so old
Unlike some of the other opinions here, I've seen plenty of modern applications that work decently without unit tests. Unit testing still has not caught on with everyone. Perhaps ten years from now the next generation of programmers will look at our current applications and consider them "legacy" for not containing unit tests, just as I consider non object-oriented applications to be legacy.
If few changes need to be made to a legacy codebase, it's better to simply leave it as-is and go with the flow. If the application needs drastic functionality changes, a GUI overhaul, and/or you can't find anyone who knows the programming language, it's time to throw away and start over. A word of warning, however: rewriting from scratch can be very time-consuming, and it's difficult to know if you've replicated all functionality. You'll probably want to have test cases and unit tests written for the legacy application and the new application.
Quite honestly legacy code is any code, framework, api, of other software construct thta's not "cool" anymore. For example COBOL is unanimously regarded as legacy while APL is not. Now one can also make the case that COBOL is consideed legacy and APL not because it has about 1m times the install base as APL. However, if you say that you need to work on APL code the reply would not be "oh no, that legacy stuff" but rather "oh my god, guess you won't be doing anything for the next century" see the difference?
This is a general term thrown around quite often (and quite generically) in the software ecosystem.
Well, I like to think of legacy code as inherited code. This is simply code that was written in the past. In most cases, legacy code do not follow new/current practices and is often considered archaic.
Legacy code is anything written more than a month ago :-)
It's often any code that isn't written in the trendy scripting language du jour, and I'm only half joking.
Say there are two possible solutions to a problem: the first is quick but hacky; the second is preferable but would take longer to implement. You need to solve the problem fast, so you decide to get the hack in place as quickly as you can, planning to start work on the better solution afterwards. The trouble is, as soon as the problem is alleviated, it plummets down the to-do list. You're still planning to put in the better solution at some point, but it's hard to justify implementing it right now. Suddenly you find you've spent five years using the less-than-perfect solution, cursing it the while.
Does this sound familiar? I know it's happened more than once where I work. One colleague describes deliberately making a bad GUI so that it wouldn't be accidentally adopted long-term. Do you have a better strategy?
Write a test case which the hack fails.
If you can't write a test which the hack fails, then either there's nothing wrong with the hack after all, or else your test framework is inadequate. If the former, run away quick before you waste your life on needless optimisation. If the latter, seek another approach (either to flagging hacks, or to testing...)
Strategy 1 (almost never selected): Don't implement the kluge. Don't even let people know it's a possibility. Just do it the right way the first time. Like I said, this one is almost never selected, due to time constraints.
Strategy 2 (dishonest): Lie and Cheat. Tell management that there are bugs in the hack, and they could cause major problems later on. Unfortunately, most of the time, the managers just say to wait until the bugs become a problem, then fix the bugs.
Strategy 2a: Same as strategy 2, except there really are bugs. Same problem, though.
Strategy 3 (and my personal favorite): Design the solution whenever you can, and do it well enough that an intern or code-monkey could do it. It's easier to justify spending the small amount of code-monkey money than to justify your own salary, so it might just get done.
Strategy 4: Wait for a rewrite. Keep waiting. Sooner or later (probably later), someone is going to have to rewrite the thing. Might as well do it right then.
Here is a great related article on technical debt.
Basically, it is an analogy of debt with all the technical decisions you make. There is good debt and bad debt... and you have to pick the debt that is going to achieve the goals you want with the least long term cost.
The worst kind of debt is small little accumulating shortcuts that are analogous to credit card debt... each one doesn't hurt, but pretty soon you are in the poor house.
This is a major issue when doing deadline driven work. I find that adding very detailed comments about why this way was chosen and some hints at how it should be coded help. This way people looking at the code see it and keep it fresh.
Another option that will work is add a bug.feature in your tracking framework (you do have one, right?) detailing the rework. That way it is visible and may force the issue at some point.
The only time you can ever justify fixing these things (because they're not really broken, just ugly) is when you have another feature or bug fix that touches the same section of code, and you might as well re-write it.
You have to do the math on what a developer's time costs. If software requirements are being met, and the only thing wrong is that the code is embarrasing under the hood, it's not really worth fixing.
Whole companies can go out of business because over-zealous engineers insist on a re-architecture every year or so when they get antsy.
If it's bug-free and meets requirements, it's done. Ship it. Move on.
[Edit]
Of course I'm not advocating that everything be hacked in all the time. You have to design and write code carefully in the normal course of the development process. But when you do end up with hacks that just had to be done quickly, you have to do a cost-benefit analysis on whether or not it's worth it to clean up the code. If over the lifetime of the application you will spend more time coding around a messy hack than you would have fixing it, then of course fix it. But if not, it's way too expensive and risky to re-code a working, bug-free application just because looking at the source makes you ill.
YOU DON'T DO INTERIM SOLUTIONS.
Sometimes I think programmers just need to be told this.
Sorry about that, but seriously--a hacky solution is worthless and even on the first iteration can take longer than doing a portion of the solution correctly.
Please stop leaving me your crap code to maintain. Just ALWAYS CODE IT RIGHT. No matter how long it takes and who yells at you.
When you are sitting there twiddling your thumbs after delivering early while everyone else is debugging their stupid hacks, you'll thank me.
Even if you don't think you are a great programmer, always strive to do the best you can, never take shortcuts--it doesn't cost you ANY time to do it right. I can justify this statement if you don't believe me.
Suddenly you find you've spent five years using the less-than-perfect solution, cursing it the while.
If you're cursing it, why is it at the bottom of the TODO list?
If it's not affecting you, why are you cursing it?
If it is affecting you, then it's a problem that needs to be fixed NOW.
I make sure that I'm vocal about the priority of the long term fix ESPECIALLY after the short term fix has gone in.I detail the reasons why it's a hack and not a good long term solution and use those to get the stakeholders (managers, clients, etc) to understand why it needs to be fixed Depending on the case, I may even inject a bit of worst case scenario fear in there. "If this safely line snaps, the whole bridge could collapse!"I take responsibility for coming up with a long term solution and make sure that it gets deployed
It is a hard call. I have done hacks personally cause, sometimes you HAVE to get that product out the door and into the customers hands. However, the way that I take care of it is to just do it.
Tell the project lead or your boss, or the customer: There are some spots that need to be cleaned up, and coded better. I need a week to do it, and it is going to cost less to do it now, then it will be to do it 6 months from now, when we need to implement an extension onto the subsystem.
Usually problems like this arise from bad communication with management or the customer. If the solution works for the customer then they see no reason to ask for it to be changed. So they need to be told about the tradeoffs you made beforehand so they can plan extra time to fix the problems after you've implemented the quick solution.
How to solve it depends a bit on why it's a bad solution. If your solution is bad because it's hard to change or maintain then the first time you have to do maintenance and have a bit more time then that is the right time to upgrade to a better solution. In this case it helps if you tell the customer or your boss that you took a shortcut in the first place. That way they know that they can't expect a fast solution next time around. Cripling the UI can be a good way to make sure the customer comes back to get stuff fixed.
If the solution is bad because it's risky or unstable then you really need to talk to the person doing the planning and have some time planned in to fix the problem asap.
Good luck. In my experience this is almost impossible to achieve.
Once you go down the slippery slope of implementing a hack because you are under pressure then you might as well get used to living with it for all time. There is almost NEVER enough time to re-work something that already works, no matter how badly it is implemented internally. What makes you think you will magically have more time "at some later date" to fix the hack?
The only exception I can think of to this rule is if the hack completely prevents you from implementing another piece of functionality that is needed by a customer. Then you have no choice but to do the re-work.
I try to build the hacky solution so that it can be migrated to the longterm way as painlessly as possible. Say you got a guy who is building a database in SQL Server cuz that's his strongest DB, but your corporate standard is Oracle. Build the db with as few non-transferable features (like Bit datatypes) as possible. In this example, it's not hard to avoid bit types, but it makes transitioning later an easier process.
Educate whoever is in charge of making the final decision why the hacky way of doing things is bad in the long-run.
Describe the problem in terms they can relate to.
Include a graph of cost, productivity, and revenue curves.
Teach them about technical debt.
Regularly refactor if you're pushed forward.
Never call it "refactoring" or "going back and cleaning up" in front of non-technical people. Instead, call it "adapting" the system to handle "new features".
Basically, people who don't understand software don't get the concept of revisiting things that already work. The way they look at it, developers are like mechanics who want to keep taking apart and reassembling the entire car every time someone wants to add a feature, which sounds insane to them.
It helps to make analogies to everyday things. Explain to them how when you made the interim solution, you made choices that suited building it quickly, as opposed to being stable, maintainable, etc. It's like choosing to build with wood instead of steel because wood is easier to cut, and thus, you could build the interim solution quicker. The wood, however, simply can not support the foundation of a 20-story building.
We use Java and Hudson for continuous integration. 'Interim solutions' must be commented with:
// TODO: Better solution required.
Every time Hudson runs a build it provides a report of each TODO item so that we have an up to date, highly visible record of any outstanding items that need improved.
Great question. This bothers me a lot, too - and most of the time I'm the sole person responsible for prioritizing issues in my own projects (yep, small business).
I found out that the problem that needs to be fixed is usually just a subset of the problem. IOW, the customer that needs an urgent fix does not need the whole problem to be solved, just a part of it - smaller or larger. That sometimes enables me to create a workaround that is not solution to the complete problem but just to the customer's subset and that allows me to leave the bigger issue open in the issue tracker.
That may of course not apply at all to your work environment :(
This reminds me of the story of "CTool". In the beginning CTool was put forward by one of our devs, I'll call him Don, as one possible way to solve the problem we were having. Being an earnest hard-working type, Don plugged away and delivered a working prototype. You know where I am going with this. Overnight, CTool became a part of the company work flow with an entire department depending on it. By the second or third day, bitter complaints started streaming in about CTool's shortcomings. Users questioned Don's competence, commitment and IQ. Don's protests that this was never supposed to be a production app fell on deaf ears. This went on for years. Finally, someone got around to re-writing the app, well after Don had departed. By this time, so much loathing had become attached to the name CTool that naming it CTool version 2 was out of the question. There was even a formal funeral for CTool, somewhat reminiscent of the copier (or was it a printer?) execution scene in Office Space.
Some might say Don deserved the slings and arrows for not making it go right to fix CTool. My only point is that saying you should never hack out a solution is probably unjustifiable in the Real World. But if you are the one to do it, tread cautiously.
Get it in writing (an email). So when it becomes a problem later management doesn't "forget" that it was supposed to be temporary.
Make it visible to the users. The more visible it is the less likely people are going to forget to go back and do it the right way when the crisis is over.
Negotiate before the temp solution is in place for a project, resources, and time lines to get the real fix in. Work for the real solution should probably begin as soon as the temp solution is finished.
You file a second very descriptive bug against your own "fix" and put a to-do comment right in the affected areas that says, "This area needs a lot of work. See defect #555" (use the right number of course). People who say "don't put in a hack" don't seem to understand the question. Assume you have a system that needs to be up and running now, your non-hack solution is 8 days of work, your hack is 38 minutes of work, the hack is there to buy you time to do the work and not lose money while you're doing it.
Now you still have to get your customer or management agree to schedule the N*100 minutes of time required to do the real fix in addition to the N minutes needed now to fix it. If you must refuse to implement the hack until you get such agreement, then maybe that's what you have to do, but I've worked with some understanding people in that regard.
The real price of introducing a quick-fix is that when someone else needs to introduce a 2nd quick fix, they will introduce it based on your own quick-fix. So, the longer a quick-fix is in place, the more entrenched it will become. Quite often, a hack takes only a little bit longer than doing things right, until you encounter a 2nd hack which builds on the first.
So, obviously it is (or seems to be) sometimes necessary to introduce a quick fix.
One possible solution, assuming your version control supports it, is to introduce a fork from the source whenever you make such a hack. If people are encouraged to avoid coding new features within these special "get it done" forks, then it will eventually be more work to integrate the new features with the fork than it will be to get rid of the hack. More likely, though, the "good" fork will get discarded. And if you are far enough away from release that making such a fork will not be practical (because it is not worth doing the dual integration mentioned above), then you probably shouldn't even be using a hack anyways.
A very idealistic approach.
A more realistic solution is to keep your program segmented into as many orthogonal components as possible and to occasionally do a complete rewrite of some of the components.
A better question is why the hacky solution is bad. If it is bad because it reduces flexibility, ignore it until you need flexibility. If it is bad because it impacts the programs behavior, ignore it and eventually it will become a bug fix and WILL be addressed. If it is bad because it looks ugly, ignore it, as long as the hack is localized.
Some solutions I've seen in the past:
Mark it with a comment HACK in the code (or similar scheme such as XXX)
Have an automatic report run and emailed weekly to those that care which counts how many times the HACK comments appear
Add a new entry in your bug tracking system with the line number and description of the right solution (so the knowledge gained from the research before writing the hack isn't lost)
write a test case that demonstrates how the hack fails (if possible) and check it into the appropriate test suite (i.e. so that it throws errors that someone will eventually want to cleanup)
once the hack is installed and the pressure is off, immediately start on the right solution
This is an excellent question. One thing I've noticed as I get more experience: hacks buy you a very short amount of time, and often cost you a huge amount more. Closely related is the 'quick fix' that solves what you think is the problem -- only to find when it blows up that that it wasn't the problem at all.
Setting aside the debate about whether you should do it, let's assume that you have to do it. The trick now is to do it in a way that minimizes long range affects, it easily ripped out later, and makes itself a nuisance so you remember to fix it.
The nuisance part is easy: make it issue a warning every time you execute the kludge.
The ripped out part can be easy: I like to do this be putting the kludge behind a subroutine name. That makes it easier to update since you compartmentalize the code. When you get your permanent solution, you're subroutine can either implement it or be a no-op. Sometimes a subclass can work nicely for this too. Don't let other people depend on whatever your quick fix is, though. It's difficult to recommend any particular technique without seeing the situation.
Minimizing long range effects should be easy if the rest of the code is nice. Always go through the published interface, and so on.
Try to make the cost of the hack clear to the business folks. Then they can make an informed decision either way.
You could intentionally write it in way that is overly restrictive and singe purposed and would require a re-write to be modified.
We had to do this once - make a short term demo version that we knew we did not want to keep. The customer wanted it on a winTel box, so we developed the prototype in SGI/XWindows. (We were fluent in both, so it wasn't a problem).
Confession:
I have used '#define private public' in C++ in order to read data from some other code layer. It went in as a hack but works well and fixing it has never become a priority. It is now 3 years later...
One of the main reasons hacks do not get removed is the risk that one introduces new bugs while fixing the hack. (Especially when dealing with pre-TDD code bases.)
My answer is a bit different from the others. My experience is that the following practices help you stay agile and move from hackey first iteration/alpha solutions to beta/production ready:
Test Driven Development
Small units of refactoring
Continous Integration
Good Configuration management
Agile database techniques/database refactoring
And it should go without saying you have to have stakeholder support to do any of these correctly. But with these products in place you have the right tools and processes to quickly change a product in major ways with confidence. Sometimes your ability to change is your ability to manage the risk of the changes and from the development perspective these tools/techniques give you surer footing.
Any useful metrics will be fine
One of the things that I look for in a code is unit test. This will give the freedom to refactor it. So if the code does not have tests I consider it a legacy code.
If the code:
has been replaced by newer code that implements the same or functionality or better
is not being used by current systems
is soon to be replaced by something else altogether
has been archived for historic reasons
when vendors stop supporting it
We use the term "legacy" to refer to any code, still in use, developed using technology we have ceased active development in.
It is code that we would rather rewrite using more recent tools than modify in its current state.
Micheal Feathers, Author of the excellent "Working Effectively with Legacy Code", defines it as any code that does not have tests.
A better question would probably be what marks a piece of code as non legacy.
To me legacy means unchangeable. So as soon as you're no longer 'able' to change it it's legacy.
Whether that ability is removed by fixed requirements, fear of breakage, knowledge loss, or some other impact is largely irrelevant.
A related note is that I don't think I'd ever use the exact word legacy as it stirs up too many emotions to be useful.
I don't believe there is a definitive answer, but I do believe that the likelihood that code is legacy code increases with the number of people who don't want to touch it and the likelihood that changing it will cause it to break.
the term "legacy code" is subjective and is probably a loaded term. but in general I subscribe to the view that legacy code is one that is not unit-testable and as such is hard to refactor.
When the code is old enough you never met the developer who originally wrote the code.
When 3rd party libraries aren't supported anymore.
In my opinion all code that is written is legacy code. It might take some time before the original intent and all the decisions made about the code is forgotten but sooner or later you cannot imagine what they were thinking while writing it. You never write legacy code yourself, right?
Using unit tests or some measure like seconds since the developer has left the building do not really measure whether or not the code is legacy code. Legacy code may have a good set of unit tests and comments and it may have undergone a strict code review and other analysis. This doesn't mean that the code is still relevant for the program at hand. It just suggests that the code might be comparably well written. And if it is no longer relevant, the code will actually make it harder to solve the problem the program is developed for.
Legacy code has been defined in many places as "code without tests". I don't think they are specific in the types of tests, but in general, if you can't make a change to your code without the fear of something unknown happening, well, it quickly devolves.
See "Working Effectively with Legacy Code"
I maybe wrong, but I don't think there is an established metric for this.
Usually a piece of code is deemed to be legacy, when it has seen at least 5-6 release cycles( maybe more ). More often than not, the Original Implementor is no longer around and the code is maintained through.
Almost seconds after the devs leave the premises. :)
If...
there's no money in the bank for new features
you can't find anyone that admits working on the project that needs fixing
the source code to the project you own has gone MIA
...then you're working on legacy code.
Usually people refer to something as legacy code when no one is still around that is familiar with or feels comfortable maintaining the code.
Unit tests make it easier for people unfamiliar with code to dig into it, so the theory is it helps prevent code from becoming "legacy".
Often when code is legacy it is changed in a different manner. People are afraid to change it, but also changes tend to be quick and dirty because nobody understands full consequences. Code duplication issues may arise, because people don't want to take the risk associated with deeper changes.
So, in such circumstances, the situation may get worse, at an increasing rate.
I don't know of any real metrics that can be used to determine if something is "legacy code" or not, but anything older than just written could be considered legacy. Legacy code means different things to different people/organizations, so it really is somewhat subjective.