Which should one code first, functionality or validity checks? - language-agnostic

When coding up, say, a registration form from scratch, does it make sense to get it functioning with the expected inputs first, and then go back and catch/handle the unexpected inputs and deal with errors?
The alternative would be to process the input, checking any constraints and ensuring that they are handled properly, and then dealing with getting the typical use case functioning correctly.
Is one way preferable to the other, and if so why? Also, is there an alternate way to go about this sort of 2-part task?
To clarify, by validity, I mean more than just data validation, including business rules, such as "No more than X people can register for this event"

The best thing to do, in my opinion, is to get a decent first version working which may not deal with all of the unexpected cases completely, but is made thoughtfully and modularly. You can then go back and perfect the logic so that your tests pass.
In the real world, this method pays off because you're more likely to be productive and interested when there's something working with a few kinks, rather than just being stuck trying to figure out all the edge cases in your head and deal with them at the start.

One way would be to take a TDD approach (assuming you write unit tests). Start off by writing your unit tests, then work on getting those unit tests to pass.
In my opinion UI work should be done last since you probably have plenty to do on the back-end functionality.
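For instance, a first failing test for the registration example might look like this minimal Python/pytest sketch (all the names here are hypothetical, including the "no more than X people" rule borrowed from the question):

```python
import pytest
from dataclasses import dataclass, field

class EventFullError(Exception):
    pass

@dataclass
class Event:
    capacity: int
    registered: list = field(default_factory=list)

def validate_registration(event: Event, attendee: str) -> None:
    # Business rule from the question: no more than `capacity` people.
    if len(event.registered) >= event.capacity:
        raise EventFullError(f"event is full ({event.capacity} attendees)")

# In TDD these tests come first; the function above is then written to satisfy them.
def test_rejects_registration_when_event_is_full():
    event = Event(capacity=2, registered=["alice", "bob"])
    with pytest.raises(EventFullError):
        validate_registration(event, "carol")

def test_accepts_registration_when_space_remains():
    validate_registration(Event(capacity=2, registered=["alice"]), "bob")
```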

I'm strongly in favor of writing the validators first. Once you have something that seems like it's working it will be much harder to go back and add the validators. By writing the validators first (but not necessarily handling the exceptions they raise) you make sure that you get them done. It also gives you a good chance to think about exactly what you're expecting - this can help you think of special cases you'll need to cover later.
Tracer bullets are nice, but the principle doesn't need to be applied to every aspect of a project.

1. Write the test for a method of the validation object
2. Write the method of the validation object
3. Test until pass
4. Repeat 1-3 until all methods are tested.
5. Write the form and feed the data into the tested and working validation object methods.
6. Use the same procedure for the business object handling the data post validation.
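A rough Python sketch of steps 5 and 6 (the validator methods and the form dict are hypothetical):

```python
class RegistrationValidator:
    # Each method below would be written test-first (steps 1-4).
    def validate_name(self, value: str) -> str:
        value = value.strip()
        if not value:
            raise ValueError("name must not be empty")
        return value

    def validate_email(self, value: str) -> str:
        value = value.strip().lower()
        if "@" not in value:
            raise ValueError("email must contain '@'")
        return value

def handle_form(raw: dict) -> dict:
    # Step 5: the form feeds its raw input into the tested validator methods.
    # The cleaned dict is what gets handed to the business object (step 6).
    validator = RegistrationValidator()
    return {
        "name": validator.validate_name(raw.get("name", "")),
        "email": validator.validate_email(raw.get("email", "")),
    }

print(handle_form({"name": " Ada Lovelace ", "email": "Ada@Example.org"}))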

I always like coding up the functionality first and then adding validation later. Seems fine to me seeing as most coding is done in a development environment and no one will be submitting data while you're working on it.
But whichever way you feel comfortable with is best.
The downside to my method is that, if you're disorganized, you may forget to add a validity or sanitation check somewhere. And it does happen :-)

If it's not TOO much work, I'd start with basic input validation. Especially for dates or identifiers, like order numbers. It's easier to loosen validation than to tighten it. Basic input validation can save a lot of debugging time further down in the backend.
On the other hand, say you're talking good-looking, JavaScript-enabled validation that supports multiple languages. In that case you'd be better off writing a simple first version, and developing the backend based on that. You can polish up the input form when it's beginning to approach a final version.

Once you get the functionality working, will you REALLY go back and add validation code?
Most of us would like to, but many of us run out of time at the end of the project.

I'm not sure that designing one way or the other is superior... While Mr. Leekman says that cranking out the functional part and then going back and doing the tedious work of edge-cases is more productive, I'm not sure I see how.
Having the functional code is not useful (and is even dangerous) if you're allowing non-permissible values (i.e. values your functional code didn't expect) to creep in.
Conversely, great boundary checks are useless if the values that make it through aren't processed in some way.
In a production environment, you really have nothing to show if you are not covering both validity and functionality. The order in which you do these things really should be an issue of personal preference. If you're not feeling creative and just want something tedious to do, write the validity-checking portion of the code. If you've just figured out the perfect algo to do exactly what you need during your coffee break, sit down and write the functional part and go back to bounds-check later on.

I like to put up a couple of fields worth of functionality, then do the validation checks for that functionality, then the validation for the rest, and finally fill in the rest of the functionality.

If you're breaking the code up into "model" and "UI", then some of this is easier to figure out, but it basically requires a design choice: are your model objects going to assume correct inputs and put the onus on the UI, or test for correct inputs themselves?
Now, me, being a belt and suspenders guy, I tend to answer that question "yes": the model will check the domain even though the inputs should be checked anyway. But in any case, once you make that decision, you have your answer: if domain checking is part of the definition, you should build it and test it with all the other parts. If you separate the domain checking from the functionality, build and test them separately.
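For illustration, a minimal Python sketch of that "yes" answer (the class and rule are invented): the model re-checks its domain even though the UI is expected to have validated the input already.

```python
class Registration:
    MAX_ATTENDEES = 100  # assumed business rule, mirroring the question

    def __init__(self):
        self.attendees = []

    def add(self, name: str) -> None:
        # Belt: the UI should already have validated this input.
        # Suspenders: the model refuses bad input anyway.
        if not name.strip():
            raise ValueError("attendee name must be non-empty")
        if len(self.attendees) >= self.MAX_ATTENDEES:
            raise ValueError("registration is full")
        self.attendees.append(name.strip())
```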

Having seen too many databases where people neglected to do the validation methods, I vote for doing it first.

Related

First write code using the API, then the actual API - does this approach have a name, and is it valid for the API design process?

The standard way of working on a new API (library, class, whatever) usually looks like this:
you think about what methods the API user would need
you implement the API that you suspect the user will need
So basically you're trying to guess what your API should look like. This very often leads to over-engineered, huge APIs that you think the user will need, when it is very possible that a great part of your code won't be used at all.
Some time ago, maybe even a few years, I read an article that promoted writing client code first. I don't remember where I found it, but the author pointed out several advantages, like a better understanding of how the API will be used, what it should provide, and what is basically unnecessary. I think the idea was that it goes along with the SCRUM methodology and user stories, but at the implementation level.
Just out of curiosity, for my latest private project I started not with the actual API (some kind of toolkit library) but with the client code that would use this API. Of course my code is all in red because the classes, methods and properties do not exist, and I can forget about help from IntelliSense, but what I noticed is that after a few days of coding my application "has" all its basic functionality and my library API "is" a lot smaller than I imagined when starting the project.
I'm not saying that if somebody took my library and started using it, it wouldn't lack some features, but I think this approach helped me realize that my idea of the API was somewhat flawed, because I usually try to cover all the bases and provide methods "just in case". And sometimes that bites me badly, because I make stupid mistakes in basic functions while being focused on code that somebody might conceivably need.
So what I would like to ask you: have you ever tried this approach when you needed to create a new API, and did it help you? Is it a recognized technique that has a name?
So basically you're trying to guess what your API should look like.
And that's the biggest problem with designing anything this way: there should be no (well, minimal) guesswork in software design. Designing an API based on assumptions rather than actual information is dangerous, for several reasons:
It's directly counter to the principle of YAGNI: in order to get anything done, you have to assume what the user is going to need, with no information to back up those assumptions.
When you're done, and you finally get around to using your API, you'll invariably find that it sucks to use (poor user experience), because you weren't thinking about how the library is used (UX), you were thinking about what the library must do (features).
An API, by definition, is an interface for users (i.e., developers). Designing as anything else just makes for a bad design, without fail.
Writing sample code is like designing a GUI before writing the backend: a Good Thing. It forces you to think about user experience and practical effects of design decisions without getting bogged down in useless theorising and assumption.
And contrary to Gabriel's answer, this is not bottom-up design: it's top-down. Rather than design the concrete backend of your library and then force an abstract interface on top of it, you first design the interface and then worry about the implementation.
Generally speaking, the idea of designing the concrete first and abstracting from it afterwards is called bottom-up design. Test Driven Development uses a principle similar to what you describe to support better design. First you write a test, which is a use of the code you are going to write afterwards. It is important to proceed stepwise, because you have to prove the API is implementable. An important part of each step is refactoring; this allows you to design a more concise API and reuse parts of your code.
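To make the contrast concrete, here is a rough Python sketch of the client-first flow the question describes (the Toolkit class and its methods are invented for illustration): the client snippet is written first, "all in red", and the API then grows only what that snippet demands.

```python
# Step 1: write the client code you wish you could write (it won't run yet):
#
#     report = Toolkit().load(rows).filter(lambda r: r["ok"]).summarize()
#
# Step 2: implement only what that one line demands -- three small methods.

class Toolkit:
    def __init__(self, rows=None):
        self.rows = list(rows) if rows else []

    def load(self, rows):
        return Toolkit(rows)

    def filter(self, predicate):
        return Toolkit(r for r in self.rows if predicate(r))

    def summarize(self):
        return {"count": len(self.rows)}

# The client code that drove the design:
rows = [{"ok": True}, {"ok": False}, {"ok": True}]
report = Toolkit().load(rows).filter(lambda r: r["ok"]).summarize()
print(report)  # {'count': 2}
```

Notice there are no "just in case" methods: every method exists because the client snippet called it.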

Test Driven Design - where did I go wrong?

I am playing with a toy project at home to better understand Test Driven Design. At first things seemed to be going well and I got into the swing of failing tests, code, passing test.
I then came to add a test and realised it would be difficult with my current structure and that furthermore I should split a particular class which had too many responsibilities. Adding even more responsibilities for the next test was clearly wrong. I decided to put aside this test, and refactor what I had. This is where things started to go wrong.
It was difficult to refactor without breaking lots of tests at once, and then the only option seemed to be to make many changes and hope I ended up back at something where the tests passed again. The tests themselves were valid, I just had to break nearly all of them while refactoring. The refactoring (which I'm still not that happy with) took me five or six hours before I had returned to all tests passing. The tests did help me along the way.
It feels like I got off the TDD track. What do you think I did wrong?
As this is mostly a learning exercise I'm considering rolling back all that refactoring and trying to move forward again in a better fashion.
Perhaps you went too fast when splitting your class. The steps for the Extract Class Refactoring are as follows:
create the new class
have an instance of that class as a private data member
move fields to the new class, one by one
compile and test after each field
move methods to the new class, one by one, testing after each
That way you won't break a large number of tests while refactoring your class, and you can rely on the tests to make sure nothing gets broken at any point during the class split.
Also, make sure you're testing the behavior, not the implementation.
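For illustration, here is what a mid-refactor state can look like in Python (Customer and Address are hypothetical): the new class is held as a private member, and delegating accessors keep the old tests green while fields migrate one at a time.

```python
class Address:
    def __init__(self, street: str = "", city: str = ""):
        self.street = street
        self.city = city

class Customer:
    def __init__(self, name: str, street: str, city: str):
        self.name = name
        # Steps 1-2: the new class exists and is held privately.
        self._address = Address(street, city)

    # Steps 3-4: fields have moved; these delegating properties keep
    # every existing test passing, so you compile and test after each move.
    @property
    def street(self) -> str:
        return self._address.street

    @property
    def city(self) -> str:
        return self._address.city
```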
I wanted to comment on the accepted answer but my current reputation does not allow me to. So here it is as a new answer.
TDD says:
Create a test that fails. Code a little. Make the test pass.
It insists on coding in tiny steps (especially when beginning). View TDD as systematic validation of successive refactorings you perform to build your programs. If you take too big a step, your refactoring will get out of control.
Perhaps you were testing at too low a level. It's hard to say without seeing your code, but typically I test a feature from end to end and make sure all the behaviour I expected to happen, happened. Testing every single method in isolation will give you the test web you have created.
You can use tools like NCover and DotCover to check that you haven't missed any code paths.
The only thing "wrong" was to add a test afterwards. In "true" TDD you first declare all the tests before the actual implementation. I say "true" because that's often only theory. But in practice you still have the safety given by the tests.
This is, sadly, something TDD proponents don't talk about enough and that makes people try TDD and then abandon it.
What I do is what I call "High Level Testing", which consists in avoiding unit tests and doing exclusively high level tests (which we could call "integration tests"). It works pretty well and I avoid the (very important) problem you mentioned. I wrote an article about it a while ago:
http://www.hardcoded.net/articles/high-level-testing.htm
Good luck with TDD, don't give up yet.
Continuous regression of test cases is also necessary for TDD, so continuous integration with coverage tools (as mentioned above) matters: small changes (i.e. refactorings) can then be regression-tested easily, and any missed code paths can be found quickly.
Also, if tests were not written previously, don't waste time debating whether to write them or not. Tests should be written immediately.

How do you refactor a large messy codebase?

I have a big mess of code. Admittedly, I wrote it myself - a year ago. It's not well commented but it's not very complicated either, so I can understand it -- just not well enough to know where to start as far as refactoring it.
I violated every rule that I have read about over the past year. There are classes with multiple responsibilities, there are indirect accesses (I forget the term - something like foo.bar.doSomething()), and like I said it is not well commented. On top of that, it's the beginnings of a game, so the graphics are coupled with the data; or, in the places where I tried to decouple graphics and data, I made the data public so the graphics could access the data they need...
It's a huge mess! Where do I start? How would you start on something like this?
My current approach is to take variables and switch them to private and then refactor the pieces that break, but that doesn't seem to be enough. Please suggest other strategies for wading through this mess and turning it into something clean so that I can continue where I left off!
Update two days later: I have been drawing out UML-like diagrams of my classes, and catching some of the "Low Hanging Fruit" along the way. I've even found some bits of code that were the beginnings of new features, but as I'm trying to slim everything down, I've been able to delete those bits and make the project feel cleaner. I'm probably going to refactor as much as possible before rigging my test cases (but only the things that are 100% certain not to impact the functionality, of course!), so that I won't have to refactor test cases as I change functionality. (do you think I'm doing it right or would it, in your opinion, be easier for me to suck it up and write the tests first?)
Please vote for the best answer so that I can mark it fairly! Feel free to add your own answer to the bunch as well, there's still room for you! I'll give it another day or so and then probably mark the highest-voted answer as accepted.
Thanks to everyone who has responded so far!
June 25, 2010: I discovered a blog post which directly answers this question from someone who seems to have a pretty good grasp of programming: (or maybe not, if you read his article :) )
To that end, I do four things when I need to refactor code:
Determine what the purpose of the code was
Draw UML and action diagrams of the classes involved
Shop around for the right design patterns
Determine clearer names for the current classes and methods
Pick yourself up a copy of Martin Fowler's Refactoring. It has some good advice on ways to break down your refactoring problem. About 75% of the book is little cookbook-style refactoring steps you can do. It also advocates automated unit tests that you can run after each step to prove your code still works.
As for a place to start, I would sit down and draw out a high-level architecture of your program. You don't have to get fancy with detailed UML models, but some basic UML is not a bad idea. You need a big picture idea of how the major pieces fit together so you can visually see where your decoupling is going to happen. Just a page or two of some basic block diagrams will help with the overwhelming feeling you have right now.
Without some sort of high level spec or design, you just risk getting lost again and ending up with another unmaintainable mess.
If you need to start from scratch, remember that you never truly start from scratch. You have some code and the knowledge you gained from your first time. But sometimes it does help to start with a blank project and pull things in as you go, rather than put out fires in a messy code base. Just remember not to completely throw out the old, use it for its good parts and pull them in as you go.
What was most important for me on different occasions were unit tests: I took a few days to write tests for the old code and then I was free to refactor with confidence. How exactly is a different question, but having the tests made it possible for me to make real, substantial changes to the code.
I'll second everyone's recommendations for Fowler's Refactoring, but in your specific case you may want to look at Michael Feathers' Working Effectively with Legacy Code, which is really perfect for your situation.
Feathers talks about Characterization Tests, which are unit tests not to assert known behaviour of the system but to explore and define the existing (unclear) behaviour -- in the case where you've written your own legacy code, and fixing it yourself, this may not be so important, but if your design is sloppy then it's quite possible there are parts of the code that work by 'magic' and their behaviour isn't clear, even to you -- in that case, characterization tests will help.
One great part of the book is the discussion about finding (or creating) seams in your codebase -- seams are natural 'fault lines', if you like, where you can break into the existing system to start testing it, and pulling it towards a better design. Hard to explain but well worth a read.
There's a brief paper where Feathers fleshes out some of the concepts from the book, but it really is well worth hunting down the whole thing. It's one of my favourites.
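A characterization test can be as simple as this Python sketch (the legacy function is invented): the expected values come from running the code, not from a spec, so they pin down current behaviour before you refactor.

```python
import pytest

def legacy_price(quantity: int, member: bool) -> float:
    # Stand-in for inherited code whose rules nobody quite remembers.
    return quantity * 9.99 * (0.9 if member and quantity > 3 else 1.0)

def test_characterize_legacy_price():
    # These assertions document what the code DOES, warts and all,
    # not what anyone thinks it SHOULD do.
    assert legacy_price(1, False) == pytest.approx(9.99)
    assert legacy_price(4, True) == pytest.approx(35.964)
    assert legacy_price(4, False) == pytest.approx(39.96)
```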
Just an additional refactoring that is more important than you think: Name things correctly!
This goes for any variable name and method name. If the name does not accurately reflect what the thing is used for, then rename it to something more accurate. This might require several iterations. If you cannot find a short, and entirely accurate name, then that item does too much and you have an excellent candidate for a code snippet that needs to be split. The names also clearly indicate where the cuts are to be made.
Also, document your stuff. Whenever the answer to WHY? is not clearly conveyed by the answer to HOW? (being the code) you will need to add some documentation. Capturing design decisions is probably the most important task as it is very hard to do in code.
You could always start from "scratch". That doesn't mean scrap it and start from nothing, but try to rethink high-level things from the beginning, since you seem to have learned a lot since the last time you worked on it.
Start from a higher level, and as you build the scaffolding of your new and improved structure, take all the code you can reuse, which will probably be more than you think if you're willing to read through it and make some small changes.
When you're making the changes, be sure to be strict with yourself about following all the good practices you now know, because you will really thank yourself later.
It can be surprisingly refreshing to properly re-make a program to do exactly what it did before, only more "cleanly". ;)
As others have mentioned as well, unit-tests are your best friend! They help you ensure that your refactoring works, and if you're starting from "scratch", it's the perfect time to write them.
You're in a much better position than many people facing this problem in that you understand what the code is supposed to do.
Taking variables out of a shared scope, as you're doing, is a great start, in that you're partitioning responsibilities. Ultimately you want each class to express a single responsibility. A few other things you might look at:
Easy targets for refactoring are code that's duplicated in lots of places and long methods.
If you're managing application state through statically initialized singletons or, worse, a global state that everything is talking to, consider moving it to a managed initialization system (i.e. a dependency injection framework like Spring or Guice), or at least make sure that the initialization isn't entangled with the rest of the code.
Centralize and standardize how you're accessing outside resources, especially if you've got things like file locations or URLs hardcoded (see the sketch after this list).
Buy an IDE that has good refactoring support. I think IntelliJ is the best, but Eclipse has it now, too.
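On the "centralize outside resources" point, a minimal Python sketch (all names invented): one frozen settings object is created at startup and passed in explicitly, instead of hardcoding paths and URLs throughout the code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    api_url: str
    data_dir: str

def report_path(settings: Settings, name: str) -> str:
    # Callers never assemble paths themselves; one place to change later.
    return f"{settings.data_dir}/{name}.csv"

# Wired up once at startup (or by a DI framework), then passed around:
settings = Settings(api_url="https://example.com/api", data_dir="/var/data")
print(report_path(settings, "q3"))  # /var/data/q3.csv
```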
The unit test idea is key as well. You will want to have a suite of large, overall transactions that will give you the overall behavior of the code.
Once you have those, start creating unit tests for classes and smaller packages. Write the tests to demonstrate proper behavior, make your changes, and re-run the tests to demonstrate that you haven't broken everything.
Track code coverage as you go. You'll want to work it up to 70% or better. For the classes you change, you'll want those to be 70% or better before you make your changes.
Build up that safety net over time and you'll be able to refactor with some confidence.
very slowly :D
No seriously... take it one step at a time. For instance, refactor something only if it affects or helps you write the current bug/feature that you are working on right now and do no more than that. And before you refactor make darn sure that you have some kind of automated test in place that gets run on each build that will actually test what you are writing/refactoring. Even if you don't have unit tests, it is never too late to start adding them for all new and modified code that is being written. Over time, your code base will get better in small increments daily or weekly instead of worse - all without you making monumental heaps of changes.
In my personal opinion and experience, it's not worth it to refactor a (legacy) codebase en masse just for the sake of refactoring. In those cases, it's best to start from scratch and do it right all over again (though you are very rarely afforded the opportunity to do such a thing). Hence, refactoring incrementally is the way to go.
For Java code, my favorite first step is to run Findbugs and then remove all the dead stores, un-used fields, unreachable catch blocks, unused private methods and likely bugs.
Next I run CPD to look for evidence of cut-copy-paste code.
It isn't unusual to be able to reduce the code base by 5% by doing this. It also saves you from refactoring code that is never used.
I think you should use Eclipse as an IDE, because it has many plugins and is free of cost. You should follow the MVC pattern, and you must write test cases using JUnit. Eclipse also has a plugin for JUnit, and it provides code refactoring facilities too, which will reduce some of your work. And always remember that writing code is not the main thing; the main thing is to write clean code. Add comments everywhere, so that anyone who reads the code feels like they are reading an essay.
Refactor the low-hanging fruit. Nibble away at the easy bits, and as you do that, the harder bits will begin to be a little easier. When there aren't any bits left to refactor, you're done.
The refactorings you'll probably find most useful are Rename Method (and even more trivial Renamings like Field, Variable, and Parameter), Extract Method, and Extract Class. For each refactoring you perform, write the necessary unit tests to make the refactoring safe, and run the entire suite of unit tests after each refactoring. It's tempting - and, let's be honest, pretty safe - to rely on the automated refactorings of your IDE, without the tests - but it's good practice and will be good to have the tests into the future as you add functionality to your project.
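As a tiny Extract Method illustration in Python (loosely modelled on Fowler's classic printOwing example; the code here is invented):

```python
# Before: one function mixes banner printing with the calculation.
def print_owing_before(orders):
    total = sum(order["amount"] for order in orders)
    print("*** Customer Owes ***")
    print(f"total: {total}")

# After Extract Method: the banner has a name of its own; the tests are
# re-run and the observable behaviour is unchanged.
def print_banner():
    print("*** Customer Owes ***")

def print_owing(orders):
    total = sum(order["amount"] for order in orders)
    print_banner()
    print(f"total: {total}")

print_owing([{"amount": 10}, {"amount": 5}])
```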
You might want to look at Martin Fowler's book Refactoring. This is the book that popularized the term and technique (my thought when taking his course: "I've been doing a lot of this all along, I didn't know it had a name"). A quote from the link:
Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which is "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time.
As others have pointed out, unit tests will allow you to refactor with confidence. And start by reducing code duplication. The book will give you lots of other insights.
Here is a catalog of refactorings.
A working definition of messy code is code that is hard to maintain and change.
For a more quantitative definition, you can check your code with code-metrics tools.
This way, you will keep the code that is already good enough, and find the bad code very quickly.
In my experience this is a very powerful way to improve the quality of your code (if your tool can show you the results on each build, or in real time).
Throw it away, build it new.

What's a good approach to writing error handling?

I hate writing error condition code. I guess I don't have a good approach to doing it:
Do you write all of your 'functional' code first, then go back in and add error handling, or vice versa?
How stupid do you assume your users are?
How granular do you make your exception throws?
Does anyone have sage advice to pass on to make this easier?
A lot of great answers guys, thank you. I had actually gotten more answers about dealing with the user than I thought. I'm actually more interested in error handling on the back end, dealing with database connection failures and potential effects on the front end, etc. Keep them coming!
I can answer one question: You don't need to assume your users are "stupid", you need to help them to use your application. Show nice prompts for things, validate data and explain why, so it's obvious to them, don't crash in their face if you can't handle what they've done (or more specifically, what you've let them do), show a nice page explaining what they can do instead, and so on.
Treat them with respect, and don't assume they know everything about your system, you are here to help them.
In respect to the first part; I generally write most error-handling at the time, and add a little bit back in later.
I generally don't throw that many exceptions.
Assume your users don't know anything and will break your system any way that it can possibly be broken.
Then write your error handling code accordingly.
First, and foremost, be clear to the user on what you expect. Second, test the input to verify it contains data within the boundaries you expect.
Prime example: I had a form with an email field. We weren't immediately using that data so we didn't put any checking on it. Result: about 1% of the users put in their home address. The field was labeled "Email Address". Apparently the users were just reading the second word and ignoring the first.
The fix was to change the label to simply say "Email" and then test the input. For kicks we went ahead and logged what the users were initially typing in that field just to see if the label change helped. It did.
Also, as a general practice, your functions should test the inputs to verify they contain the data you expect. Use asserts or their equivalent in your language of choice.
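A short Python sketch of that boundary checking (the function and its rules are invented): explicit checks for user-supplied data, asserts for conditions that should already have been guaranteed upstream.

```python
def register_attendee(email: str, seats: int) -> None:
    # User input gets explicit validation with a helpful error:
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    # Internal invariants get asserts; failing here means a bug upstream:
    assert seats > 0, "seats should have been validated before this call"
    print(f"registered {email} for {seats} seat(s)")

register_attendee("ada@example.org", 2)
```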
When I code, there will be some exceptions which I expect, i.e. a file may be missing, or some XML serialisation may fail. Those exceptions I know will happen ahead of time, and I can put in handling for them.
There is a lot you cannot anticipate though, and nor should you try to. Put in a global error handler and logger, so that ultimately everything gets caught and logged. Then, as your testers and/or users find situations that cause exceptions (i.e. bad input), you can decide whether you want to put in further handling specifically for them, or leave things as they are.
Summary: validate your input, but don't try to gaze into the crystal ball too much, as you will never anticipate every issue that the user may come up with. Have a global handler and logger, and then refine as necessary.
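In Python terms, the global handler and logger could be sketched like this (sys.excepthook and the logging module are real; the setup shown is just one option, and the file names are placeholders):

```python
import logging
import sys

logging.basicConfig(filename="app.log", level=logging.INFO)
log = logging.getLogger(__name__)

def handle_uncaught(exc_type, exc_value, exc_tb):
    # Anything not handled specifically ends up logged here.
    log.error("unhandled exception", exc_info=(exc_type, exc_value, exc_tb))

sys.excepthook = handle_uncaught

# The exceptions you *can* anticipate still get specific handling:
try:
    with open("settings.xml") as f:
        config = f.read()
except FileNotFoundError:
    log.warning("settings.xml missing; falling back to defaults")
    config = ""
```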
I have two words for you: Defensive Coding
You have to assume your users are incredibly stupid. Someone will always find a way to give you input that you thought would never happen.
I try to make my exception throws as granular as possible to provide the best feedback when something goes wrong. If you lump everything together, you can't tell which error case caused the problem.
I usually try to handle error cases first (before getting functional code), but that's not necessarily a best practice.
Someone has already mentioned defensive programming. A few thoughts from a user experience perspective, though.
If the user's input is invalid, either (a) correct it if you're reasonably sure you knew what they meant or (b) display a message in line that tells them what corrective action they should take.
Avoid messages like, "FATAL SYSTEM ERROR CODE 02382981." Users (a) don't know what this means, even if you do, and (b) are intimidated and put off by seeing things like this.
Avoid pop-up messages for every—freaking—possible—error you can come up with. You shouldn't disrupt user flow unless you absolutely need them to resolve a problem before they can do anything else.
Log, log, log. When you show an error message to the user, put relevant information that might help you debug in either (a) a log file or (b) a database, depending on the type of application you're creating. This will ease the effort of hunting down information about the error without making the user cry.
Once you identify what your users should and should not be able to do, you'll be able to effectively write error handling code. You can make this easier on yourself with helper methods/classes.
In terms of your question about writing handling before/after/during business logic, think about it this way: if you're making 400,000 sandwiches, it might be faster to add all the mustard at the same time, but it's probably also a lot more boring than making each sandwich individually. Who knows, though, maybe you really like the smell of mustard...

The best way to familiarize yourself with an inherited codebase

Stacker Nobody asked about the most shocking thing new programmers find as they enter the field.
Very high on the list is the impact of inheriting a codebase with which one must rapidly become acquainted. It can be quite a shock to suddenly find yourself charged with maintaining N lines of code that have been cobbled together for who knows how long, and to have a short time in which to start contributing to it.
How do you efficiently absorb all this new data? What eases this transition? Is the only real solution to have already contributed to enough open-source projects that the shock wears off?
This also applies to veteran programmers. What techniques do you use to ease the transition into a new codebase?
I added the Community-Building tag to this because I'd also like to hear some war-stories about these transitions. Feel free to share how you handled a particularly stressful learning curve.
Pencil & Notebook (don't get distracted trying to create an unrequested solution).
Make notes as you go, and take an hour every Monday to read through and arrange the notes from previous weeks.
With large codebases, first impressions can be deceiving, and issues tend to rearrange themselves rapidly while you are familiarizing yourself.
Remember the issues from your last work environment aren't necessarily valid or germane in your new environment. Beware of preconceived notions.
The notes/observations you make will help you learn quickly what questions to ask and of whom.
Hopefully you've been gathering the names of all the official (and unofficial) stakeholders.
One of the best ways to familiarize yourself with inherited code is to get your hands dirty. Start with fixing a few simple bugs and work your way into more complex ones. That will warm you up to the code better than trying to systematically review the code.
If there's a requirements or functional specification document (which is hopefully up-to-date), you must read it.
If there's a high-level or detailed design document (which is hopefully up-to-date), you probably should read it.
Another good way is to arrange a "transfer of information" session with the people who are familiar with the code, where they provide a presentation of the high level design and also do a walk-through of important/tricky parts of the code.
Write unit tests. You'll find the warts quicker, and you'll be more confident when the time comes to change the code.
Try to understand the business logic behind the code. Once you know why the code was written in the first place and what it is supposed to do, you can start reading through it, or as someone said, prolly fixing a few bugs here and there
My steps would be:
1.) Set up a Source Insight (or any good source code browser you use) workspace/project with all the source and header files in the code base. Browse at a high level, from the topmost function (main) down to the lowest-level functions. While browsing, keep making notes on paper or in a document, tracing the flow of the function calls. Do not get into the nitty-gritty of function implementations in this step; keep that for later iterations. For now, keep track of what arguments are passed to functions, what they return, how the arguments are initialized and modified, and how the return values are used.
2.) After one iteration of step 1.), when you have some grasp of the code and the data structures used in the code base, set up an MSVC project (or one for whatever compiler suits the code base's language), compile the code, execute it with a valid test case, and single-step through the code again from main down to the lowest level of function. Between the function calls, keep noting the values of variables passed and returned, the various code paths taken, the various code paths avoided, etc.
3.) Keep repeating 1.) and 2.) iteratively until you are comfortable enough that you can change some code, add some code, or find and fix a bug in the existing code!
-AD
I don't know about this being "the best way", but something I did at a recent job was to write a code spider/parser (in Ruby) that went through and built a call tree (and a reverse call tree) which I could later query. This was slightly non-trivial because we had PHP which called Perl which called SQL functions/procedures. Any other code-crawling tools would help in a similar fashion (i.e. javadoc, rdoc, perldoc, Doxygen etc.).
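The Ruby spider itself isn't shown; as a rough idea of the technique, here is a minimal single-language equivalent in Python using the standard ast module, building a call tree and its reverse for one file (the file name is a placeholder, and attribute calls like obj.method() are deliberately skipped to keep it short):

```python
import ast
from collections import defaultdict

calls = defaultdict(set)    # caller -> callees (the call tree)
callers = defaultdict(set)  # callee -> callers (the reverse tree)

class CallSpider(ast.NodeVisitor):
    def __init__(self):
        self.current = "<module>"

    def visit_FunctionDef(self, node):
        outer, self.current = self.current, node.name
        self.generic_visit(node)  # record calls made inside this function
        self.current = outer

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):  # simple foo(...) calls only
            calls[self.current].add(node.func.id)
            callers[node.func.id].add(self.current)
        self.generic_visit(node)

with open("some_module.py") as f:  # placeholder path
    CallSpider().visit(ast.parse(f.read()))

print(dict(calls))
print(dict(callers))
```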
Reading any unit tests or specs can be quite enlightening.
Documenting things helps (either for yourself, or for other teammates, current and future). Read any existing documentation.
Of course, don't underestimate the power of simply asking a fellow teammate (or your boss!) questions. Early on, I asked as often as necessary "do we have a function/script/foo that does X?"
Go over the core libraries and read the function declarations. If it's C/C++, this means only the headers. Document whatever you don't understand.
The last time I did this, one of the comments I inserted was "This class is never used".
Do try to understand the code by fixing bugs in it. Do correct or maintain documentation. Don't modify comments in the code itself; that risks introducing new bugs.
In our line of work, generally speaking we do no changes to production code without good reason. This includes cosmetic changes; even these can introduce bugs.
No matter how disgusting a section of code seems, don't be tempted to rewrite it unless you have a bugfix or other change to do. If you spot a bug (or possible bug) when reading the code trying to learn it, record the bug for later triage, but don't attempt to fix it.
Another Procedure...
After reading Andy Hunt's "Pragmatic Thinking and Learning - Refactor Your Wetware" (which doesn't address this directly), I picked up a few tips that may be worth mentioning:
Observe Behavior:
If there's a UI, all the better. Use the app and get a mental map of relationships (e.g. links, modals, etc). Look at HTTP requests if it helps, but don't put too much emphasis on them -- you just want a light, friendly acquaintance with the app.
Acknowledge the Folder Structure:
Once again, this is light. Just see what belongs where, and hope that the structure is semantic enough -- you can always get some top-level information from here.
Analyze Call-Stacks, Top-Down:
Go through and list, on paper or some other medium (but try not to type it; this gets different parts of your brain engaged -- build it out of Legos if you have to), the function calls, objects, and variables that are closest to top level first. Look at constants and modules, and make sure you don't dive into fine-grained features if you can help it.
MindMap It!:
Maybe the most important step. Create a very rough draft mapping of your current understanding of the code. Make sure you run through the mindmap quickly. This allows an even spread of different parts of your brain (mostly R-Mode) to have a say in the map.
Create clouds, boxes, etc. wherever you initially think they should go on the paper. Feel free to denote boxes with syntactic symbols (e.g. 'F'-Function, 'f'-closure, 'C'-Constant, 'V'-Global Var, 'v'-low-level var, etc). Use arrows: incoming for arguments, outgoing for returns, or whatever comes more naturally to you.
Start drawing connections to denote relationships. It's OK if it looks messy - this is a first draft.
Make a quick rough revision. If it's too hard to read, do another quick reorganization of it, but don't do more than one revision.
Open the Debugger:
Validate or invalidate any notions you had after the mapping. Track variables, arguments, returns, etc.
Track HTTP requests etc to get an idea of where the data is coming from. Look at the headers themselves but don't dive into the details of the request body.
MindMap Again!:
Now you should have a decent idea of most of the top-level functionality.
Create a new MindMap that has anything you missed in the first one. You can take more time with this one and even add some relatively small details -- but don't be afraid of what previous notions they may conflict with.
Compare this map with your last one and eliminate any question you had before, jot down new questions, and jot down conflicting perspectives.
Revise this map if it's too hazy. Revise as much as you want, but keep revisions to a minimum.
Pretend Its Not Code:
If you can put it into mechanical terms, do so. The most important part of this is to come up with a metaphor for the app's behavior and/or smaller parts of the code. Think of ridiculous things, seriously. If it were an animal, a monster, a star, a robot, what kind would it be? If it were in Star Trek, what would they use it for? Think of many things to weigh it against.
Synthesis over Analysis:
Now you want to see not 'what' but 'how'. Any low-level parts that threw you for a loop can be taken out and put into a sterile environment (where you control the inputs). What sort of outputs are you getting? Is the system more complex than you originally thought? Simpler? Does it need improvements?
Contribute Something, Dude!:
Write a test, fix a bug, comment it, abstract it. You should have enough ability to start making minor contributions, and FAILING IS OK :)! Note any changes you make in commits, chat, or email. If you did something dastardly, your teammates can catch it before it goes to production -- and if something is wrong, it's a great way to get a teammate to clear things up for you. Usually listening to a teammate talk will clear up a lot of what made your MindMaps clash.
In a nutshell, the most important thing to do is use a top-down fashion of getting as many different parts of your brain engaged as possible. It may even help to close your laptop and face your seat out the window if possible. Studies have shown that enforcing a deadline creates a "Pressure Hangover" for ~2.5 days after the deadline, which is why deadlines are often best to have on a Friday. So, BE RELAXED, THERE'S NO TIMECRUNCH, AND NOW PROVIDE YOURSELF WITH AN ENVIRONMENT THAT'S SAFE TO FAIL IN. Most of this can be fairly rushed through until you get down to details. Make sure that you don't bypass understanding of high-level topics.
Hope this helps you as well :)
All really good answers here. Just wanted to add a few more things:
One can pair architectural understanding with flash cards, and revisiting those can solidify understanding. I find questions such as "Which part of the code implements X functionality?" useful, where X could be any functionality in your code base.
I also like to open a buffer in emacs and start re-writing some parts of the code base that I want to familiarize myself with and add my own comments etc.
One thing vi and emacs users can do is use tags. Tags are contained in a file (usually called TAGS). You generate one or more tags files with a command (etags for emacs, ctags for vi). Then when you edit source code and see a confusing function or variable, you load the tags file and it will take you to where the function is declared (not perfect, but good enough). I've actually written some macros that let you navigate source using Alt-cursor, sort of like popd and pushd in many flavors of UNIX.
BubbaT
The first thing I do before going down into code is to use the application (as several different users, if necessary) to understand all the functionalities and see how they connect (how information flows inside the application).
After that I examine the framework in which the application was built, so that I can make a direct relationship between all the interfaces I have just seen with some View or UI code.
Then I look at the database and any database commands handling layer (if applicable), to understand how that information (which users manipulate) is stored and how it goes to and comes from the application
Finally, after learning where data comes from and how it is displayed I look at the business logic layer to see how data gets transformed.
I believe every application architecture can be divided like this, and knowing the overall function (a who's who in your application) can be beneficial before really debugging it or adding new stuff - that is, if you have enough time to do so.
And yes, it also helps a lot to talk with someone who developed the current version of the software. However, if he/she is going to leave the company soon, keep a note of his/her wish list (what they wanted to do for the project but were unable to because of budget constraints).
Create documentation for each thing you figure out from the codebase.
Find out how it works by experimentation: change a few lines here and there and see what happens.
Use Geany, as it speeds up searching for commonly used variables and functions in the program and adds them to autocomplete.
Find out if you can contact the original developers of the code base, through Facebook or by googling for them.
Find out the original purpose of the code, and see whether the code still fits that purpose or should be rewritten from scratch in fulfillment of the intended purpose.
Find out what frameworks the code uses and what editors were used to produce it.
The easiest way to deduce how code works is to replicate how you would have done a certain part yourself, then recheck the code for such a part.
It's reverse engineering: figuring something out by trying to re-engineer the solution.
Most programmers have experience in coding, and there are certain patterns you can look for in the code.
There are two broad styles of code, object-oriented and structurally oriented.
If you know how to write both, you're good to go, but if you aren't familiar with one or the other, you'd have to relearn how to program in that fashion to understand why it was coded that way.
In object-oriented code, you can easily create diagrams documenting the behaviors and methods of each class.
If it's structurally oriented, meaning organized by function, create a functions list documenting what each function does and where it appears in the code.
I haven't done either of the above myself; as a web developer, it is relatively easy to figure out how something works by starting from index.php and following the rest of the pages.
Good luck.