How to partition a problem into smaller, understandable portions?

I'm not sure it's possible to give general advice on this topic, but please try. It's hard to explain my specific case because it is too complex, and that's exactly the problem.
I seem to constantly stumble on a situation where I try to design some part of my project, but it has so many things to take into consideration that I'm unable to get a grasp of it.
Are there any general tips or advice on how to look at my system in smaller pieces at a time? How to find smaller portions that could be designed separately on their own?

Create a glossary.
In other words, identify the terms that are meaningful to the project domain — not from the programmer's point of view, but from a user's, who is familiar with the subject matter.
Then define the terms as precisely and discretely as you can. A good definition in this form can serve as a kind of pseudocode.
Since you have not identified even the domain of your problem, I'll choose a random example. In a civilian personnel system, you might have terms like:
billet: a term of service (from start date to end date) at a particular grade and step
employee: a series of billets associated with a particular SSN
grade and step: row and column in the federal general schedule
And so on. This isn't to identify functional units, as it sounds like you are trying to do, but it's a good preparatory step before doing so, so that you can express your functional steps in well-defined terms.
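To make this concrete, here is a minimal sketch (in Java; the record fields are my own illustrative assumptions, not part of the original answer) of how glossary entries like these can be written down as domain types before any functional decomposition starts:

import java.time.LocalDate;
import java.util.List;

// grade and step: row and column in the federal general schedule
record GradeStep(int grade, int step) { }

// billet: a term of service (from start date to end date) at a particular grade and step
record Billet(LocalDate startDate, LocalDate endDate, GradeStep gradeStep) { }

// employee: a series of billets associated with a particular SSN
record Employee(String ssn, List<Billet> billets) { }

Even if none of these types survive into the final design, writing them out forces the definitions to be precise, which is the point of the glossary.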

Your key goals are:
High cohesion: Code (methods, fields, classes) within one piece/module/partition should interact intensively; it should make sense for these elements to know about each other. If you find that some of them don't interact much with the rest, they probably belong somewhere else or should form their own partition. If you find code outside the partition interacting intensively with it and knowing too much about its inner workings, it probably belongs inside. The typical example is found in OO code written in procedural style, with "dumb" data objects and "manager" code that operates on them but should really be part of the data objects (see the sketch after this list).
Loose coupling: Interaction between pieces/modules/partitions should only happen through narrow, well-defined, well-documented APIs. Try to identify such APIs and see what code is needed to implement them and what code will use them.
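A rough sketch of that cohesion point, assuming Java and invented class names, so it is an illustration rather than anyone's actual code:

// Procedural style: a "dumb" data object plus a manager that knows its internals.
class Account {
    double balance;                               // state exposed to the manager
}

class AccountManager {
    void withdraw(Account a, double amount) {     // the logic lives away from the data
        if (a.balance >= amount) a.balance -= amount;
    }
}

// Higher cohesion, looser coupling: the behaviour sits next to the data it uses,
// and callers depend on one narrow method instead of on the internal field.
class CohesiveAccount {
    private double balance;

    boolean withdraw(double amount) {
        if (balance < amount) return false;
        balance -= amount;
        return true;
    }
}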

It's useful to approach problem decomposition both top-down and bottom-up.
If you're having trouble splitting a big problem into two or more smaller problems, try to think of the smallest possible problems that will need to be solved. Once those are handled, you may start to see ways to combine them into larger problems as you approach your original large problem.

When I find myself copying and pasting chunks of code with minimal adjustments, I realize that's a "partition" and then create a class, method, function, or whatever; a sketch of that follows below.
Actually, the whole object-oriented approach is what it's all about. Try thinking of your application as tangible things that do stuff. Write pseudo code describing what the things are and what they do; I find lots of "partitions" this way.
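For instance, a minimal sketch of turning a pasted-around chunk into its own piece (Java; the names are invented for illustration):

class ReportFormatter {
    // Before: this normalisation block was pasted into several methods with minor tweaks.
    // After extracting it, each caller differs only in its inputs.
    private static String normalise(String raw) {
        return raw == null ? "" : raw.trim().toLowerCase();
    }

    String formatCustomer(String name) { return "Customer: " + normalise(name); }
    String formatProduct(String label) { return "Product: " + normalise(label); }
}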

Here's a try, kind of a wild guess.
People usually underestimate how long it will take them to do the work. If your project is large, then most likely you'll need several people to work on it, so you can try planning with that in mind. A person can be expected to hold just one area in their head, so you'll need to explain to each of them exactly what kind of task they're supposed to do.
So I'd say you should try to write a job description that encompasses as much as one person can seriously concentrate on. Repeat until you have broken your project into the parts you wanted. As a benefit, you're ready to assemble your team. And if you find out the parts are small, maybe you'll still be able to do it yourself.

Related

Write programs that do one thing and do it well

I can grasp the part "do one thing" via encapsulation, Dependency Injection, Principle of Least Knowledge, and You Ain't Gonna Need It; but how do I understand the second part "do it well?"
An example given was the notion of completeness, given in the same YAGNI article:
for example, among features which allow adding items, deleting items, or modifying items, completeness could be used to also recommend "renaming items".
However, I found reasoning like that could easily be abused into feature creep, thus violating the "do one thing" part.
So, what is a litmus test for deciding whether a feature belongs to the "do it well" category (hence, include it in the function/class/program) or to the other "do one thing" category (hence, exclude it)?
The first part, "do one thing," is best understood via UNIX's ls command as a counterexample, with its inclusion of an excessive number of flags for formatting its output, which should have been delegated entirely to another external program. But I don't have a good example for the second part, "do it well."
What is a good example where removing any further feature would make it not "do it well?"
I see "Do It Well" as being as much about quality of implementation of a function than about the completeness of a set functions (in your example having rename, as well as create and delete).
Do It Well manifests in many ways, some ways of thinking:
Behaviour in response to "special" inputs. Example, calculating the mean of some integers:
int mean(int[] values) { ... }
What does this do if the array has zero elements? What if the items total more than MAX_INT? (A defensive sketch follows this list.)
Performance Characteristics. Has sufficient attention been given to behaviour as the data volumes increase?
Dependency Failures. If our implementation depends on other modules or infrastructure, what happens when they fail? Examples: file system full, database down.
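As a sketch of the "special inputs" point, here is one way the mean function above could be hardened; the policy choices (rejecting an empty array, accumulating in a long to dodge int overflow) are my own illustrative assumptions:

static int mean(int[] values) {
    if (values == null || values.length == 0) {
        throw new IllegalArgumentException("mean of an empty array is undefined");
    }
    long sum = 0;   // long accumulator: no overflow even if the items total more than MAX_INT
    for (int v : values) {
        sum += v;
    }
    return (int) (sum / values.length);
}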
Concerning feature creep itself, I think you're correct to identify a tension here. One thing you might consider: you don't need to implement every feature, provided it's pretty obvious that a feature can be added later without a complete rewrite.
The whole purpose of this advice is to make you favor quality over quantity.
The concept of one thing is subjective and depends on granularity. Would you say that a spreadsheet application does more than one thing if it can also print, or is that part of that one thing?
The point is that you should make sure that any feature, and the application itself, is done and will delight customers before you scramble to add new features.
I think your question points out the fundamentally organic nature of feature creep, and in understanding that nature, you will be empowered to meditate on the larger question.
Think of it like a garden: If you plant one thing and plant it well, say, a chrysanthemum, you aren't done after simply planting the seed. In fact you'll need to ensure that the soil is well tended, that the area is sufficiently protected, that the season is right, etc.
As your chrysanthemum (your one thing) grows, so too will other competitive plants - some that need to be weeded out and others that may actually complement the original one thing. In fact, these other organisms may in some cases prove vital for the survival of your one thing.
Like those features that YAGNI warns about, a bit of vigilance is required to determine which weeds represent feature creep and which represent vital and complementary functions.
Regardless, having done it well means simply that your chrysanthemum is hardy, healthy, and on time. :-)
I would say an email program without the ability to add attachments would be a good example.
This may sound like an odd example, but I'd say dropbox is a good, albeit complex example.
It's managed to beat off a swathe of similar competing apps through a dedication to simplification and a lack of the feature creep that, as you mentioned, would violate the 'do one thing' principle. The app lets you store documents in a folder that you can access anywhere, and that's about the limit of it. They drilled down to the core problem and solved it in a way that works perfectly well in 90+% of cases.
It's hard to put a hard and fast rule to it, but I'd say that catering to around the 90% majority of use cases and ignoring 'fringe requirements' is the best way to stick to this rule.
I'd guess 90+% of ls use is with no arguments or maybe two or three of the most popular. The 'do it well' principle should focus on what the majority of users need, instead of catering for power users or fringe cases, as ls does with its plethora of options.
This is what dropbox does successfully and why it is pretty well agreed upon as an example of good application design.

How do you refactor a large messy codebase?

I have a big mess of code. Admittedly, I wrote it myself - a year ago. It's not well commented but it's not very complicated either, so I can understand it -- just not well enough to know where to start as far as refactoring it.
I violated every rule that I have read about over the past year. There are classes with multiple responsibilities, there are indirect accesses (I forget the term - something like foo.bar.doSomething()), and like I said it is not well commented. On top of that, it's the beginnings of a game, so the graphics are coupled with the data; or, in the places where I tried to decouple graphics and data, I made the data public so that the graphics could access what they need...
It's a huge mess! Where do I start? How would you start on something like this?
My current approach is to take variables and switch them to private and then refactor the pieces that break, but that doesn't seem to be enough. Please suggest other strategies for wading through this mess and turning it into something clean so that I can continue where I left off!
Update two days later: I have been drawing out UML-like diagrams of my classes, and catching some of the "Low Hanging Fruit" along the way. I've even found some bits of code that were the beginnings of new features, but as I'm trying to slim everything down, I've been able to delete those bits and make the project feel cleaner. I'm probably going to refactor as much as possible before rigging my test cases (but only the things that are 100% certain not to impact the functionality, of course!), so that I won't have to refactor test cases as I change functionality. (do you think I'm doing it right or would it, in your opinion, be easier for me to suck it up and write the tests first?)
Please vote for the best answer so that I can mark it fairly! Feel free to add your own answer to the bunch as well, there's still room for you! I'll give it another day or so and then probably mark the highest-voted answer as accepted.
Thanks to everyone who has responded so far!
June 25, 2010: I discovered a blog post which directly answers this question from someone who seems to have a pretty good grasp of programming: (or maybe not, if you read his article :) )
To that end, I do four things when I need to refactor code:
Determine what the purpose of the code was
Draw UML and action diagrams of the classes involved
Shop around for the right design patterns
Determine clearer names for the current classes and methods
Pick yourself up a copy of Martin Fowler's Refactoring. It has some good advice on ways to break down your refactoring problem. About 75% of the book is little cookbook-style refactoring steps you can do. It also advocates automated unit tests that you can run after each step to prove your code still works.
As for a place to start, I would sit down and draw out a high-level architecture of your program. You don't have to get fancy with detailed UML models, but some basic UML is not a bad idea. You need a big picture idea of how the major pieces fit together so you can visually see where your decoupling is going to happen. Just a page or two of some basic block diagrams will help with the overwhelming feeling you have right now.
Without some sort of high level spec or design, you just risk getting lost again and ending up with another unmaintainable mess.
If you need to start from scratch, remember that you never truly start from scratch. You have some code and the knowledge you gained from your first time. But sometimes it does help to start with a blank project and pull things in as you go, rather than put out fires in a messy code base. Just remember not to completely throw out the old, use it for its good parts and pull them in as you go.
What was most important for me on different occasions were unit tests: I took a few days to write tests for the old code and then I was free to refactor with confidence. How exactly is a different question, but having the tests made it possible for me to make real, substantial changes to the code.
I'll second everyone's recommendations for Fowler's Refactoring, but in your specific case you may want to look at Michael Feathers' Working Effectively with Legacy Code, which is really perfect for your situation.
Feathers talks about Characterization Tests, which are unit tests written not to assert known behaviour of the system but to explore and define the existing (unclear) behaviour. In the case where you've written your own legacy code and are fixing it yourself, this may not be so important; but if your design is sloppy then it's quite possible there are parts of the code that work by 'magic' and whose behaviour isn't clear, even to you. In that case, characterization tests will help.
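A rough JUnit 4 sketch of the idea (the class under test and the asserted number are hypothetical placeholders): you run the legacy code once, paste whatever it actually produced into the assertion, and the current behaviour is pinned down before you start refactoring.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class LegacyPriceCalculatorCharacterizationTest {
    @Test
    public void totalForThreeItemsMatchesCurrentBehaviour() {
        LegacyPriceCalculator calc = new LegacyPriceCalculator();
        // 42.90 is not a spec value; it is simply what the existing code
        // returned when this test was first written.
        assertEquals(42.90, calc.totalFor(new int[] {10, 12, 15}), 0.001);
    }
}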
One great part of the book is the discussion about finding (or creating) seams in your codebase -- seams are natural 'fault lines', if you like, where you can break into the existing system to start testing it, and pulling it towards a better design. Hard to explain but well worth a read.
There's a brief paper where Feathers fleshes out some of the concepts from the book, but it really is well worth hunting down the whole thing. It's one of my favourites.
Just an additional refactoring that is more important than you think: Name things correctly!
This goes for any variable name and method name. If the name does not accurately reflect what the thing is used for, then rename it to something more accurate. This might require several iterations. If you cannot find a short, and entirely accurate name, then that item does too much and you have an excellent candidate for a code snippet that needs to be split. The names also clearly indicate where the cuts are to be made.
Also, document your stuff. Whenever the answer to WHY? is not clearly conveyed by the answer to HOW? (being the code) you will need to add some documentation. Capturing design decisions is probably the most important task as it is very hard to do in code.
You could always start from "scratch". That doesn't mean scrap it and start from nothing, but try to rethink high-level things from the beginning, since you seem to have learned a lot since the last time you worked on it.
Start from a higher level, and as you build the scaffolding of your new and improved structure, take all the code you can reuse, which will probably be more than you think if you're willing to read through it and make some small changes.
When you're making the changes, be sure to be strict with yourself about following all the good practices you now know, because you will really thank yourself later.
It can be surprisingly refreshing to properly re-make program to do exactly what it did before, only more "cleanly". ;)
As others have mentioned as well, unit-tests are your best friend! They help you ensure that your refactoring works, and if you're starting from "scratch", it's the perfect time to write them.
You're in a much better position than many people facing this problem in that you understand what the code is supposed to do.
Taking variables out of a shared scope, as you're doing, is a great start, in that you're partitioning responsibilities. Ultimately you want each class to express a single responsibility. A few other things you might look at:
Easy targets for refactoring are code that's duplicated in lots of places and long methods.
If you're managing application state through statically initialized singletons or worse, a global state that everything is talking to, consider moving it to a managed initialization system (i.e. a dependency injection framework like spring or guice) or at least make sure that the initialization isn't entangled with the rest of the code.
Centralize and standardize how you're accessing outside resources, especially if you've got things like file locations or URLs hardcoded (a small sketch of this follows the list).
Buy an IDE that has good refactoring support. I think IntelliJ is the best, but Eclipse has it now, too.
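For the point about centralizing outside resources, here is a minimal sketch of what that can look like; the class, property keys, and defaults are invented for illustration:

import java.nio.file.Path;
import java.util.Properties;

final class Resources {
    private final Properties props;

    Resources(Properties props) {
        this.props = props;
    }

    Path exportDirectory() {
        return Path.of(props.getProperty("export.dir", "/tmp/export"));
    }

    String paymentServiceUrl() {
        return props.getProperty("payment.url", "http://localhost:8080/pay");
    }
}

Callers ask Resources for locations instead of hardcoding them, so pointing the app at a test directory or a new environment means changing one place (or one properties file).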
The unit test idea is key as well. You will want to have a suite of large, overall transactions that will give you the overall behavior of the code.
Once you have those, start creating unit tests for classes and smaller packages. Write the tests to demonstrate proper behavior, make your changes, and re-run the tests to demonstrate that you haven't broken everything.
Track code coverage as you go. You'll want to work it up to 70% or better. For the classes you change, you'll want those to be 70% or better before you make your changes.
Build up that safety net over time and you'll be able to refactor with some confidence.
very slowly :D
No seriously... take it one step at a time. For instance, refactor something only if it affects or helps you write the current bug/feature that you are working on right now and do no more than that. And before you refactor make darn sure that you have some kind of automated test in place that gets run on each build that will actually test what you are writing/refactoring. Even if you don't have unit tests, it is never too late to start adding them for all new and modified code that is being written. Over time, your code base will get better in small increments daily or weekly instead of worse - all without you making monumental heaps of changes.
In my personal opinion and experience, it's not worth it to refactor a (legacy) codebase en masse just for the sake of refactoring. In those cases, it's best to start from scratch and do it right all over again (and very rarely are you afforded the opportunity to do such a thing). Hence, refactoring incrementally is the way to go.
For Java code, my favorite first step is to run FindBugs and then remove all the dead stores, unused fields, unreachable catch blocks, unused private methods and likely bugs.
Next I run CPD to look for evidence of cut-copy-paste code.
It isn't unusual to be able to reduce the code base by 5% by doing this. It also saves you from refactoring code that is never used.
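To give a feel for what that first pass removes, here is a tiny invented example of the kind of findings such static-analysis tools report:

class OrderProcessor {
    private int retryCount;                  // unused field: never read or written anywhere else

    int process(int quantity) {
        int discount = 5;                    // dead store: overwritten before it is ever read
        discount = 0;
        return quantity - discount;
    }

    private void logDebug(String message) {  // unused private method: no callers
    }
}

None of this code does anything, so deleting it shrinks the codebase without changing behaviour, which is exactly why it is a safe first step.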
I think you should use Eclipse as an IDE, because it has many plugins and is free of cost. You should follow the MVC pattern, and yes, you must write test cases using JUnit. Eclipse also has a plugin for JUnit, and it provides code refactoring facilities too, which will reduce some of your work. And always remember that just writing code is not the important thing; the main thing is to write clean code. So add comments everywhere, so that anyone else who reads the code feels like they are reading an essay.
Refactor the low-hanging fruit. Nibble away at the easy bits, and as you do that, the harder bits will begin to be a little easier. When there aren't any bits left to refactor, you're done.
The refactorings you'll probably find most useful are Rename Method (and even more trivial Renamings like Field, Variable, and Parameter), Extract Method, and Extract Class. For each refactoring you perform, write the necessary unit tests to make the refactoring safe, and run the entire suite of unit tests after each refactoring. It's tempting - and, let's be honest, pretty safe - to rely on the automated refactorings of your IDE, without the tests - but it's good practice and will be good to have the tests into the future as you add functionality to your project.
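A small sketch of the Extract Method / Rename Method flavour of change being described; Order, OrderRepository, and the validation rules are invented for illustration, and the behaviour is deliberately unchanged:

import java.util.List;

record Order(List<String> items, int total) { }

interface OrderRepository {
    void save(Order o);
}

class OrderService {
    private final OrderRepository repository;

    OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    // Before: a single method named doIt() mixed validation and persistence.
    // After Rename Method and Extract Method, each piece has a name a test can target.
    void submitOrder(Order o) {
        validate(o);
        repository.save(o);
    }

    private void validate(Order o) {
        if (o.items().isEmpty()) throw new IllegalArgumentException("empty order");
        if (o.total() < 0) throw new IllegalArgumentException("negative total");
    }
}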
You might want to look at Martin Fowler's book Refactoring. This is the book that popularized the term and technique (my thought when taking his course: "I've been doing a lot of this all along, I didn't know it had a name"). A quote from the link:
Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring - which allows you to gradually refactor a system over an extended period of time.
As others have pointed out, unit tests will allow you to refactor with confidence. And start by reducing code duplication. The book will give you lots of other insights.
Here is a catalog of refactorings.
The correct definition of messy code is code that is hard to maintain and change.
For a more quantitative definition, you can check your code with code-metrics tools.
This way, you will keep the code that is already good enough and quickly find the problem code.
My experience says this is a very powerful way to improve the quality of your code (if your tool can show you the results on each build or in real time).
Throw it away, build it new.

Not letting users make mistakes vs. giving them flexibility

I'm working on a product which is meant to be simple to use and simple to set up, the competition largely requiring a long set up period and in some cases going as far as a bespoke solution for each customer. One part of our application is now expanding based on customer requests and it is looking like we'll need to make it very flexible so each customer can have a lot of control over how it behaves for them. The problem being that I don't want to make the system too configurable, as I believe this then makes it more complex to learn and to work with. I'm also concerned it opens the door to someone messing things up for themselves, kind of like handing them a gun, although I'm not actually pointing it at their foot for them.
Has anyone else faced a similar dilemma of putting power in users' hands? How did you solve it? And what was the result?
I don't normally like to subscribe to the idea that all users are stupid, but there is a rule which can still be applied:
If you give them the opportunity, they WILL break it
Now it is up to you whether or not to give them the ability to do potentially dumb things. Or better yet, develop it so that when they do do the stupid voodoo that they do, it can be reverted or recovered from error state gracefully.
I highly recommend you read Joel's Controlling Your Environment Makes You Happy, which can be described as a treatise on user interface design but is really about usability with a healthy dab of psychology thrown in.
The section I'm referring to is Choices:
Every time you provide an option, you're asking the user to make a decision.
This is something I strongly agree with. Many developers, product managers and so on take the easy route and instead of figuring out what users actually need, they just give them a choice. You see this in enterprise bloatware like Clearcase or PVCS where there are so many options--90% of which you'd never change--indicating the designers have tried to make it all things to all men rather than doing one or two things exceptionally well.
Instead it just does lots of things badly.
Keep it simple, follow conventions, don't overwhelm the user with pointless and unnecessary choices and make the software behave like a normal user would expect. That alone would set you apart from an awful lot of other products.
Personally I like the TurboTax model (http://turbotax.intuit.com/). When creating a tax return, I get a simple, tell-me-like-I'm-five wizard that takes me step-by-step through the process, but I can step outside the process at any time and use more advanced features, returning to the process later.
Make it easy and simple and uncluttered for your user to do what they're going to do 80% of the time, but give them the power to deliberately step outside of the norm.
Interesting timing for your question. In the U.S. this is Income Tax week. Filling out the ol' 1040 and associated subforms should give us some sympathy for what users endure.
Lessons I take away are:
Only ask questions that relate to the user domain; avoid questions relating to the software system; and if you can derive the answer or suggest a most likely answer, do so.
Put related questions together (as long as they are normally entered by the same person using data most likely available at the same place and time, which is the definition of related for these purposes).
Make it support incremental input. It should be easy to enter the data they have, and defer completing it when the rest is available.
Show status validity and completeness. Make it clear and obvious how far they are to having validatable data.
Make it interruptable. Make sure it's possible to interrupt the process, leave the application, come back, and resume where they left off.
Yup, it's harder to program. Embrace it.
There are at least two ways to build a good software product:
Focus on a narrow set of functionality, and implement that functionality very well.
Design your system to be customizable (ideally, through scripting.) If you do the base system right, it will be easy to provide the basic, no options, just-do-what-I-want functionality on top of the customization layer.
Unfortunately, there are many more ways to create a bad software product.
Your question implies that you can either provide a flexible solution OR make it foolproof.
I wouldn't put it like that. To me this is rather a matter of user expectations and the question in the first place would be:
How can I meet all important user expectations (even if they conflict with each other) without corrupting the application?
For instance, a web application which has a menu, breadcrumb navigation, a site map and a search offers, together with the inline links, five different ways to find what you're looking for and to get there.
That way most users can quickly and easily find the functionality they are expecting, and therefore the need for extensive documentation might actually decrease.
So the answer might be to offer several different carefully chosen ways to solve one specific task, while each of them can be streamlined independently to avoid user mistakes.
The answer with this lies in who your end-users are. I used to write software that got used by professional sports coaches. While these guys were definitely good at what they did, they were hardly proficient at computer use, so our configurability was kept to a minimum (at least as far as what could be done in the GUI).
On the other hand, if you're dealing with power users, adding options is usually not a bad thing as long as they aren't intrusive.
It's all about who's going to be getting them.
Read Jeff Atwood's Training Your Users. It's a great article with some very useful links.
I like the approach of Firefox towards this. The basic options are accessible in the option menu, all the rest is under about:config. Thus you have an easy interface and an incredible flexibility if you need it.
I've had great success, and been happiest as a user, when using sensible defaults. In other words, make the most common use case easy (or even better, free), but give users the ability to step outside of that use case when the situation calls for it.
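A minimal sketch of that idea; the export scenario and its options are invented for illustration:

class ExportOptions {
    // Defaults that suit the common case; the zero-argument path stays simple.
    private String format = "pdf";
    private boolean includeArchived = false;

    static ExportOptions defaults() {
        return new ExportOptions();
    }

    ExportOptions format(String format) {
        this.format = format;
        return this;
    }

    ExportOptions includeArchived(boolean includeArchived) {
        this.includeArchived = includeArchived;
        return this;
    }
}

// Common case:  export(ExportOptions.defaults());
// Power user:   export(ExportOptions.defaults().format("csv").includeArchived(true));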

What's the best way to become familiar with a large codebase? [closed]

Joining an existing team with a large codebase already in place can be daunting. What's the best approach?
Broad; try to get a general overview of how everything links together, from the code
Narrow; focus on small sections of code at a time, understanding how they work fully
Pick a feature to develop and learn as you go along
Try to gain insight from class diagrams and uml, if available (and up to date)
Something else entirely?
I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?
--
Summary of answers:
Step through code in debug mode to see how it works
Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
Write unit tests. Start with an assertion of how you think the code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and/or an enquiry to make. (Thanks Donal, this is a great answer)
Go through existing unit tests for functional code, in a similar fashion to above
Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
Make small edits or bug fixes, then gradually build up
Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.
this post is a partial duplicate of the-best-way-to-familiarize-yourself-with-an-inherited-codebase
Start with some small task if possible, debug the code around your problem.
Stepping through code in debug mode is the easiest way to learn how something works.
Another option is to write tests for the features you're interested in. Setting up the test harness is a good way of establishing what dependencies the system has and where its state resides. Each test starts with an assertion about the way you think the system should work. If it turns out to work that way, you've achieved something and you've got some working sample code to reproduce it. If it doesn't work that way, you've got a puzzle to solve and a line of enquiry to follow.
One thing that I usually suggest to people that has not yet been mentioned is that it is important to become a competent user of the existing code base before you can be a developer. When new developers come into our large software project, I suggest that they spend time becoming expert users before diving in trying to work on the code.
Maybe that's obvious, but I have seen a lot of people try to jump into the code too quickly because they are eager to start making progress.
This is quite dependent on what sort of learner and what sort of programmer you are, but:
Broad first - you need an idea of scope and size. This might include skimming docs/uml if they're good. If it's a long term project and you're going to need a full understanding of everything, I might actually read the docs properly. Again, if they're good.
Narrow - pick something manageable and try to understand it. Get a "taste" for the code.
Pick a feature - possibly a different one to the one you just looked at if you're feeling confident, and start making some small changes.
Iterate - assess how well things have gone and see if you could benefit from repeating an early step in more depth.
Pairing with strict rotation.
If possible, while going through the documentation/codebase, try to employ pairing with strict rotation. Meaning, two of you sit together for a fixed period of time (say, a 2 hour session), then you switch pairs, one person will continue working on that task while the other moves to another task with another partner.
In pairs you'll both pick up a piece of knowledge, which can then be fed to other members of the team when the rotation occurs. What's good about this also, is that when a new pair is brought together, the one who worked on the task (in this case, investigating the code) can then summarise and explain the concepts in a more easily understood way. As time progresses everyone should be at a similar level of understanding, and hopefully avoid the "Oh, only John knows that bit of the code" syndrome.
From what I can tell about your scenario, you have a good number for this (3 pairs), however, if you're distributed, or not working to the same timescale, it's unlikely to be possible.
I would suggest running Doxygen on it to get an up-to-date class diagram, then going broad for a while. This gives you a quick big picture that you can use as you get up close and dirty with the code.
I agree that it depends entirely on what type of learner you are. Having said that, I've been at two companies which had very large code-bases to begin with. Typically, I work like this:
If possible, before looking at any of the functional code, I go through unit tests that are already written. These can generally help out quite a lot. If they aren't available, then I do the following.
First, I largely ignore implementation and look only at header files, or just the class interfaces. I try to get an idea of what the purpose of each class is. Second, I go one level deep into the implementation starting with what seems to be the area of most importance. This is hard to gauge, so occasionally I just start at the top and work my way down in the file list. I call this breadth-first learning. After this initial step, I generally go depth-wise through the rest of the code. The initial breadth-first look helps to solidify/fix any ideas I got from the interface level, and then the depth-wise look shows me the patterns that have been used to implement the system, as well as the different design ideas. By depth-first, I mean you basically step through the program using the debugger, stepping into each function to see how it works, and so on. This obviously isn't possible with really large systems, but 20k LOC is not that many. :)
Work with another programmer who is more familiar with the system to develop a new feature or to fix a bug. This is the method that I've seen work out the best.
I think you need to tie this to a particular task. When you have time on your hands, go for whichever approach you are in the mood for.
When you have something that needs to get done, give yourself a narrow focus and get it done.
Get the team to put you on bug fixing for two weeks (if you have two weeks). They'll be happy to get someone to take responsibility for that, and by the end of the period you will have spent so much time problem-solving with the library that you'll probably know it pretty well.
If it has unit tests (I'm betting it doesn't), start small and make sure the unit tests don't fail. If you stare at the entire codebase at once, your eyes will glaze over and you will feel overwhelmed.
If there are no unit tests, you need to focus on the feature that you want. Run the app and look at the results of things that your feature should affect. Then start looking through the code trying to figure out how the app creates the things you want to change. Finally change it and check that the results come out the way you want.
You mentioned it is an app and a library. First change the app and stick to using the library as a user. Then after you learn the library it will be easier to change.
From a top-down approach, the app probably has a main loop or a main GUI that controls all the action, and it is worth understanding that main control flow. Read the code to give yourself a broad overview of the main flow of the app. If it is a GUI app, sketch on paper which screens there are and how to get from one screen to another. If it is a command-line app, sketch how the processing is done.
Even in companies it is not unusual to have this approach. Often no one fully understands how an application works. And people don't have time to show you around. They prefer specific questions about specific things so you have to dig in and experiment on your own. Then once you get your specific question you can try to isolate the source of knowledge for that piece of the application and ask it.
Start by understanding the 'problem domain' (is it a payroll system? inventory? real time control or whatever). If you don't understand the jargon the users use, you'll never understand the code.
Then look at the object model; there might already be a diagram, or you might have to reverse engineer one (either manually or using a tool as suggested by Doug). At this stage you could also investigate the database (if any); it should follow the object model, but it may not, and that's important to know.
Have a look at the change history or bug database, if there's an area that comes up a lot, look into that bit first. This doesn't mean that it's badly written, but that it's the bit everyone uses.
Lastly, keep some notes (I prefer a wiki).
The existing guys can use it to sanity check your assumptions and help you out.
You will need to refer back to it later.
The next new guy on the team will really thank you.
I had a similar situation. I'd say you go like this:
If it's a database-driven application, start from the database and try to make sense of each table, its fields, and its relation to the other tables.
Once you are fine with the underlying store, move up to the ORM layer. Those tables must have some kind of representation in code (see the sketch after these steps).
Once done with that, move on to how and where these objects come from. An interface? Which interface? Any validations? What preprocessing takes place on them before they go to the datastore?
This will familiarize you better with the system. Remember that trying to write or understand unit tests is only possible when you know very well what is being tested and why it needs to be tested in only that way.
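As a hypothetical example of what "the table's representation in code" can look like when the ORM layer happens to be JPA/Hibernate (the table and field names are invented; reading a handful of such classes next to the schema makes the table-to-object mapping explicit):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

@Entity
@Table(name = "customer_order")
class CustomerOrder {
    @Id
    private Long id;
}

@Entity
@Table(name = "order_line")
class OrderLine {
    @Id
    private Long id;

    @Column(name = "quantity")
    private int quantity;

    @ManyToOne          // foreign key back to the customer_order table
    private CustomerOrder order;
}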
And in the case of a large application that is not database driven, I'd recommend another approach:
What is the main goal of the system?
What are the major components of the system that solve this problem?
What interactions do those components have with each other? Make a graph that depicts component dependencies. Ask someone already working on it. These components must be exchanging something with each other, so try to figure those out as well (for example, the IO component might return a File object back to the GUI, and the like).
Once comfortable with this, dive into the component that is least dependent on the others. Now study how that component is further divided into classes and how they interact with each other. This way you've got the hang of a single component in total.
Move to the next least dependent component.
At the very end, move to the core component, which typically has dependencies on many of the other components you've already tackled.
While looking at the core component, you might be referring back to the components you examined earlier, so don't worry, keep working hard!
For the first strategy:
Take the example of this stackoverflow site, for instance. Examine the datastore: what is being stored, how it is stored, what representations those items have in the code, and how and where those are presented on the UI. Where do they come from, and what processing takes place on them on their way back to the datastore?
For the second one
Take a word processor, for example. What components are there? IO, UI, Page, and the like. How do these interact with each other? Move along as you learn further.
Be relaxed. Written code is someone's mindset, frozen logic and thinking style, and it takes time to read that mind.
First, if you have team members available who have experience with the code you should arrange for them to do an overview of the code with you. Each team member should provide you with information on their area of expertise. It is usually valuable to get multiple people explaining things, because some will be better at explaining than others and some will have a better understanding than others.
Then, you need to start reading the code for a while without any pressure (a couple of days or a week if your boss will provide that). It often helps to compile/build the project yourself and be able to run the project in debug mode so you can step through the code. Then, start getting your feet wet, fixing small bugs and making small enhancements. You will hopefully soon be ready for a medium-sized project, and later, a big project. Continue to lean on your team-mates as you go - often you can find one in particular who is willing to mentor you.
Don't be too hard on yourself if you struggle - that's normal. It can take a long time, maybe years, to understand a large code base. Actually, it's often the case that even after years there are still some parts of the code that are still a bit scary and opaque. When you get downtime between projects you can dig in to those areas and you'll often find that after a few tries you can figure even those parts out.
Good luck!
You may want to consider looking at source code reverse engineering tools. There are two tools that I know of:
SWAG Kit (Linux only)
Bauhaus (academic and commercial versions)
Both tools offer similar feature sets that include static analysis producing graphs of the relations between modules in the software.
This mostly consists of call graphs and type/class dependencies. Viewing this information should give you a good picture of how the parts of the code relate to one another. Using this information, you can dig into the actual source for the parts that you are most interested in and that you need to understand/modify first.
I find that just jumping into the code can be a bit overwhelming. Try to read as much documentation on the design as possible. This will hopefully explain the purpose and structure of each component. It's best if an existing developer can take you through it, but that isn't always possible.
Once you are comfortable with the high-level structure of the code, try to fix a bug or two. This will help you get to grips with the actual code.
I like all the answers that say you should use a tool like Doxygen to get a class diagram, and first try to understand the big picture. I totally agree with this.
That said, this largely depends on how well factored the code is to begin with. If it's a gigantic mess, it's going to be hard to learn. If it's clean and organized properly, it shouldn't be that bad.
See this answer on how to use test coverage tools to locate the code for a feature of interest, without knowing anything about where that feature is, or how it is spread across many modules.
(shameless marketing ahead)
You should check out nWire. It is an Eclipse plugin for navigating and visualizing large codebases. Many of our customers use it to break in new developers by printing out visualizations of the major flows.

The best way to familiarize yourself with an inherited codebase

Stacker Nobody asked about the most shocking thing new programmers find as they enter the field.
Very high on the list is the impact of inheriting a codebase with which one must rapidly become acquainted. It can be quite a shock to suddenly find yourself charged with maintaining N lines of code that have been cobbled together for who knows how long, and to have a short time in which to start contributing to it.
How do you efficiently absorb all this new data? What eases this transition? Is the only real solution to have already contributed to enough open-source projects that the shock wears off?
This also applies to veteran programmers. What techniques do you use to ease the transition into a new codebase?
I added the Community-Building tag to this because I'd also like to hear some war-stories about these transitions. Feel free to share how you handled a particularly stressful learning curve.
Pencil & Notebook ( don't get distracted trying to create a unrequested solution)
Make notes as you go and take an hour every monday to read thru and arrange the notes from previous weeks
with large codebases first impressions can be deceiving and issues tend to rearrange themselves rapidly while you are familiarizing yourself.
Remember the issues from your last work environment aren't necessarily valid or germane in your new environment. Beware of preconceived notions.
The notes/observations you make will help you learn quickly what questions to ask and of whom.
Hopefully you've been gathering the names of all the official (and unofficial) stakeholders.
One of the best ways to familiarize yourself with inherited code is to get your hands dirty. Start with fixing a few simple bugs and work your way into more complex ones. That will warm you up to the code better than trying to systematically review the code.
If there's a requirements or functional specification document (which is hopefully up-to-date), you must read it.
If there's a high-level or detailed design document (which is hopefully up-to-date), you probably should read it.
Another good way is to arrange a "transfer of information" session with the people who are familiar with the code, where they provide a presentation of the high level design and also do a walk-through of important/tricky parts of the code.
Write unit tests. You'll find the warts quicker, and you'll be more confident when the time comes to change the code.
Try to understand the business logic behind the code. Once you know why the code was written in the first place and what it is supposed to do, you can start reading through it, or, as someone said, probably fix a few bugs here and there.
My steps would be:
1.) Set up a Source Insight (or any good source code browser you use) workspace/project with all the source and header files in the code base. Browse at a high level from the topmost function (main) to the lowermost functions. During this code browsing, keep making notes on paper or in a word document, tracing the flow of the function calls. Do not get into function implementation nitty-gritty in this step; keep that for later iterations. For now, keep track of what arguments are passed to functions, the return values, how the arguments are initialized and modified, and how the return values are used.
2.) After one iteration of step 1, when you have some grasp of the code and the data structures used in the code base, set up an MSVC project (or any other compiler project relevant to the programming language of the code base), compile the code, execute it with a valid test case, and single-step through the code again from main down to the lowest level of function. Between the function calls, keep noting the values of variables passed and returned, the various code paths taken, the code paths avoided, etc.
3.) Keep repeating steps 1 and 2 iteratively until you are comfortable enough that you can change some code, add some code, or find and fix a bug in the existing code!
-AD
I don't know about this being "the best way", but something I did at a recent job was to write a code spider/parser (in Ruby) that went through and built a call tree (and a reverse call tree) which I could later query. This was slightly non-trivial because we had PHP which called Perl which called SQL functions/procedures. Any other code-crawling tools would help in a similar fashion (i.e. javadoc, rdoc, perldoc, Doxygen etc.).
Reading any unit tests or specs can be quite enlightening.
Documenting things helps (either for yourself, or for other teammates, current and future). Read any existing documentation.
Of course, don't underestimate the power of simply asking a fellow teammate (or your boss!) questions. Early on, I asked as often as necessary "do we have a function/script/foo that does X?"
Go over the core libraries and read the function declarations. If it's C/C++, this means only the headers. Document whatever you don't understand.
The last time I did this, one of the comments I inserted was "This class is never used".
Do try to understand the code by fixing bugs in it. Do correct or maintain documentation. Don't modify comments in the code itself; that risks introducing new bugs.
In our line of work, generally speaking we do no changes to production code without good reason. This includes cosmetic changes; even these can introduce bugs.
No matter how disgusting a section of code seems, don't be tempted to rewrite it unless you have a bugfix or other change to do. If you spot a bug (or possible bug) when reading the code trying to learn it, record the bug for later triage, but don't attempt to fix it.
Another Procedure...
After reading Andy Hunt's "Pragmatic Thinking and Learning - Refactor Your Wetware" (which doesn't address this directly), I picked up a few tips that may be worth mentioning:
Observe Behavior:
If there's a UI, all the better. Use the app and get a mental map of relationships (e.g. links, modals, etc). Look at HTTP requests if it helps, but don't put too much emphasis on it -- you just want a light, friendly acquaintance with the app.
Acknowledge the Folder Structure:
Once again, this is light. Just see what belongs where, and hope that the structure is semantic enough -- you can always get some top-level information from here.
Analyze Call-Stacks, Top-Down:
Go through and list, on paper or some other medium (but try not to type it -- writing engages different parts of your brain; build it out of Legos if you have to), the function calls, objects, and variables that are closest to the top level first. Look at constants and modules, and make sure you don't dive into fine-grained features if you can help it.
MindMap It!:
Maybe the most important step. Create a very rough draft mapping of your current understanding of the code. Make sure you run through the mindmap quickly. This allows an even spread of different parts of your brain to (mostly R-Mode) to have a say in the map.
Create clouds, boxes, etc. wherever you initially think they should go on the paper. Feel free to denote boxes with syntactic symbols (e.g. 'F'-Function, 'f'-closure, 'C'-Constant, 'V'-Global Var, 'v'-low-level var, etc). Use arrows: an incoming arrow for arguments, an outgoing arrow for returns, or whatever comes more naturally to you.
Start drawing connections to denote relationships. It's OK if it looks messy - this is a first draft.
Make a quick rough revision. If it's too hard to read, do another quick reorganization of it, but don't do more than one revision.
Open the Debugger:
Validate or invalidate any notions you had after the mapping. Track variables, arguments, returns, etc.
Track HTTP requests etc to get an idea of where the data is coming from. Look at the headers themselves but don't dive into the details of the request body.
MindMap Again!:
Now you should have a decent idea of most of the top-level functionality.
Create a new MindMap that has anything you missed in the first one. You can take more time with this one and even add some relatively small details -- but don't be afraid of what previous notions they may conflict with.
Compare this map with your last one and eliminate any question you had before, jot down new questions, and jot down conflicting perspectives.
Revise this map if it's too hazy. Revise as much as you want, but keep revisions to a minimum.
Pretend Its Not Code:
If you can put it into mechanical terms, do so. The most important part of this is to come up with a metaphor for the app's behavior and/or smaller parts of the code. Think of ridiculous things, seriously. If it was an animal, a monster, a star, or a robot, what kind would it be? If it was in Star Trek, what would they use it for? Think of many things to weigh it against.
Synthesis over Analysis:
Now you want to see not 'what' but 'how'. Any low-level parts that threw you for a loop could be taken out and put into a sterile environment (where you control the inputs). What sort of outputs are you getting? Is the system more complex than you originally thought? Simpler? Does it need improvements?
Contribute Something, Dude!:
Write a test, fix a bug, comment it, abstract it. You should have enough ability to start making minor contributions, and FAILING IS OK :)! Make a note of any changes you made in commits, chat, or email. If you did something dastardly, your teammates can catch it before it goes to production -- and if something is wrong, it's a great way to get a teammate to clear things up for you. Usually listening to a teammate talk will clear up a lot of what made your MindMaps clash.
In a nutshell, the most important thing to do is use a top-down fashion of getting as many different parts of your brain engaged as possible. It may even help to close your laptop and face your seat out the window if possible. Studies have shown that enforcing a deadline creates a "Pressure Hangover" for ~2.5 days after the deadline, which is why deadlines are often best to have on a Friday. So, BE RELAXED, THERE'S NO TIMECRUNCH, AND NOW PROVIDE YOURSELF WITH AN ENVIRONMENT THAT'S SAFE TO FAIL IN. Most of this can be fairly rushed through until you get down to details. Make sure that you don't bypass understanding of high-level topics.
Hope this helps you as well :)
All really good answers here. I just wanted to add a few more things:
You can pair architectural understanding with flash cards, and revisiting those can solidify understanding. I find questions such as "Which part of the code implements X?" useful, where X is some piece of functionality in your code base.
I also like to open a buffer in emacs and start re-writing some parts of the code base that I want to familiarize myself with and add my own comments etc.
One thing vi and emacs users can do is use tags. Tags are contained in a file (usually called TAGS). You generate one or more tags files with a command (etags for emacs, ctags for vi). Then, when you edit source code and you see a confusing function or variable, you load the tags file and it will take you to where the function is declared (not perfect, but good enough). I've actually written some macros that let you navigate source using Alt-cursor keys,
sort of like popd and pushd in many flavors of UNIX.
BubbaT
The first thing I do before going down into code is to use the application (as several different users, if necessary) to understand all the functionalities and see how they connect (how information flows inside the application).
After that I examine the framework in which the application was built, so that I can make a direct relationship between all the interfaces I have just seen with some View or UI code.
Then I look at the database and any database command handling layer (if applicable), to understand how that information (which users manipulate) is stored and how it goes to and comes from the application.
Finally, after learning where data comes from and how it is displayed I look at the business logic layer to see how data gets transformed.
I believe every application architecture can be divided like this, and knowing the overall function (a who's who of your application) can be beneficial before really debugging it or adding new stuff - that is, if you have enough time to do so.
And yes, it also helps a lot to talk with someone who developed the current version of the software. However, if he/she is going to leave the company soon, keep a note of his/her wish list (what they wanted to do for the project but were unable to because of budget constraints).
Create documentation for each thing you figure out from the codebase.
Find out how it works by experimentation: change a few lines here and there and see what happens.
Use Geany, as it speeds up searching for commonly used variables and functions in the program and adds them to autocomplete.
Find out if you can contact the original developers of the code base, through Facebook or by googling for them.
Find out the original purpose of the code and see if the code still fits that purpose or should be rewritten from scratch to fulfil the intended purpose.
Find out what frameworks the code uses and what editors were used to produce it.
The easiest way to deduce how the code works is to work out how you would have written a certain part yourself, and then check the code to see whether such a part exists.
It's reverse engineering: figuring something out by trying to re-engineer the solution.
Most programmers have experience in coding, and there are certain patterns you can look for to see whether they are present in the code.
There are two types of code: object oriented and structurally oriented.
If you know how to work with both, you're good to go, but if you aren't familiar with one or the other, you'll have to relearn how to program in that fashion to understand why it was coded that way.
In object-oriented code, you can easily create diagrams documenting the behaviors and methods of each object class.
If it's structurally oriented, meaning organized by function, create a function list documenting what each function does and where it appears in the code.
I haven't done either of the above myself; as a web developer, it is relatively easy to figure out how something works by starting from index.php and following it through to the rest of the pages.
Good luck.