Everything is a flow? - language-agnostic

Some of the recent web projects I worked on use a flow engine as the central abstraction in the presentation layer and/or (more or less) the business layer. Reflecting on my experiences, I can honestly say that I am not a fan of the flow-centric approach. On the contrary, even. I see the same symptoms pop up in projects that use flows as the central abstraction.
Everything is a flow. You don't just start an application, no, you "enter the main flow" even if it is just to show a menu with a huge dispatcher behind it. I am not against flows as such. Some use cases keep popping up everywhere and need to be included at various points in other use cases (LookupCustomer, ...), but for flow-centric people everything is a flow, even things that are... not flows.
Fragmentation. Flow-based applications tend to have many pieces of logic (actions, commands, fragments of code to prepare the view...) dispersed throughout the code. Mapping in and out of these actions adds overhead, is tedious and bloats the code. Although it is easy to follow the abstract flow, actually figuring out what is happening inside these little (or big) chunks of code is another thing. While every style of application allows people to write bad and inconsistent code, flow-centric applications make it particularly easy to do so.
Config files. Most applications use some XML format to declare flows and the actions that accompany state changes. The language in which the application is written (say Java, C#, Ruby, ...) is probably far richer and more expressive than the XML format ever will be. Why bother?
Flows break encapsulation. If you give me a component that has a certain embedded flow logic, then the flow should be part of the component, and should not be an external abstraction. In other words: the flow is part of the component and the component is self-contained. It is a detail. Sure, it can be parameterized and stuff, but a component should "just work". People writing a Swing, GWT, or whatever fat or rich interface application, don't bother with explicit flow abstractions. Why should my web application have one? Give me the flow diagram of GMail.
(Edit) Flows are procedural. If you look at "rich" patterns like MVC with events and everything, flows really pale in comparison. You are using a modern and expressive language to implement your application, right? So you can do better than the rigid "do this, then that, and that, and ..." way from the time when punchcards and assembler were in fashion.
Examples of frameworks that promote flow-centric development are Struts, BTT, Spring Webflow, and JSF. I've also come across homegrown flow engines built on top of ordinary servlets.
This is also an interesting article: http://chillenious.wordpress.com/2006/07/16/on-page-navigation/
Do you (still) think a flow-centric approach for (the front-end of) a web-application is a good idea?

In general, flows seem to be an unnecessarily enterprisey approach to what should be a relatively simple problem: we would like to ensure that users take one of several particular paths through our application. What's more instructive and insightful is to examine why we need this path to occur. Is it because...
... we don't want them to interact with our application except in rigidly predefined ways? Then we've limited the utility of our application, and we make our application much harder to change and use.
... we're worried about the ability of our application to handle unexpected input or deal with states we haven't anticipated if people stray off the beaten path? Then that says a lot about our technical choices for a validation framework.
... we can't envision a scenario other than the predefined ones under which someone would use the site? Then we are implicitly assuming that only we know how best to use it; we limit the ability of the user to control their interaction.
Notice how each of these underscores an issue intrinsic to the application's development and team members, and one that's not the fault of a user. So I support your general premise that flow-based approaches tend to have a number of issues.
The primary problem is that flows unnecessarily increase brittleness that is already better abstracted by other mechanisms. For example, to achieve a rule like "you need to fill out your order form before you confirm checkout", don't make a workflow; have a better CustomerOrder model that knows when it doesn't have all the information necessary to allow an OrderConfirmation. If you try to skip ahead, your model and controller should take care of failing validation on the next POST.
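To make that concrete, here is a minimal sketch of such a model. The field names (items, shipping_address, payment_method) and the IncompleteOrderError exception are made up for illustration; only CustomerOrder and OrderConfirmation come from the paragraph above.

```python
from dataclasses import dataclass, field


class IncompleteOrderError(Exception):
    """Raised when confirmation is attempted on an incomplete order."""


@dataclass
class CustomerOrder:
    # Hypothetical fields; a real order would carry more than this.
    items: list = field(default_factory=list)
    shipping_address: str = ""
    payment_method: str = ""

    def missing_information(self) -> list:
        """Return the names of the pieces still needed before confirmation."""
        missing = []
        if not self.items:
            missing.append("items")
        if not self.shipping_address:
            missing.append("shipping_address")
        if not self.payment_method:
            missing.append("payment_method")
        return missing

    def confirm(self) -> "OrderConfirmation":
        """Produce a confirmation, or fail validation if the order is incomplete."""
        missing = self.missing_information()
        if missing:
            raise IncompleteOrderError("missing: " + ", ".join(missing))
        return OrderConfirmation(order=self)


@dataclass
class OrderConfirmation:
    order: CustomerOrder
```

A controller handling the confirmation POST would call confirm() and turn IncompleteOrderError into a validation message, so skipping ahead fails without any workflow definition being involved.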
Essentially, flows extract out disparate fragments of each participating controller and collect them into a new "flow controller" that's specific to each flow. That's not necessarily a bad idea, but it suggests that the original controllers may have been taking on too much responsibility to begin with if that sort of path was so easy to define separately. For example, if you previously had OrderConfirmation, CustomerOrder, and OrderCheckout controllers, and you're thinking about an Order flow to link all three together, what you should probably be thinking about is an Order controller instead.

I think defining flows is useful in a web application. In answer to your main points:
Everything is a Flow.
There is nothing intrinsically wrong with that, it's just a name to give something. A flow can be short or long - I agree it's a bit weird that there is a "main" flow that starts everything but it doesn't really cause any problems in practice.
Fragmentation
You have some valid points here, although I get the feeling that the greatest contributor to this is the design of the DSL. For example, Spring WebFlow v2 is a vast improvement over SWF v1 in terms of readability and understandability.
Config files
I strongly disagree with this point. I feel that xml is the best way to express this code. If you think about it - managing controllers, views, state changes and actions is really just "configuration" rather than "code". And xml (in my opinion) is the best way to express configuration. Just think about the word "Controller". All a controller does is direct and configure things - call services, return views and models etc. There is no need for any richness or expressiveness of Java to define what is basically just configuration of your web application.
Flows break encapsulation
GMail could be expressed in a series of flows. Think about the number of steps it takes to compose and send an email. Flows really just define the wiring of how the application works - sure you could have a number of components that interact with each other, but the way that you configure them to work together is essentially the flow you have defined in your application. Making this flow explicit in a separate DSL seems like a good idea to me, as fundamentally it is separate.

The first question that should be answered is whether a flow framework is really the best tool for your specific web application. I'm a fan of Spring Web Flow, myself, but I'll only use it if my web app can easily be broken down into flows, and if navigation should be tightly controlled. If the navigation is very loose, where you can get to almost any page from any other page, then SWF isn't the right tool for the job.
As you mention, there are other drawbacks to flow frameworks. They usually aren't RESTful, and thus not bookmarkable. If that conflicts with what you want for your application, then SWF probably isn't for you.
That said, SWF, and some of the other flow frameworks, offer some features that few other web frameworks deliver. This includes complete solutions for double-submit issues and browser back button and history handling. SWF's implementation of these features lends some additional security. Since the flow execution IDs for each page change as the application is used, you get immunity to forced browsing and some protection against cross-site request forgery.
The concept of flows is quite nice, in my opinion, since flows tend to mirror use cases. Scoping data to a flow or a conversation removes the responsibility for its cleanup from the developer, which I think is a very good thing. It's like the difference between manual memory management and garbage collection. Not only does it make less work for me, but it eliminates the possibility of introducing bugs should I forget to cleanup attributes. One thing I hated about Struts was that I needed to duplicate my cleanup code in several actions to ensure correctness. It's much easier just to scope the data to the use case.
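As a rough illustration of the scoping idea (not how SWF implements it), a scope can own its attributes and drop them when the use case ends. The conversation_scope helper and the plain dict standing in for the HTTP session are assumptions made up for this sketch.

```python
from contextlib import contextmanager


@contextmanager
def conversation_scope(session: dict, name: str):
    """Give a use case its own attribute bag and remove it when the use case ends."""
    scope = {}
    session[name] = scope
    try:
        yield scope
    finally:
        # Cleanup happens here exactly once, even if a step raises.
        session.pop(name, None)


# A plain dict stands in for the HTTP session.
session = {"user": "alice"}
with conversation_scope(session, "checkout") as scope:
    scope["cart_id"] = 42  # visible only while the use case is running
```

Once the with block exits, the "checkout" entry is gone from the session without any action having to repeat cleanup code.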
Flows also present a context for related actions and views. If I look at a struts-config or faces-config file, I can see all kinds of navigation rules or action mappings, but there is no immediate context for me to mentally group related items together. I have to manually trace through the configuration, and even then sometimes I get stuck. With Struts, I need to look at the specific web pages in order to figure out which actions can be invoked from a view.
With SWF, I can clearly see all the actions, views, and models related to a flow. With Eclipse plugins, I can see this as a state diagram. Even if you're not using eclipse, it's very easy to translate a flow definition to a state diagram. These diagrams are useful for myself, my project manager, and pretty much anyone who wants to understand the high-level of how a use case is performed. In short, chunking related things together allows for easier understanding, and a shallower learning curve. That's one reason why OOP is so popular. With web apps, the idea of chunking these elements together to form a use case just feels natural.
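To show how mechanical that translation is, here is a small sketch that renders a flow as Graphviz DOT text. The dictionary format and the checkout flow are invented for the example; a real SWF definition is XML and would need to be parsed first.

```python
def flow_to_dot(name: str, transitions: dict) -> str:
    """Render {state: {event: target_state}} as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for state, events in transitions.items():
        for event, target in events.items():
            lines.append(f'    "{state}" -> "{target}" [label="{event}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical checkout flow, not taken from any real application.
checkout = {
    "enterAddress": {"next": "choosePayment"},
    "choosePayment": {"next": "review", "back": "enterAddress"},
    "review": {"confirm": "done", "back": "choosePayment"},
}
print(flow_to_dot("checkout", checkout))
```

The printed text can be fed to any Graphviz renderer to get the state diagram.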

Everything is a flow
Everything really is a flow. Computer programs have always been a flow and will always be a flow consisting of these steps:
input -> process -> output
The MVC design pattern in fact corresponds to this:
controller -> model -> view
Fragmentation
You're right. But I think this "issue" might be reduced by good support in IDEs.
Config files
There's no doubt xml is the best way to express configuration.
Flows break encapsulation
I would disagree with this. You can make black boxes using flows and then use these black boxes in other flows.

IMHO, web apps are best developed as independent modules rather than modules that are "bound by flow".
Since most web apps today are ajaxy apps, having independent modules on the page helps a lot.
Configurations can be handled by XML or JSON files.

Web 2.0 presents a serious challenge to the notion that "everything is a flow". And when the presentation tier is fully transposed to the client layer, we'll be back on the solid, and familiar (from GUIs of yore) ground of event based processing.

Flows arise because of the inherent mismatch between traditional application interaction and the way web applications actually work. Flows are merely a convenient way to describe what would be more traditionally modeled as a series of GUI dialogs (think wizard) in a way that is compatible with the way in which web pages are delivered and interacted with. Imagine if you will that you were writing a traditional program, but every time the user ran the program you could only display a single dialog box, and when the user clicked "Ok" (or "Cancel", or "Next", or "Previous") your program would terminate. In that situation, how would you go about modeling the expected behavior of the program (to further complicate matters, assume many users are running the program at different times)? I think you would find you would rather quickly arrive at something similar to flows.
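Here is roughly what you end up with if you follow that thought experiment through. The WIZARD table, the step names, and the current_step dictionary (standing in for server-side session storage) are all assumptions made up for this sketch.

```python
from typing import Optional

# Hypothetical wizard: which step each button leads to from each dialog.
WIZARD = {
    "name": {"next": "address"},
    "address": {"next": "confirm", "previous": "name"},
    "confirm": {"ok": "finished", "previous": "address", "cancel": "cancelled"},
}

# Stands in for server-side session storage. It is keyed by user because the
# "program" terminates after every dialog and many users run it concurrently.
current_step = {}


def handle_request(user: str, button: Optional[str] = None) -> str:
    """Process one request/response cycle and return the dialog to show next."""
    step = current_step.get(user, "name")
    if button is not None:
        # Buttons that are not valid for this step leave the user where they are.
        step = WIZARD.get(step, {}).get(button, step)
    current_step[user] = step
    return step
```

Each call is one "run" of the program: it looks up where this user was, applies the button that was clicked, remembers the new position, and exits. That lookup table plus the per-user position is already a small flow engine.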
I think perhaps what you're really asking is, "Why are most flow frameworks so easy to abuse?", which naturally leads to the followup question "What can be done to fix that?".

Related

Making contexts explicit in the directory structure

I am looking for feedback on a certain directory structure for an application. I realize that this does not follow the classical Stack Overflow format where there is such a thing as "a correct answer", though I think it is interesting nonetheless. To provide meaningful feedback, some context first needs to be understood, so please bear with me.
--
Two colleagues of mine and I have created an application that uses the Clean Architecture. HTTP requests to routes get turned into request models, which get handed to use cases, which then spit out a response model that gets handed to a presenter.
The code is fully open source and can be found on GitHub. We also have some docs describing what the main directories are about.
We are thinking about reorganizing our code and would like to get feedback on what we've come up with so far. Primary among the reasons for this reorganization are:
Right now we do not have a nice place to put things that are not part of our domain, yet somehow bind to it. For instance authorization code, which knows about donation ids (with authorization not being part of the core domain, while donation ids are).
It's nice to group cohesive things together. Our Donation code is cohesive and our Membership Application code is cohesive, while both don't depend on each other. This is closely related to the notion of Bounded Contexts in Domain Driven Design. Right now these contexts are not explicitly visible in our code, so it is easy to make them dependent on each other, especially when you are not familiar with the domain.
These are the contexts we have identified so far. This is a preliminary list and just to give you an idea, and not the part I want feedback on.
Donation
Membership
Form support stuff (validation of email, generation of IBAN, etc)
The part I want feedback on is the directory structure we think of switching to:
src/
    Context_1/
        DataAccess/
        Domain/
            Model/
            Repositories/
        UseCases/
        Validation/
        Presentation/
        Authorization/
    Context_2/
    Factories/
    Infrastructure/
tests/
    Context_1/
        Unit/
        Integration/
        EdgeToEdge/
        System/
        TestDoubles/
    Context_2/
The Authorization/ folder directly inside of the context would provide a home for our currently oddly placed authorization code in Infrastructure. Other code not part of our domain, yet binding to it, can go directly into the context folder, and gets its own folder if there is a cohesive/related bunch of stuff amongst it, such as authorization.
I'm happy to provide additional information you need to provide useful feedback.
Right now we do not have a nice place to put things that are not part of our domain, yet somehow bind to it.
Right now these contexts are not explicitly visible in our code, so it is easy to make them dependent on each other, especially when you are not familiar with the domain.
There are both technical and non-technical ways to address this issue:
You can enforce stricter separation through class libraries. It is more obvious you are taking a dependency on something if you have to import a dll / reference another project. It will also prevent circular dependencies.
Code reviews / discipline is a non-technical way to handle it.
I've been using Hexagonal Architecture with DDD where the domain is in the middle. Other concerns such as repositories are represented by interfaces. Your adapters then take a reference to the domain, but never in the other direction. So you might have an IRepository in your domain, but your WhateverDatabaseRepository sits in its own project. It is then the responsibility of the application services / command handlers to co-ordinate your use cases and load the adapters. This is also where you would apply cross-cutting concerns such as authorization.
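A minimal sketch of that arrangement, in Python rather than C#. All names here (DonationRepository, InMemoryDonationRepository, ShowDonationHandler, NotAuthorized, the can_access callable) are invented for illustration and are not taken from the project in question.

```python
from typing import Optional, Protocol


class DonationRepository(Protocol):
    """Port: lives with the domain and knows nothing about storage."""

    def get(self, donation_id: int) -> Optional[dict]: ...


class InMemoryDonationRepository:
    """Adapter: would sit in its own project/package and depend on the domain."""

    def __init__(self, rows: dict):
        self._rows = rows

    def get(self, donation_id: int) -> Optional[dict]:
        return self._rows.get(donation_id)


class NotAuthorized(Exception):
    pass


class ShowDonationHandler:
    """Application service: coordinates the use case and applies authorization."""

    def __init__(self, repository: DonationRepository, can_access):
        self._repository = repository
        self._can_access = can_access  # callable(user, donation_id) -> bool

    def handle(self, user: str, donation_id: int) -> Optional[dict]:
        if not self._can_access(user, donation_id):
            raise NotAuthorized(f"{user} may not view donation {donation_id}")
        return self._repository.get(donation_id)
```

The domain only sees the DonationRepository port. Authorization knows about donation ids but lives in the handler, outside the domain, which is the kind of "binds to the domain but is not part of it" code the question is trying to find a home for.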
I'd recommend watching Greg Young videos (try this one) and reading Vaughn Vernon's IDDD as it goes into how to structure applications and deals with questions like yours. (sorry that my answer is basically watch a 6hr video and read a 600+ page book, but they both really helped clarify some of the more "wooly" aspects of DDD for me)
As an example, see https://github.com/gregoryyoung/m-r/blob/master/SimpleCQRS/CommandHandlers.cs

Presentation patterns to use with Ext

Which presentation patterns do you think Ext favors or have you successfully used to achieve high testability and also maintainability?
Since Ext component instances usually come tightly coupled with state and some sort of presentation logic (e.g. format validation for text fields), Passive View is not a natural fit. Supervising Presenter seems like it can work (and I've painlessly used it on one occasion). How about the suitability of Presentation Model? Any others?
While this question is specifically for Ext, it can apply to similar frameworks like SmartClient and even RIA technologies like Flex. So, if you have any first-hand pattern experiences with any other web UI technologies, your input would still be appreciated.
When thinking of presentation patterns, this is a great quote:
Separating user interface code from everything else is a key principle in well-engineered software. But it’s not always easy to follow and it leads to more abstraction in an application that is hard to understand. Quite a lot design patterns try to target this scenario: MVC, MVP, Supervising Controller, Passive View, PresentationModel, Model-View-ViewModel, etc. The reason for this variety of patterns is that this problem domain is too big to be solved by one generic solution. However, each UI Framework has its own unique characteristics and so they work better with some patterns than with others.
As far as Ext is concerned, in my opinion the closest pattern would be the Model-View-Viewmodel, however this pattern is inherently difficult to code for whilst maintaining the separation of the key tenets (state, view, model).
That said, as per the quote above, each pattern tries to solve/compartmentalise/simplify a problem/situation often too complex for the individual application at hand, or which often fails when you try and take it to its absolute. As such, think about getting a 'best fit' as opposed to an absolute when pattern matching application development.
And remember:
The reason for this variety of patterns is that this problem domain is too big to be solved by one generic solution.
I hope this helps!
2 years have passed since this question was asked, and now Ext-JS 4 has a built-in implementation of the MVC pattern. However, instead of an MVP (which I prefer), it favors a straight controller because the views attach themselves to the models through stores.
Here's the docs on the controller:
http://docs.sencha.com/ext-js/4-1/#!/api/Ext.app.Controller
Nonetheless it can be made to act more like a supervising controller. One nice aspect of Ext-JS is the global application object's ability to act like an event bus for handling controller-to-controller communication. See this post on how to do that:
http://www.sencha.com/forum/showthread.php?176495-How-to-listen-for-custom-events-fired-in-application
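For anyone unfamiliar with the event bus idea itself, here is a language-neutral sketch in Python. This is not the Ext-JS API; EventBus, CartController, HistoryController, and the "orderplaced" event name are all made up for illustration.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe hub shared by otherwise unrelated controllers."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event_name, listener):
        """Register a callable to be invoked whenever event_name is fired."""
        self._listeners[event_name].append(listener)

    def fire(self, event_name, *args):
        """Call every listener registered for event_name with the given args."""
        for listener in list(self._listeners.get(event_name, [])):
            listener(*args)


class CartController:
    def __init__(self, bus):
        self._bus = bus

    def checkout(self, order_id):
        # Announces what happened without knowing who is interested.
        self._bus.fire("orderplaced", order_id)


class HistoryController:
    def __init__(self, bus):
        self.orders = []
        bus.on("orderplaced", self.orders.append)
```

The two controllers never reference each other; they only share the bus, which is the role the post above gives to the application object.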
Of course the definitive explanation of all these patterns can be found here:
http://martinfowler.com/eaaDev/uiArchs.html

Reusability, testability, code complexity reduction and showing-off-ability programming importance

There are lots of programming and architecture patterns. Patterns make code cleaner, more reusable, more maintainable, and more testable, and last (but not least) let the person applying them feel like a really cool developer.
How do you rank these considerations? What appeals to you most when you decide to apply a pattern?
I wonder how many times code reusability (especially for the MVP and MVC patterns) was actually important? For example, a DAL library is often shared between projects (it's reusable), but how often are controllers/views (abstracted via interfaces) reused?
I think you missed the single most important one from your list - more maintainable. Code that is well and consistently structured (as you get with easily reusable code) is much more easily maintained.
And as for reusability, then yes, on a number of occasions. Usually it is something like: create a web page to save/update some record, then some months later we need to expose this as a service for a third party to consume. If your code is structured well, this should be easy and low risk, as you're only adding a new front end.
I hope most people use patterns to learn how to solve design problems in a certain context. All those non-functional requirements you mention can be really important depending on stakeholder needs for a project.
As for MVC etc., it is not meant only to be reused between projects; that is often not possible or not a good idea. The benefits you get from MVC should matter within the project that uses that architecture. You can change details in views and models independently, you can reuse views with controllers for different models, and you should be able to change persistence details without affecting your controllers and views. All this is IMHO very important during development of a single project.
"Code reusability" as defined in many books is more or less a myth. Try to focus more on easy to read and easy to maintain. Don't start with "reusability" in mind; it is better to think first about testability and only then about reusing something. It is important to deliver, to test, to have clean code, to refactor, and to not repeat yourself, and less important to build components from the start that can be reused between projects. Whatever is to be reused must come about naturally, more like a discovery: you see a repetition, so you build something that can be reused in that specific situation.
Code complexity reduction ranks high: if I keep things simple, I can maintain the project better and work on it faster to add/change features.
Reusability is a tool, one that has its uses, but not in every place. I usually refactor for reusability those components that show a clear history of identical use in more than three places. Otherwise, I risk running into the need of specialized behavior in a place or two, and end up splitting a component in a couple of more specialized ones that share a similar structure, but would be hard to understand if kept together.
Testability is not something I personally put a lot of energy into. However, in many cases it follows from reduced code complexity: if there are not a lot of dependencies and intricate code paths, there is less danger of breaking tests or making them more difficult to perform.
As for showing-off-ability... well... the customer is interested in how well the app performs in terms of what he wants from it, not in terms of how "cool" my code is. 'nuff said

At what point should architecture become layered?

Obviously, "Hello World" doesn't require a separated, modular front-end and back-end. But any sort of Enterprise-grade project does.
Assuming some sort of spectrum between these points, at which stage should an application be (conceptually, or at a design level) multi-layered? When a database, or some external resource is introduced? When you find that you're anticipating spaghetti code in your methods/functions?
when a database, or some external resource is introduced.
but also:
always (except for the most trivial of apps) separate AT LEAST presentation tier and application tier
see:
http://en.wikipedia.org/wiki/Multitier_architecture
Layers are a means to keep a design loosely coupled and highly cohesive.
When you start to have a few classes (either implemented or just sketched with UML), they can be grouped logically, into layers - or more generally packages, or modules. This is called the art of separating the concerns.
The sooner the better: if you do not start layering early enough, you risk never doing it, as the effort can become too great.
Here are some criteria of when to...
Any time you anticipate the need to replace one part of it with a different part.
Any time you find yourself needing to divide work amongst parallel teams.
There is no real answer to this question. It depends largely on your application's needs, and numerous other factors. I'd suggest reading some books on design patterns and enterprise application architecture. These two are invaluable:
Design Patterns: Elements of Reusable Object-Oriented Software
Patterns of Enterprise Application Architecture
Some other books that I highly recommend are:
The Pragmatic Programmer: From Journeyman to Master
Refactoring: Improving the Design of Existing Code
No matter your skill level, reading these will really open your eyes to a world of possibilities.
I'd say in most cases dealing with multiple distinct levels of abstraction in the concepts your code deals with would be a strong signal to mirror this with levels of abstraction in your implementation.
This does not override the scenarios that others have highlighted already though.
I think once you ask yourself "hmm should I layer this" the answer is yes.
I've worked on too many projects that probably started off as proof of concept/prototype that ended up being full projects used in production, which are horribly written and just reek of "get it done quick, we'll fix it later." Trust me, you won't fix it later.
The Pragmatic Programmer lists this as the Broken Window Theory.
Try and always do it right from the start. Separate your concerns. Build it with modularity in mind.
And of course try and think of the poor maintenance programmer who might take over when you're done!
Thinking of it in terms of layers is a little limiting. It's what you see in whitepapers about a product, but it's not how products really work. They have "boxes" that depend on each other in various ways, and you can make it look like they fit into layers but you can do this in several different configurations, depending on what information you're leaving out of the diagram.
And in a really well-designed application, the boxes get very small. They are down to the level of individual interfaces and classes.
This is important because whenever you change a line of code, you need to have some understanding of the impact your change will have, which means you have to understand exactly what the code currently does, what its responsibilities are, which means it has to be a small chunk that has a single responsibility, implementing an interface that doesn't cause clients to be dependent on things they don't need (the S and the I of SOLID).
You may find that your application can look like it has two or three simple layers, if you narrow your eyes, but it may not. That isn't really a problem. Of course, a disastrously badly designed application can look like it has layers or tiers if you squint as hard as you can. So those "high level" diagrams of an "architecture" can hide a multitude of sins.
My generic rule of thumb is to at least separate the problem into a model and a view layer, and throw in a controller if there is a possibility of more than one way of handling the model or piping data to the view.
(Or as the first answer, at least the presentation tier and the application tier).
Loose coupling is all about minimising dependencies, so I would say 'layer' when a dependency is introduced. i.e. a database, third party application, etc.
Although 'layer' is probably the wrong term these days. Most of the time I use Dependency Injection (DI) through an Inversion of Control container such as Castle Windsor. This means that I can code on one part of my system without worrying about the rest. It has the side effect of ensuring loose coupling.
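To show the principle without any container, here is a hand-wired sketch of constructor injection. It does not use Castle Windsor or any IoC container; SignupService, SmtpMailer, and RecordingMailer are names invented for the example.

```python
class SmtpMailer:
    """Hypothetical production dependency; prints instead of sending mail."""

    def send(self, to, body):
        print(f"sending to {to}: {body}")


class RecordingMailer:
    """Test stand-in with the same send() method."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class SignupService:
    def __init__(self, mailer):
        # The dependency is handed in; SignupService never constructs it.
        self._mailer = mailer

    def register(self, email):
        self._mailer.send(email, "Welcome!")
        return email.lower()


# Wiring by hand; a container would do this step from configuration.
service = SignupService(RecordingMailer())
```

Because SignupService only relies on something with a send() method, the mailer can be swapped or moved behind a layer boundary later without touching the service.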
I would recommend DI as a general programming principle all of the time so that you have the choice on how to 'layer' your application later.
Give it a look.
R

The best way to familiarize yourself with an inherited codebase

Stacker Nobody asked about the most shocking thing new programmers find as they enter the field.
Very high on the list is the impact of inheriting a codebase with which one must rapidly become acquainted. It can be quite a shock to suddenly find yourself charged with maintaining N lines of code that have been cobbled together for who knows how long, and to have a short time in which to start contributing to it.
How do you efficiently absorb all this new data? What eases this transition? Is the only real solution to have already contributed to enough open-source projects that the shock wears off?
This also applies to veteran programmers. What techniques do you use to ease the transition into a new codebase?
I added the Community-Building tag to this because I'd also like to hear some war-stories about these transitions. Feel free to share how you handled a particularly stressful learning curve.
Pencil & Notebook (don't get distracted trying to create an unrequested solution)
Make notes as you go and take an hour every Monday to read thru and arrange the notes from previous weeks
With large codebases first impressions can be deceiving and issues tend to rearrange themselves rapidly while you are familiarizing yourself.
Remember the issues from your last work environment aren't necessarily valid or germane in your new environment. Beware of preconceived notions.
The notes/observations you make will help you learn quickly what questions to ask and of whom.
Hopefully you've been gathering the names of all the official (and unofficial) stakeholders.
One of the best ways to familiarize yourself with inherited code is to get your hands dirty. Start with fixing a few simple bugs and work your way into more complex ones. That will warm you up to the code better than trying to systematically review the code.
If there's a requirements or functional specification document (which is hopefully up-to-date), you must read it.
If there's a high-level or detailed design document (which is hopefully up-to-date), you probably should read it.
Another good way is to arrange a "transfer of information" session with the people who are familiar with the code, where they provide a presentation of the high level design and also do a walk-through of important/tricky parts of the code.
Write unit tests. You'll find the warts quicker, and you'll be more confident when the time comes to change the code.
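One way to do this on code you don't understand yet is to write tests that simply pin down what it does today. In this sketch legacy_discount is a made-up stand-in for an inherited function, and its rules are invented for the example.

```python
import unittest


def legacy_discount(total, customer_type):
    """Stand-in for an inherited function whose rules nobody documented."""
    if customer_type == "gold":
        return total * 0.8
    if total > 100:
        return total - 10
    return total


class LegacyDiscountCharacterization(unittest.TestCase):
    """Pin down what the code does today, not what it should do."""

    def test_gold_customers_get_twenty_percent_off(self):
        self.assertAlmostEqual(legacy_discount(200, "gold"), 160.0)

    def test_large_orders_get_flat_ten_off(self):
        self.assertEqual(legacy_discount(150, "regular"), 140)

    def test_threshold_is_exclusive(self):
        # Surprising, but it is the current behaviour, so the test records it.
        self.assertEqual(legacy_discount(100, "regular"), 100)
```

Run it with python -m unittest. The last test is the kind of wart you find this way: whether an order of exactly 100 should get the discount is a question for someone who knows the business, but at least now it is written down.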
Try to understand the business logic behind the code. Once you know why the code was written in the first place and what it is supposed to do, you can start reading through it or, as someone said, probably fixing a few bugs here and there.
My steps would be:
1.) Set up a Source Insight (or any good source code browser you use) workspace/project with all the source and header files in the code base. Browse at a high level from the topmost function (main) to the lowermost function. During this code browsing, keep making notes on paper or in a Word document, tracing the flow of the function calls. Do not get into the nitty-gritty of function implementations in this step; keep that for later iterations. In this step, keep track of what arguments are passed to functions, what the return values are, how the arguments passed to functions are initialized, how their values are set and modified, and how the return values are used.
2.) After one iteration of step 1.), by which point you have some understanding of the code and the data structures used in the code base, set up an MSVC project (or a project for whatever compiler is relevant to the programming language of the code base), compile the code, execute it with a valid test case, and single-step through the code again from main down to the last level of function. In between the function calls, keep noting the values of variables passed and returned, the code paths taken, the code paths avoided, etc.
3.) Keep repeating 1.) and 2.) iteratively until you are comfortable enough that you can change some code, add some code, find a bug in the existing code, or fix the bug!
-AD
I don't know about this being "the best way", but something I did at a recent job was to write a code spider/parser (in Ruby) that went through and built a call tree (and a reverse call tree) which I could later query. This was slightly non-trivial because we had PHP which called Perl which called SQL functions/procedures. Any other code-crawling tools would help in a similar fashion (i.e. javadoc, rdoc, perldoc, Doxygen etc.).
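As a much smaller illustration of the same idea (the original was written in Ruby and crossed PHP, Perl, and SQL), here is a sketch that builds a call tree and a reverse call tree for Python source using the standard ast module. The build_call_trees function and the SAMPLE code are made up for this example.

```python
import ast
from collections import defaultdict


def build_call_trees(source: str):
    """Return (calls, called_by) maps for the functions defined in `source`.

    Only direct calls by plain name (e.g. `helper()`) are recorded; method
    calls and dynamic dispatch are ignored in this sketch.
    """
    calls = defaultdict(set)      # caller -> callees
    called_by = defaultdict(set)  # callee -> callers
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls[node.name].add(inner.func.id)
                    called_by[inner.func.id].add(node.name)
    return calls, called_by


SAMPLE = '''
def load(): return parse(read())
def read(): return "1,2"
def parse(text): return text.split(",")
'''
calls, called_by = build_call_trees(SAMPLE)
```

Afterwards calls["load"] answers "what does this function depend on?" and called_by["parse"] answers "who will I break if I change this?", which are the two queries that matter most when the code is new to you.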
Reading any unit tests or specs can be quite enlightening.
Documenting things helps (either for yourself, or for other teammates, current and future). Read any existing documentation.
Of course, don't underestimate the power of simply asking a fellow teammate (or your boss!) questions. Early on, I asked as often as necessary "do we have a function/script/foo that does X?"
Go over the core libraries and read the function declarations. If it's C/C++, this means only the headers. Document whatever you don't understand.
The last time I did this, one of the comments I inserted was "This class is never used".
Do try to understand the code by fixing bugs in it. Do correct or maintain documentation. Don't modify comments in the code itself; that risks introducing new bugs.
In our line of work, generally speaking, we make no changes to production code without good reason. This includes cosmetic changes; even these can introduce bugs.
No matter how disgusting a section of code seems, don't be tempted to rewrite it unless you have a bugfix or other change to do. If you spot a bug (or possible bug) when reading the code trying to learn it, record the bug for later triage, but don't attempt to fix it.
Another Procedure...
After reading Andy Hunt's "Pragmatic Thinking and Learning - Refactor Your Wetware" (which doesn't address this directly), I picked up a few tips that may be worth mentioning:
Observe Behavior:
If there's a UI, all the better. Use the app and get a mental map of relationships (e.g. links, modals, etc). Look at HTTP requests if it helps, but don't put too much emphasis on them -- you just want a light, friendly acquaintance with the app.
Acknowledge the Folder Structure:
Once again, this is light. Just see what belongs where, and hope that the structure is semantic enough -- you can always get some top-level information from here.
Analyze Call-Stacks, Top-Down:
Go through and list on paper or some other medium, but try not to type it -- this gets different parts of your brain engaged (build it out of Legos if you have to) -- function-calls, Objects, and variables that are closest to top-level first. Look at constants and modules, make sure you don't dive into fine-grained features if you can help it.
MindMap It!:
Maybe the most important step. Create a very rough draft mapping of your current understanding of the code. Make sure you run through the mindmap quickly. This allows an even spread of different parts of your brain (mostly R-Mode) to have a say in the map.
Create clouds, boxes, etc., wherever you initially think they should go on the paper. Feel free to denote boxes with syntactic symbols (e.g. 'F'-Function, 'f'-closure, 'C'-Constant, 'V'-Global Var, 'v'-low-level var, etc). Use arrows: incoming arrows for arguments, outgoing for returns, or whatever comes more naturally to you.
Start drawing connections to denote relationships. It's OK if it looks messy - this is a first draft.
Make a quick rough revision. If it's too hard to read, do another quick organization of it, but don't do more than one revision.
Open the Debugger:
Validate or invalidate any notions you had after the mapping. Track variables, arguments, returns, etc.
Track HTTP requests etc to get an idea of where the data is coming from. Look at the headers themselves but don't dive into the details of the request body.
MindMap Again!:
Now you should have a decent idea of most of the top-level functionality.
Create a new MindMap that has anything you missed in the first one. You can take more time with this one and even add some relatively small details -- but don't be afraid of what previous notions they may conflict with.
Compare this map with your last one and eliminate any question you had before, jot down new questions, and jot down conflicting perspectives.
Revise this map if it's too hazy. Revise as much as you want, but keep revisions to a minimum.
Pretend It's Not Code:
If you can put it into mechanical terms, do so. The most important part of this is to come up with a metaphor for the app's behavior and/or smaller parts of the code. Think of ridiculous things, seriously. If it were an animal, a monster, a star, or a robot, what kind would it be? If it were in Star Trek, what would they use it for? Think of many things to weigh it against.
Synthesis over Analysis:
Now you want to see not 'what' but 'how'. Any low-level parts that threw you for a loop could be taken out and put into a sterile environment (where you control the inputs). What sort of outputs are you getting? Is the system more complex than you originally thought? Simpler? Does it need improvements?
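A rough idea of what such a sterile environment can look like, sketched in Python. Here `mystery_normalize` is a made-up stand-in for whatever confusing routine you lifted out of the code base, and `probe` is a hypothetical helper that feeds it inputs you control and records what comes back (or which exception it raises).

```python
def mystery_normalize(value):
    """Hypothetical stand-in for a confusing low-level routine."""
    return "_".join(value.strip().lower().split())


def probe(func, inputs):
    """Call `func` on each controlled input; record the result or the exception type."""
    observations = []
    for item in inputs:
        try:
            outcome = func(item)
        except Exception as exc:  # record failures instead of stopping
            outcome = type(exc).__name__
        observations.append((item, outcome))
    return observations


# Inputs chosen by you: a normal case plus a few awkward edge cases.
observed = probe(mystery_normalize, ["  Hello World ", "a\tb", "", None])
for given, got in observed:
    print(f"{given!r:>18} -> {got!r}")
```

Once the observations stop surprising you, they can be pasted into a test more or less as they are, which pins down the current behavior before anyone touches it.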
Contribute Something, Dude!:
Write a test, fix a bug, comment it, abstract it. You should have enough ability to start making minor contributions and FAILING IS OK :)! Note any changes you made in commits, chat, email. If you did something dastardly, you guys can catch it before it goes to production -- if something is wrong, it's a great way to get a teammate to clear things up for you. Usually listening to a teammate talk will clear up a lot of what made your MindMaps clash.
In a nutshell, the most important thing to do is use a top-down fashion of getting as many different parts of your brain engaged as possible. It may even help to close your laptop and face your seat out the window if possible. Studies have shown that enforcing a deadline creates a "Pressure Hangover" for ~2.5 days after the deadline, which is why deadlines are often best to have on a Friday. So, BE RELAXED, THERE'S NO TIMECRUNCH, AND NOW PROVIDE YOURSELF WITH AN ENVIRONMENT THAT'S SAFE TO FAIL IN. Most of this can be fairly rushed through until you get down to details. Make sure that you don't bypass understanding of high-level topics.
Hope this helps you as well :)
All really good answers here. Just wanted to add a few more things:
One can pair architectural understanding with flash cards, and re-visiting those can solidify understanding. I find questions such as "Which part of the code implements X functionality?" useful, where X could be any useful functionality in your code base.
I also like to open a buffer in emacs and start re-writing some parts of the code base that I want to familiarize myself with and add my own comments etc.
One thing vi and emacs users can do is use tags. Tags are contained in a file (usually called TAGS). You generate one or more tags files with a command (etags for emacs, ctags for vi). Then, when you edit source code and see a confusing function or variable, you load the tags file and it will take you to where the function is declared (not perfect, but good enough). I've actually written some macros that let you navigate source using Alt-cursor, sort of like popd and pushd in many flavors of UNIX.
BubbaT
The first thing I do before going down into code is to use the application (as several different users, if necessary) to understand all the functionalities and see how they connect (how information flows inside the application).
After that I examine the framework in which the application was built, so that I can make a direct relationship between all the interfaces I have just seen with some View or UI code.
Then I look at the database and any database command handling layer (if applicable), to understand how that information (which users manipulate) is stored and how it goes to and comes from the application.
Finally, after learning where data comes from and how it is displayed I look at the business logic layer to see how data gets transformed.
I believe every application architecture can be divided like this, and knowing the overall function (a who's who of your application) might be beneficial before really debugging it or adding new stuff - that is, if you have enough time to do so.
And yes, it also helps a lot to talk with someone who developed the current version of the software. However, if he/she is going to leave the company soon, keep a note of his/her wish list (what they wanted to do for the project but were unable to because of budget constraints).
Create documentation for each thing you figured out from the codebase.
Find out how it works by experimentation - change a few lines here and there and see what happens.
Use Geany, as it speeds up searching for commonly used variables and functions in the program and adds them to autocomplete.
Find out if you can contact the original developers of the code base, through Facebook or by Googling for them.
Find out the original purpose of the code and see if the code still fits that purpose or should be rewritten from scratch to fulfil the intended purpose.
Find out what frameworks the code used and what editors the developers used to produce it.
The easiest way to deduce how the code works is to work out how you would have written a certain part yourself, then check whether the code contains such a part.
It's reverse engineering - figuring out something by trying to re-engineer the solution.
Most programmers have coding experience, and there are certain patterns you can look for in the code.
There are two types of code: object oriented and structurally oriented.
If you know how to do both, you're good to go, but if you aren't familiar with one or the other, you'd have to relearn how to program in that fashion to understand why it was coded that way.
In object-oriented code, you can easily create diagrams documenting the behaviors and methods of each object class.
If it's structurally oriented, meaning organized by function, create a function list documenting what each function does and where it appears in the code.
I haven't done either of the above myself; as a web developer, I find it relatively easy to figure out how something works by starting from index.php and following it to the other pages.
Good luck.