Improving the way we write code? - language-agnostic

While thinking about software-engineering in general I came across the question why we don't see any improvements in the way we write/document code.
Think about it: There has not been a revolutionary improvement since we've moved from punch cards to text editing. The last improvement I've seen is syntax highlighting and context sensitive help (e.g. Intellisense or ctags). Not something I would call revolutionary.
That makes me wonder: Why is it so?
I'll start with something I miss badly:
Lots of my code deals with geometry.
For documentation describing geometric relationships always ends up in a big heap of hard to read mathematical stuff (due to the lack of proper equation type-setting in ASCII). However, if I could embed a little drawing or scribble into the code everything would be much easier, neater and better to be understood.
What can you think up that would make your coding/text editing/documention tasks easier?

I'm surprised that nobody has yet mentioned No Silver Bullet. In 1986 (!), Frederick Brooks predicted that:
There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity. [...] We cannot expect ever to see two-fold gains every two years."
And in 23 years, he's been proven right. We've come up with a number of things such as syntax highlighting and Intellisense which have improved productivity significantly, but certainly not by an order of magnitude. As time marches on, we'll continue to make several incremental improvements, but the fact is there is no silver bullet: there's not going to be some magical revelation in the way we write code that will improve productivity by an order of magnitude.

I'm suprised that no one seems to have mentioned Donald Knuth's seminal Literate Programming - write your code as if it were a book or a scientific paper.

There has not been a revolutionary improvement since we've moved from punch cards to text editing
Never used a line editor, have you?
But seriously, text (especially in the representations chosen for modern languages) is
easily processed
fairly easy to specify
information dense
precise
Anything that comes along to replace it has to be a net win across all four of those properties. Not easy.

I disagree. We do have changes, small, but changes.
How common is the "for each" construct? Compare it to 20 years ago. How about the Domain Specific Languages movement? What about the idea that we should code in layers? How about Behavior Driven Development? Coding by complying to a specification... which writes a nice document as output when all runs fine. How about the standarization of regular expressions? PCRE. What about Alan Kay's group's DSL related work on "Moore's Law for Software", which explored a more advanced implementation of Cairo and generated TCP/IP code using diagrams from RFCs?
Documentation is a two way dialog. Both as code being more understandable and people learning this special language. You wouldn't say that German needs documentation if you know German. I know natural languages are very far away from computer languages, but there's a movement to make code more expressive. It's not about the new tools, it's about how we are coding.

One thing I've done recently in some of the more math-heavy sections of my application is to include the LaTeX markup for the particular equation as a comment/docstring. Right now, I just copy-paste into an online equation editor, but it would be very helpful to see the formula itself (with things like Greek letters and sub/superscripts) rather than a bunch of ASCII code.

Source Code In Database. In a nutshell, source code is parsed and put into a database. You'd then need an integrated IDE to view and edit the code, but at this point, syntax is decoupled from format. YOUR IDE could show you a program in a way that's completely different from someone else's, tuned to the task you're working on. I'd list some specific examples, but that article covers pretty much everything.

I'm surprised nobody mentioned it - javadoc is basically HTML, so there's nothing preventing you from embedding images (or anything else) in code. Simple, effective and ubiquitous, it's one of the things Java did right.

DrScheme let's you do these things. Here's the things you can insert from the PLT website:
http://docs.plt-scheme.org/drscheme/Menus.html#(part._.Insert)
3.1.6 Insert
Insert Comment Box : Inserts a box that is ignored by DrScheme; use it to write comments for people who read your program.
Insert Image... : Opens a find-file dialog for selecting an image file in GIF, BMP, XBM, XPM, PNG, or JPG format. The image is treated as a value.
Insert Fraction... : Opens a dialog for a mixed-notation fraction, and inserts the given fraction into the current editor.
Insert Large Letters... : Opens a dialog for a line of text, and inserts a large version of the text (using semicolons and spaces).
Insert λ : Inserts the symbol λ (as a Unicode character) into the program. The λ symbol is normally bound the same as lambda.
Insert Java Comment Box : Inserts a box that is ignored by DrScheme. Unlike the Insert Comment Box menu item, this is designed for the ProfessorJ language levels. See ProfessorJ.
Insert Java Interactions Box : Inserts a box that will allows Java expressions and statements within Scheme programs. The result of the box is a Scheme value corresponding to the result(s) of the Java expressions. At this time, Scheme values cannot enter the box. The box will accept one Java statement or expression per line.
Insert XML Box : Inserts an XML; see XML Boxes and Scheme Boxes for more information.
Insert Scheme Box : Inserts a box to contain Scheme code, typically used inside an XML box; see XML Boxes and Scheme Boxes.
Insert Scheme Splice Box : Inserts a box to contain Scheme code, typically used inside an XML box; see also XML Boxes and Scheme Boxes.
Insert Pict Box : Creates a box for generating a Slideshow picture. Inside the pict box, insert and arrange Scheme boxes that produce picture values.
You also insert your unit tests with the code that you're testing. Pretty neat stuff.

I think integrated IDEs with semantic highlighting and **semantically-constrained suggestions* (a la IDEA or Eclipse) are a huge advancement.
But that happened 8-10 years ago.
Template-based programming feels useful never seems to catch on. Recently I was impressed with a demo of the Meta-programming system, which leverages the interactive nature of the IDE to simplify the task of writing templates and what are (essentially) type-aware macros.
Meta-programming might help you define geometry-based macros that would substitute for a number of lines of code. I could imagine something that let you embed a more-readable 'math language' inside Java, and then parses its contents into something machine-readable.

I'd say version control was a pretty huge leap in how we work. The ability to keep a full record of every change anyone has made to the codebase, and to revert changes where necessary, has made a big difference.

I certainly respect Fred Brooks' argument No Silver Bullet, but I think the way we write code is nowhere near optimal, so there is lots more room for improvement. I tried to explain this in my book.
We're all familiar with "code golf", where you compete relentlessly to minimize something. That is a good way to approach the minimum possible value of that something.
What's great about this is that you are allowed, even encouraged, to break from traditions, prior conceptions, accepted wisdom, in the quest for winning. In short, you learn new things.
If the measure to be minimized is wall-clock execution time, you can do aggressive optimization.
If the measure is source code size (lines or characters) you get "code golf".
The measure I like best is "edit count". That is, given a code base, suppose a new requirement comes along. That requirement is implemented, completely, by editing the code base. Then a "diff" is done from old to new code base. The number of differences found is the edit count. Averaged over the set of likely new functional requirements, that is the measure to minimize.
If this is done aggressively, being free to contradict all conventional wisdom, the code base approaches a state I would call a domain-specific language (DSL). In this language, concepts expressed in code are in nearly 1-1 correspondence with problem-oriented concepts. In this state, it is not easy for the source code to be self-inconsistent (i.e. have bugs) because the fewer edits that have to be made to the source code, the fewer chances there are to make a mistake. It's also the case that such code tends to be short. But unlike "code golf" it tends to be very clear, because it maps the problem concepts so clearly.
So, tools and techniques that help in minimizing edit count can, in my opinion, be considered "silver bullets". DSL is one such. Code generation is another. My favorite optimization technique is another. For coding dynamically changing UIs there is differential execution. There are bound to be more, waiting to be discovered. Of course, everything depends on the training and experience of the "marksman" (the coder).
I think there are lots of new ideas to be discovered. The trick is to tell the difference between the ones that move us forward, versus the ones that hold us back.

I think this is where Doxygen and other documentation systems help. If we can embed small, discrete comments that link to other information such as:
/* help: fooimg.png */
And then have an external documentation system do that, then great.
Even better would be allowing our text-editor to treat those things as hyperlinks to external documentation.

I would reference a drawing as a reference in the code documentation. I see no reason why you can't have footnotes in code.

The ability to make a section of code read-only is something I've wanted

It sounds like you might be interested in Jonathan Edward's research. See, for example:
"The Summer of Code"
"What's next?"
"The future of programming"

Diffing and searching pictures is hard. Diff and search are very important to programmers. Using pictures instead of text is only a marginal improvement in many situations, it has some drawbacks, and it requires general acceptance before it's really worth doing (since you don't make things more understandable if your reader doesn't grok what you've done).
Plus, programmers have a million little tricks that make their lives easier, based on text representations of code, that they'd lose if you gave them code to read that was expressed in anything other than text. Sure, they might replace or re-implement those tricks over time, but in the short term they're gone.
You don't see lawyers switching from English to little back-of-a-napkin diagrams in contracts, either (the Creative Commons licenses try, but cannot make the picture be the formal representation of the contract). Probably for similar reasons.
If someone comes up with a programming language and IDE that, on balance, beats text-based ones; and successfully markets it; then you'll see the start of a revolutionary shift from text to a new format. If nobody comes up with any such thing, then we're not missing out. If someone comes up with something that is more productive but it doesn't gain traction because of independent advantages of other technologies, then that loss is the price we pay for free-market capitalism. Perhaps the ideas will be recycled eventually...
That said, integration between code and documentation could clearly be improved, and there are many efforts underway to do so, using various techniques with varying success. Again, the problem is that any particular cunning plan can in practice only really be implemented in one or a few languages and development environments at a time, and so has difficulty proving that it really is better. Embedding documentation in code is possibly the only universal advance since the invention of the API...
I think there's still a lot that can be done with text, though. For example, debugger technology makes a big difference to programmer productivity in certain common circumstances (namely: when a test fails or something else unexpected happens, but it's not obvious what the faulty assumption is in the code you're looking at). There may be lower-hanging fruit in terms of making programming better, than the actual business of expressing the program.

The last improvement I've seen is
syntax highlighting and context
sensitive help
Then you haven't looked much. Modern IDEs can do far, FAR more than that, namely show you the semantic structure of code (e.g. inheritance hierarchies) and even manipulate it (automatic refactoring) or enrich it with external data (such as who last changed a particular line of code).

I've used emacs, I like text macros. But, what I really want is parse macros. I'd like my editor to expose the machinery behind refactoring in such a way that I can write my transformations on the parse tree of the language itself.
For example, Python added += at one point when my code was littered with x = x + 1 lines. If I could have written a search and replace command that worked on the parse tree, I could have quickly cleaned up large amounts of my source code.
So, I want standard search and replace, but I want it at the level where the structure of my code has meaning, at the abstract syntax tree.
If you've ever used ReSharper, each of its refactorings and recommendations are written in the manner I describe, they find a pattern in the parse tree and suggest a replacement, or for a refactoring, apply a known replacement. I want access to that machinery for my own tasks!

Have you used Doxygen or similar for documenting your code? You can add links to images, and other file types (often stored in same directory as source code) that will get sucked into the generated documentation. I realize that this is one step removed from seeing the detail directly in you favorite editor but it definitely improves how we document our code.

Programming languages are a specialized form of mathematical notation, since you can express a programming language mathematically. Notation changes slowly, and so we don't get fast progress in our languages. Mostly, we advance when we come up with a new thing to fit into the notation, like using i to refer to the square root of negative one.
There are documentation schemes that allow you to embed things other than text. There was at least one programming scheme, Donald Knuth's Web, that allowed you to have a presentation and an execution version of a program (unfortunately, the base source code, the stuff you'd actually hack, was rather messy).
You could easily have a text editor that could treat comments as HTML, provided of course it could recognize comments as it saw them.

I've been thinking a lot about how to make coding faster and more efficient for the past years, always trying to keep it realistic and doing minimalistic implementations. Those are not revolutionary ideas, but since the original poster talked about the punching card to code typing transition, I thought of talking about other ways of communicating to the computer what we want to program.
My ideas are visual or vocal programming. The motivation behind is that there are only a number of ways a loop can be efficiently programmed, and an aware IDE could make some smart code substitution decisions depending on inputs other than typed lines of code.
Visual programming vs Coding: encapsulate (literally) code into "boxes" which have inputs and outputs, and connect them together across a horizontal timeline. This is a high-level concept that would be intrinsically interesting for multithreading development since you can have multiple lines or threads happening at the same time. Every process can be divided into a "box", no matter how you see it. Sending an e-mail in its most basic form is a box which takes an email as input and outputs a success/fail signal. Since the boxes and the lines are distributed across a timeline, the notion of time and event chronology isn't lost and feedback lines are possible.
Vocal programming vs Coding: The effectiveness of this technique would revolve around the effectiveness of the vocal syntax decided to create code and move the cursor. For example, you can say to the microphone "for variable zero to 10" and the system will automatically generate the following code placing the cursor inside:
for (x=0;x<10;x++){
// Cursor would be there after after the call
}
In terms of usability, you would need to be in a relatively silent room to minimize other sounds that might harm the voice recognition so this technology could be used in specialized environments mostly.
This is the result of my extensive programming experience using a wide range of hardware and programming languages. Let me know what you guys think, I'd love having a constructive discussion about that.

A few weeks back the "Intentional Software" created quite a buzz about their new language. I've yet to watch the presentation, but here is a quote from a review by Martin Fowler:
They started worryingly, with the
usual unrevealing Powerpoints, but
then they switched to showing the
workbench and the curtain finally
opened. To gauge the reaction, take a
look at Twitter.
#pandemonial Quite impressed! This is sweet! Multiple domains, multiple
langs, no question is going unanswered
#csells OK, watching a live electrical circuit rendered and
working in a C# file is pretty damn
cool.
#jolson Two words to say about the Electronics demo for Intentional
Software: HOLY CRAPOLA. That's it, my
brain has finally exploded.
#gblock This is not about snazzy demos, this is about completely
changing the world we know it.
#twleung ok, the intellisense for the actuarial formulas is just awesome
#lobrien This is like seeing a 100-mpg carburetor : OMG someone is
going to buy this and put it in a
vault!

Two quotes come instantly to mind:
"If it ain't broke, don't fix it."
"Use the best tool for the job."
Of course, although the core code is still written as text, alll the tools and libraries have changed massively since the days of punched cards.

This has been touched on by others, and it wouldn't revolutionise programming, but anyway...
I think it would be nice if code editors moved slightly beyond plain text editors. Even with syntax highlighting and code completion (which I think are incredibly good things), the editors of today (at least, the ones I use) still display exactly the same ASCII text (or whatever encoding is used) that is in the source files. I would be interested to see how well it would work if editors displayed, for example (some examples are more adventurous than others):
Comments in a text box with a light-blue background and no // or /* ... */ visible
Javadoc comments could have semi-rich text editing support (for those who do HTML Javadoc comments) (seriously, I would appreciate it if code editors rendered Javadoc comments as HTML because they're not the easiest to skim over when their HTML as plain text)
Functions in text boxes that could be collapsed to show only the signature (the collapsing can be done by current editors) and can be dragged around as boxes
Lines between function boxes to indicate how functions are connected
Zooming out so that rather than seeing a single source file (class in many languages) you can see multiple files and the way connect to each other (this would essentially be building UML-like diagramming directly into the code editor)
I think this (in my mind at least) would work without requiring additional markup in the source files so users of plain text editors wouldn't be disadvantaged by having all this extra markup cluttering the files.

Part of the problem might stem from the fact that when you don't code we don't call it programming: Assembling modular components using a GUI for instance.

You might be interested in these alternative programming "languages".
[Ladder][1], designed to mimic the way relay-logic-schemes work. Horrible IMO, but easy to understand for the old guys who did logic with sticks and stones. [http://www.amci.com/tutorials/images/ladder-diagram.gif][2]
[SFC, Sequential function chart][3], designed to simplify parallell programming. Code is written into boxes and these boxes can be placed paralell to each other and will thus execute simultaneously. By connecting the end of several boxes you can syncronize events. Very common for automation applications.
[Mathematica][5]!!!, Might not be the best programming language but the syntax highlighting(if you can call it that) is awesome! For example you can input a matrix by seeing the matrix nicly aligned instead of a huge double[][]. Graphs can be inserted in the code and the formatting of mathematical expressions looks like it does when you write on a paper. No more paranthesis-madness or long Math.PI expressions that really only need one character. And best of all, the files are just plain text even if it is rendered nicely in the editor!
Debuggers is also an area where lots of improvement has been done. Debuggers with replay are starting to come and also visual debuggers where data can be modified in real time. Edit and continiue is also a feature i wouldn't want to live withot.
WTF "new users can only post a maximum of one hyperlink", you will have to google the stuff i originally added to this post >:(

A brain-to-computer translator. Typing is the real
bottleneck. It really just needs to derive the algorithms I
think up and convert that to machine code.
I would say a lot of the newer languages are pretty great at
quickly creating algorithms. The improvements aren't so much
revolutionary now as they are evolutionary.

Dare I say it might actually be a new development language (perhaps even a new paradigm) to take us through such revolution;

I think you might want to take a look at Leo. This is one guys attempt at answering what you're asking about. I still can't wrap my VIM head around it personally, but others take to it quickly. It's not just a programming IDE, but more of an information organizer. It's written in Python, but I don't see why you can't code in other languages with it. The power of Leo is not so much the language, but the ability to express your thoughts and organize them whether it be in code, diagrams, images, or diagrams. Look over the tutorial and examples to get a feel for it. You might like it.

Automated semantic source code transformations, where a program can be reliably examined and manipulated by using an abstract interface/frontend to it that is aware of the underlying semantics.
So that source code can be queried and dealt with pretty much like a SQL database.
Allowing you to do static analysis of source code and refactor even complex source code by doing something along the lines of:
FIND CALLERS OF FUNCTION "foo" WHERE SIGNATURE("int","int","char*") AND RETURN_TYPE("bool");
...
RENAME MACRO "max" TO "maximum" IN FILE "macros.hxx";
RENAME NAMESPACE "prj" TO "project";
RENAME SYMBOL "OLDFOO" IN NAMESPACE "project";
RENAME FUNCTION "log" TO "show_log";
RENAME CLASS "FOO" TO "OLDFOO";
RENAME METHOD "FOO::inc" TO "FOO::increment";
...
CHANGE SIGNATURE IN FUNCTION "foo" WHERE SIGNATURE("int","int") TO SIGNATURE("double","double");
CHANGE SIGNATURE IN METHOD "myClass::handle" WHERE SIGNATURE("char") TO SIGNATURE("unsigned char")
MOVE FUNCTION "foo" in FILE "stuff.cc" TO "foo_funcs.cc";

Related

How long should it take for someone to be able to type code from memory? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I understand that this question could be answered with a simple sentence and that it may be viewed as subjective, however, I am a young student who is interested in pursuing a career in programming and wondered how long it took some of you to get to the level of experience you are now?.
I ask this because I am currently working on building an application in Java on the Android platform and it bothers me that I am constantly having to look up how to write a certain section of code in my application such as writing to a database, or how the if statement should be structured.
My question really is, how long did it take for you to become experienced enough to actually know exactly how your next line of code was going to look, before you even wrote it?
The speed at which you can quickly recall language syntax, common library functions, and best practice patterns is directly proportional to the amount of time you spend using them.
In other words you will find yourself getting faster the more you do it.
I have been a C++ programmer for the last 20 years. It has taken me that long to get to this expertise level. I'm mostly a Windows programmer, and I keep the msdn website up on one of my monitors all the time.
Doesn't matter how long you've been doing it. You will never know everything from memory. Don't sweat it.
I've been programing for almost half of my life and I sill can't always recall simple syntax, let alone entire tracks of code for more complex tasks. If you ask me that's what reference books and Google are for.
A far more important skill to have is the general knowledge of programming in any language, i.e. recursion, looping, object oriented design, working with APIs, error handling etc... Once you have all that down, you can apply it to any language and platform.
I can tell you that after 25 years there are lines of code that I don't know how they're exactly going to look like.
Want an example? I'm programming in Java since last century and I can honestly still make a mistake if I were to type hashcode() or hashCode().
Why? Because actually typing such a method name yourself is so last century. Your intention is to override Object's hashCode() method, so you use programming by intention.
You hit Ctrl-O then h and you get a list of the methods starting with an 'h' that you can override. Then you hit enter. As a bonus, the "#Override" gets inserted for you too.
4 keys. 4, to get this:
#Override
public int hashCode() {
}
And honestly, whether hashCode takes an uppercase 'c' or not... I couldn't care less. This is not what an hashcode is about and my intention is not to know all the inconsistencies languages and API designers came up with. My intention is to override the method that gives back an object's hashcode and my (modern) IDE allows me to get that skeleton in four keypresses, including hitting enter.
Another example: there are people who do really type this countless times a day:
for (int i = 0; i < ; i++) {
}
or the more tricky:
for (int i = ; i >= 0; i--) {
}
Note in that latter case I can still mess up and type "i++" instead of "i--" (a 'thinko' as its called).
But I don't care at all, because I type "fi<tab>" (three keys) and I get the first one or "fir<tab>" (four keys, "for i (in) reverse") and I get the second one. You ain't beating that (especially seen that I'm a touch typist so I type these three or four keys fast). In addition to speed, as an added bonus the autocompletion won't mess you "i--".
In many case I don't know exactly the line I'll get: sure, I know it "more or less" and that's exactly the way it should be.
Don't sweat it too much. As others have mentioned it eventually gets easier to write code without looking things up so much as you work with a particular language over time.
HOWEVER
There are a few reasons that even veteran programmers find themselves constantly using reference material:
(1) Unlike days of yore, most projects now require you to use a number of languages to get the job done. For example single web site-project a web-site may require C#, XML, JavaScript, SQL, HTML, XHTML, RegEx and CSS all in the same project. Switching between some of these languages can really throw you for a loop sometimes because many of them are just similar enough to be familiar, but just different enough to make you forget the subtle differences in syntax.
(2) Just when you start getting comfortable that you know a language inside and out, the vendor will release a new update that changes everything you knew about it. For example ASP->ASP.NET.
I still look up simple things fairly frequently and I've been at this almost 20 years. The important thing is that you understand the underlying concepts and principles.
It took me 12 years to get where I am at today, which is my experience in professional programming. You will always improve when working with some programming language, even if you have been working with it for the past 5 years.
About your question, it depends. I think that you should be comfortable with the syntax after a week, comfortable with the main libraries after a month, and comfortable with the platform after 6 months.
But when you get there, don't stop!
If you code every day with the same language, you'll probably have the common language elements and patterns memorized in a month or two. But there are plenty of things that you'll probably never memorize, simply because you don't use them often enough, and also because modern IDE's can help you so much that you don't really need to remember everything if you utilize all their features (like code snippets, shortcuts, intellisense).
I've been coding for 15 years, and doing C# for the last three, and I still use the MSDN reference material every day. However, as far as the basic building blocks of the language are concerned, I had them memorized in the first month or two.
Also, the more often you code, the better you'll commit it to memory.
There's a false assumption going on here I think...
At my job I end up working all over the stack in different languages and platforms. If I'm away from a project for 6 months I end up having to look at code to get even basic things done. The advantage of experience is reducing the re-ramp up time on productivity though.
So, instead of it taking weeks or months to get back to a point where I can write 80% of the code from memory it takes a few or several days (if that sometimes). I've been programming for about 5 years now. I'm just now getting to the point of being able to visualize small applications in their entirety.
As long as you're working on solving a problem that you haven't already solved (numerous times) you'll probably always have to look up code.
If it can be done I'm guessing it takes longer than 5 years for most people, unless they work in one language with one editor and in one area. (ex: C#, Visual Studio, file system operations) My company isn't big enough to employee someone that specific.
Don't be downhearted by having to consult documentation all the time man, that's what it's there for. Over time you get used to syntax and things like that but don't sweat it if you can't remember library methods or ways to connect to a DB.
Over time (with experience) you might remember these off the top of your head but in reality there's nothing wrong with taking a quick consultation of the documentation to refresh the memory.
Also remember that technology is ever changing so it's good to keep the mind fresh with new ways of thinking/ways to do things.
Question is not 100% clear. One of the best programmers I know doesn't remember anything and needs to look up printf formatting strings almost daily. On the other hand if you are having hard time figuring out how to write that for (int i = 0; i < len; i++) loop after doing this for 6 months -- that doesn't sound right.
the idea that one could every bit of code from even a single language and then type plainly from memory is pretty far out. the amount of pre-defined functions for, say PHP or Java alone is immense.
but that being said, its important to learn the programming structures, and know them the best you can. structures like foreach, if then else, switch, etc. are really the things that need to be integrated thoroughly. also, conceptual things like Object Orientation (not just "using" objects like mysqli, but understanding things like controlling code, client code, bottom-up and top-down architecture) are the real things that make good programmers great. for myself, i know that i have not the capacity to learn every defined function thats provided by language writers so i instead learn whether or not it can be done(and of course still try on occasion to do things that cant be done, lol). if you know that, then its a matter of google and books to find the "specific" mechanisms on how.
cheers my friend.
Not an answer per se, but I just wanted to say that as a novice myself, this q&a is one of the most useful things I've read on SO. It seems that yes, experts can probably code the basic stuff from memory, but even they revert to book for complex problems, and for beginners, that's what the book is for.
I feel like I'm just beginning
You should use code snippet if you are using certain piece of code repeatedly. I doesn't bother me if I cannot remember some piece of code from memory.
For me, it depends heavily on what I'm writing. For example, I doubt that most people ever quite memorize all the parameters to some Windows functions. I may know that I need to call CreateFile on the next line, but don't know all the parameters in order until they're typed in (with help from Intellisense, and sometimes MSDN).
With something that's doing simple computation, I'm a lot more likely to be limited by my typing speed (but I'm a fairly poor typist, so thinking faster than I can type doesn't take much).
That means it's really a question of how much of the time you need to refer to something to type the next line of code -- at first, it's a pretty small percentage, and over time it grows. I doubt it ever gets to 100% for anybody though. I, for one, don't think I'd want it to -- that would indicate I wasn't working on new and different things...
If you use eclipse and java, you may find yourself already there.
Other combinations may be a little slower to a lot slower.
Java has the advantage that it's pretty easy for the environment to build an entire parse tree while you are typing. At any place in your code, typing ctrl-space can give you an entire list of valid options.
Also syntax errors are always underlined.
If you want to go hard core though, I started in C before the day of decent editors and it took me about a year before I typed in more than a few lines and didn't get a compile error.
I don't know about memorization. Repetition is mother of all learning and that applies to all aspects of life. Look at the experienced accountant vs. the novice when filing taxes, who looks up stuff more. But what I did discover recently is that I navigate documentations much quicker and have a sense for going directly to where my question is answered. I got 6h sense - I can see the code! Seriously, it all comes with experience. Still, when learning something completely new, there's no shame in looking up how to do certain things. That's what separates humans right, learning from others. The more you work on something, the better you become.
There is absolutely nothing wrong with having to lookup the documentation every time you want to write some code. I got lucky in that once I use a certain function, I don't forget it very easily. However, most of the time, especially when I'm coding in a language that I haven't used in a while,
I start out by writing a flowchart of the algorithm that I want to code - just the pure logic. The most important reason for doing this is so that I don't lose my train of thought and forget the algorithm that solves my problem in the midst of technical problems like syntax and < what library functions exist? >
Then I look at the documentation and check to see if there are simple function calls that will help me accomplish each task in my algorithm
If such functions do not exist, then I either modify my algorithm to accommodate for what functions the language does provide or write helper functions do fill in the gaps.
I only start coding NOW, which is not too difficult to do anymore, because I already have all the relevant functions written down. So now it's just a question of translating pure logic into syntactically accurate code.
Proper syntax usually does not elude me, but if that does happen (VERY rare), Google provides very nice code snippets if you ask nicely.
Hope this helps
May the Force be with you
I'm programming Java now for about 5 years, and I never have had any trouble remembering syntax. I can <brag>write almost all java.util.*, java.io.*, java.lang.* and javax.swing.* stuff out of my memory</brag>, but does it help me? Not very much. It doesn't make me a better programmer than someone who can't!
I'm using Netbeans, which greatly helps working with libraries. Also, the documentation, just in the place where you need it. Sometimes, it's quite unnecessary, but sometimes you'd wish the "auto complete" screen would popup faster!
The best thing as a student is to concentrate on what you are doing, not how fast you are doing it. Looking up things isn't bad; as it'll help your so-called "unconsciousness mind" process what you are really trying to accomplish. Having such breaks, e.g. by looking up a certain documentation or syntax reference, may even let you be better at programming (no proof for this, though).
Question is quite subjective.
With the great many IDE's available and the "newbie" tutorials on getting started, it won't take you long before you're off writing your own apps.
That said, unless you have a "thirst" for how stuff works kinda attitude all the time towards everything, you won't go far. In this field, you really have to have a passion for what you do to be great.
... It bothers me that I am continually having to look things up... How long did it take you to get to the level where you are now?
For me, and for most of the students I teach, the answer depends on two variables:
How many lines of code have I written?
Do I use the language or library every day?
(Reading other people's code is very helpful for learning a language and learning how to think in a language, but for me at least, it hasn't helped me become a fluent writer of programs in a language. Only writing code does that.) So my first comment is that time should be measured in lines of code written, not hours or years.
(Ray Bradbury once advised aspiring writers of fiction to write a thousand words a day six days a week, and after they've written a million words they might start to know something about their craft. This is good advice for programmers too.)
As for my own experience, across a half dozen languages that I currently know well or once knew well, it's been pretty consistent for me that
After writing 100 lines I am continually looking things up in the manual and don't really know what I am doing.
After writing 1,000 lines I use the manual occasionally and am starting to learn how to think in the language.
After writing 10,000 lines I am about as good as I'm going to get without making special efforts.
After writing 25,000 lines I probably will not need the manual again.
It's also true that
To learn to write 100 lines I had to read 100 to 1,000 lines that someone else wrote.
For the first 1,000 lines I write it is good for me to read 2,000 lines someone else wrote. For the next 1,000 lines it is good for me to read 1,000 lines someone else wrote.
After I've written 5,000 lines I learn the most by reading code written by world experts or by people who designed the language and understand what is there. I no longer have much to learn by reading just any program.
On the other hand, my experience about when I stop having to refer continually to the documentation is much less consistent.
I find it especially hard when two languages are very similar; I will never stop needing the ksh manual to tell me what is different from sh or bash, and I will never stop needing the Haskell manual to tell me what is different from Standard ML (though the need grows less with each additional 1,000 lines of Haskell that I write). I also find it interesting that while I have written over 35,000 lines of Lua code, and I will never need the manual again for a language question, I have to look up libraries and API functions almost every time I write something longer than 500 lines. (I've written a lot of short Lua programs and a couple of long ones, and I don't use the language every day, although I definitely use it at least several days each month.)
As for the unspoken question, when are you personally going to get better?, take some advice from Watts Humphrey: measure your own performance and track it over time. I think if each day you count the number of times you had to stop and look things up, and graph that against number of lines of code written or edited (which you can get from source-code control), you will be pleasantly surprised by objective evidence of improvement. And I think once you have such evidence, you will be able to focus more on continuing to improve, and not so much on where you are now or where you hope to be in a year.
It's true that after some years of programming you'll be able to remember a lot of thing without having to check the "manual". For me this is not an important milestone in your programming life though, the really important moment is when you reach the point where you don't know how to do something... but you're sure that can be done and you know where and how to research about it :-)
You made a very important step participating on this site. Exchanging ideas and helping each other it's a excellent way of learning.
Sociologist Malcolm Gladwell believes that ten thousand hours is a good benchmark for the amount of practice required to become world class at many fields of endeavour. I think that sort of number applies to programming as well. This isn't quite what you asked, though; being able to code competently certainly requires familiarity with your environment (language constructs, system libraries, third party libraries and perhaps something of the concepts underpinning them), but there are many soft skills involved which are harder to describe and can only really be acquired through practice.
As others have said, being good at programming is not about typing code from memory; it's about recognising patterns, understanding systems, solving problems. It's about choosing the right tools for the job (languages, libraries, algorithms, whatever) and being able to make proper use of them.
In all the jobs I've had, it's about adaptibility and flexibility; you might have to learn a new language or pick up somebody else's poorly documented code tomorrow, and a good programmer will be able to take this in their stride.
I've been coding professionally for nearly 10 years now; there's all sorts of code I use semi-regularly which I look up the options for at least some of the time. There are too many commands with too many options in too many languages for me to remember each and every last one in detail and Google is quite good enough at getting the information I want.
That said - there are some bits of routine code which I use all the time but can count on one hand the number of times I've written - the exact syntax for populating a dataset in .Net for example. One of the skills I've most come to value over this time and which saves me the most time is spotting when some code can be quickly and easily moved into utility libraries. If it's fiddly but routine, consider this approach to save yourself hassle and improve your overall code quality.
In the context of this question ; java, c++, javascript the languages are still evolving. I can't say about other languages.
The language standard/specification changes over time
Libraries are added to supplement the language constructs e.g Boost, Google Collections, Apache Commons, jQuery
Applications will rarely be bound to a single aspect of a language
Across organizations/projects, coding standards change
A project I worked upon recommended against using primitives
When unfamiliar with the constructs used, I put in pseudo-code flagged with //TODO first .. then go in and find the actual API to use.
IMO, the answer to your question is - there is no definite answer.
As a Java programmer the sheer size of the runtime library makes it impossible to remember everything. Swing is big, there is an XSLT engine (which contains TWO languages), the Concurrent support evolves and grows.
The direct access to the Javadoc API from within Eclipse combined with code completion makes it possible to find the information you cannot remember but you know is there, quickly and efficiently.
I have found the javaalmanac.com (which has been reworked into the more convoluted http://www.exampledepot.com/egs/index.html) invaluable in presenting short, concise and above all correct programs doing just one small thing. Strongly recommended.
Most weeks I program in Java, C#, Python, PHP, JavaScript, SQL, Smarty, Django Templating and occasionally C++ and Objective C. I'm a student so this is partially school work and partially my part-time job. Instead of learning syntax I've learned what concepts to look for.
Seeing patterns and concepts is key, once you know the concepts and what to look for the syntax is secondary.
I find that even when I am just being exposed to a language I can accomplish a lot just by looking for 'what ought to be there'
Why should you be able to remember all of this stuff? Personally I embrace the fact that I can't remember all of this stuff and simply try and remember where to find the information that I need. I find that much more useful. This takes the form of blogging, taking notes, keeping large amounts of 'sample' code and reusable libraries and writing about code that I find useful and interesting; oh and lots of books, some of which I hardly ever 'need' and some of which I hardly ever really read but they've been skimmed and I know where they live.
Technologies come and go and there's just no way I could have kept all of the things that I may one day find useful in my head; so I page them out and just keep the index in memory... For example; 9 years ago I was doing some stuff with Java and CORBA and whatever. There's no way that I could drop back into that now without the notes that I wrote up for my website back when I was doing it: http://www.lenholgate.com/blog/2001/02/corba---enumeration.html. Likewise I have code that I use on a daily basis that has been kicking around since 1997 or earlier. I don't remember how to type it in, I have it in a file with tests (if I'm lucky) and docs (if I'm even luckier).
Whilst I realise that most of what I'm talking about is 'big stuff' I also often have to go and look at some of my old code to simply work out how to structure a typedef...
Of course the day to day stuff will come with time and practice; but you need to work out that you have to page some of it out in a form that you can reload later very quickly. Embrace the fact that your memory is never going to be able to hold it all and outsource it :)

What programming practice that you once liked have you since changed your mind about? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
As we program, we all develop practices and patterns that we use and rely on. However, over time, as our understanding, maturity, and even technology usage changes, we come to realize that some practices that we once thought were great are not (or no longer apply).
An example of a practice I once used quite often, but have in recent years changed, is the use of the Singleton object pattern.
Through my own experience and long debates with colleagues, I've come to realize that singletons are not always desirable - they can make testing more difficult (by inhibiting techniques like mocking) and can create undesirable coupling between parts of a system. Instead, I now use object factories (typically with a IoC container) that hide the nature and existence of singletons from parts of the system that don't care - or need to know. Instead, they rely on a factory (or service locator) to acquire access to such objects.
My questions to the community, in the spirit of self-improvement, are:
What programming patterns or practices have you reconsidered recently, and now try to avoid?
What did you decide to replace them with?
//Coming out of university, we were taught to ensure we always had an abundance
//of commenting around our code. But applying that to the real world, made it
//clear that over-commenting not only has the potential to confuse/complicate
//things but can make the code hard to follow. Now I spend more time on
//improving the simplicity and readability of the code and inserting fewer yet
//relevant comments, instead of spending that time writing overly-descriptive
//commentaries all throughout the code.
Single return points.
I once preferred a single return point for each method, because with that I could ensure that any cleanup needed by the routine was not overlooked.
Since then, I've moved to much smaller routines - so the likelihood of overlooking cleanup is reduced and in fact the need for cleanup is reduced - and find that early returns reduce the apparent complexity (the nesting level) of the code. Artifacts of the single return point - keeping "result" variables around, keeping flag variables, conditional clauses for not-already-done situations - make the code appear much more complex than it actually is, make it harder to read and maintain. Early exits, and smaller methods, are the way to go.
Trying to code things perfectly on the first try.
Trying to create perfect OO model before coding.
Designing everything for flexibility and future improvements.
In one word overengineering.
Hungarian notation (both Forms and Systems).
I used to prefix everything. strSomeString or txtFoo.
Now I use someString and textBoxFoo. It's far more readable and easier for someone new to come along and pick up. As an added bonus, it's trivial to keep it consistant -- camelCase the control and append a useful/descriptive name. Forms Hungarian has the drawback of not always being consistent and Systems Hungarian doesn't really gain you much. Chunking all your variables together isn't really that useful -- especially with modern IDE's.
The "perfect" architecture
I came up with THE architecture a couple of years ago. Pushed myself technically as far as I could so there were 100% loosely coupled layers, extensive use of delegates, and lightweight objects. It was technical heaven.
And it was crap. The technical purity of the architecture just slowed my dev team down aiming for perfection over results and I almost achieved complete failure.
We now have much simpler less technically perfect architecture and our delivery rate has skyrocketed.
The use of caffine. It once kept me awake and in a glorious programming mood, where the code flew from my fingers with feverous fluidity. Now it does nothing, and if I don't have it I get a headache.
Commenting out code. I used to think that code was precious and that you can't just delete those beautiful gems that you crafted. I now delete any commented-out code I come across unless there's a TODO or NOTE attached because it's too perilous to leave it in. To wit, I've come across old classes with huge commented-out portions and it really confused me why they were there: were they recently commented out? is this a dev environment change? why does it do this unrelated block?
Seriously consider not commenting out code and just deleting it instead. If you need it, it's still in source control. YAGNI though.
The overuse / abuse of #region directives. It's just a little thing, but in C#, I previously would use #region directives all over the place, to organize my classes. For example, I'd group all class properties together in a region.
Now I look back at old code and mostly just get annoyed by them. I don't think it really makes things clearer most of the time, and sometimes they just plain slow you down.
So I have now changed my mind and feel that well laid out classes are mostly cleaner without region directives.
Waterfall development in general, and in specific, the practice of writing complete and comprehensive functional and design specifications that are somehow expected to be canonical and then expecting an implementation of those to be correct and acceptable. I've seen it replaced with Scrum, and good riddance to it, I say. The simple fact is that the changing nature of customer needs and desires makes any fixed specification effectively useless; the only way to really properly approach the problem is with an iterative approach. Not that Scrum is a silver bullet, of course; I've seen it misused and abused many, many times. But it beats waterfall.
Never crashing.
It seems like such a good idea, doesn't it? Users don't like programs that crash, so let's write programs that don't crash, and users should like the program, right? That's how I started out.
Nowadays, I'm more inclined to think that if it doesn't work, it shouldn't pretend it's working. Fail as soon as you can, with a good error message. If you don't, your program is going to crash even harder just a few instructions later, but with some nondescript null-pointer error that'll take you an hour to debug.
My favorite "don't crash" pattern is this:
public User readUserFromDb(int id){
User u = null;
try {
ResultSet rs = connection.execute("SELECT * FROM user WHERE id = " + id);
if (rs.moveNext()){
u = new User();
u.setFirstName(rs.get("fname"));
u.setSurname(rs.get("sname"));
// etc
}
} catch (Exception e) {
log.info(e);
}
if (u == null){
u = new User();
u.setFirstName("error communicating with database");
u.setSurname("error communicating with database");
// etc
}
u.setId(id);
return u;
}
Now, instead of asking your users to copy/paste the error message and sending it to you, you'll have to dive into the logs trying to find the log entry. (And since they entered an invalid user ID, there'll be no log entry.)
I thought it made sense to apply design patterns whenever I recognised them.
Little did I know that I was actually copying styles from foreign programming languages, while the language I was working with allowed for far more elegant or easier solutions.
Using multiple (very) different languages opened my eyes and made me realise that I don't have to mis-apply other people's solutions to problems that aren't mine. Now I shudder when I see the factory pattern applied in a language like Ruby.
Obsessive testing. I used to be a rabid proponent of test-first development. For some projects it makes a lot of sense, but I've come to realize that it is not only unfeasible, but rather detrimental to many projects to slavishly adhere to a doctrine of writing unit tests for every single piece of functionality.
Really, slavishly adhering to anything can be detrimental.
This is a small thing, but: Caring about where the braces go (on the same line or next line?), suggested maximum line lengths of code, naming conventions for variables, and other elements of style. I've found that everyone seems to care more about this than I do, so I just go with the flow of whoever I'm working with nowadays.
Edit: The exception to this being, of course, when I'm the one who cares the most (or is the one in a position to set the style for a group). In that case, I do what I want!
(Note that this is not the same as having no consistent style. I think a consistent style in a codebase is very important for readability.)
Perhaps the most important "programming practice" I have since changed my mind about, is the idea that my code is better than everyone else's. This is common for programmers (especially newbies).
Utility libraries. I used to carry around an assembly with a variety of helper methods and classes with the theory that I could use them somewhere else someday.
In reality, I just created a huge namespace with a lot of poorly organized bits of functionality.
Now, I just leave them in the project I created them in. In all probability I'm not going to need it, and if I do, I can always refactor them into something reusable later. Sometimes I will flag them with a //TODO for possible extraction into a common assembly.
Designing more than I coded.
After a while, it turns into analysis paralysis.
The use of a DataSet to perform business logic. This binds the code too tightly to the database, also the DataSet is usually created from SQL which makes things even more fragile. If the SQL or the Database changes it tends to trickle to everything the DataSet touches.
Performing any business logic inside an object constructor. With inheritance and the ability to create overloaded constructors tend to make maintenance difficult.
Abbreviating variable/method/table/... Names
I used to do this all of the time, even when working in languages with no enforced limits on lengths of names (well they were probably 255 or something). One of the side-effects were a lot of comments littered throughout the code explaining the (non-standard) abbreviations. And of course, if the names were changed for any reason...
Now I much prefer to call things what they really are, with good descriptive names. including standard abbreviations only. No need to include useless comments, and the code is far more readable and understandable.
Wrapping existing Data Access components, like the Enterprise Library, with a custom layer of helper methods.
It doesn't make anybody's life easier
Its more code that can have bugs in it
A lot of people know how to use the EntLib data access components. No one but the local team knows how to use the in house data access solution
I first heard about object-oriented programming while reading about Smalltalk in 1984, but I didn't have access to an o-o language until I used the cfront C++ compiler in 1992. I finally got to use Smalltalk in 1995. I had eagerly anticipated o-o technology, and bought into the idea that it would save software development.
Now, I just see o-o as one technique that has some advantages, but it's just one tool in the toolbox. I do most of my work in Python, and I often write standalone functions that are not class members, and I often collect groups of data in tuples or lists where in the past I would have created a class. I still create classes when the data structure is complicated, or I need behavior associated with the data, but I tend to resist it.
I'm actually interested in doing some work in Clojure when I get the time, which doesn't provide o-o facilities, although it can use Java objects if I understand correctly. I'm not ready to say anything like o-o is dead, but personally I'm not the fan I used to be.
In C#, using _notation for private members. I now think it's ugly.
I then changed to this.notation for private members, but found I was inconsistent in using it, so I dropped that too.
I stopped going by the university recommended method of design before implementation. Working in a chaotic and complex system has forced me to change attitude.
Of course I still do code research, especially when I'm about to touch code I've never touched before, but normally I try to focus on as small implementations as possible to get something going first. This is the primary goal. Then gradually refine the logic and let the design just appear by itself. Programming is an iterative process and works very well with an agile approach and with lots of refactoring.
The code will not look at all what you first thought it would look like. Happens every time :)
I used to be big into design-by-contract. This meant putting a lot of error checking at the beginning of all my functions. Contracts are still important, from the perspective of separation of concerns, but rather than try to enforce what my code shouldn't do, I try to use unit tests to verify what it does do.
I would use static's in a lot of methods/classes as it was more concise. When I started writing tests that practice changed very quickly.
Checked Exceptions
An amazing idea on paper - defines the contract clearly, no room for mistake or forgetting to check for some exception condition. I was sold when I first heard about it.
Of course, it turned to be such a mess in practice. To the point of having libraries today like Spring JDBC, which has hiding legacy checked exceptions as one of its main features.
That anything worthwhile was only coded in one particular language. In my case I believed that C was the best language ever and I never had any reason to code anything in any other language... ever.
I have since come to appreciate many different languages and the benefits/functionality they offer. If I want to code something small - quickly - I would use Python. If I want to work on a large project I would code in C++ or C#. If I want to develop a brain tumour I would code in Perl.
When I needed to do some refactoring, I thought it was faster and cleaner to start straightaway and implement the new design, fixing up the connections until they work. Then I realized it's better to do a series of small refactorings to slowly but reliably progress towards the new design.
Perhaps the biggest thing that has changed in my coding practices, as well as in others, is the acceptance of outside classes and libraries downloaded from the internet as the basis for behaviors and functionality in applications. In school at the time I attended college we were encouraged to figure out how to make things better via our own code and rely upon the language to solve our problems. With the advances in all aspects of user interface and service/data consumption this is no longer a realistic notion.
There are certain things which will never change in a language, and having a library that wraps this code in a simpler transaction and in fewer lines of code that I have to write is a blessing. Connecting to a database will always be the same. Selecting an element within the DOM will not change. Sending an email via a server-side script will never change. Having to write this time and again wastes time that I could be using to improve my core logic in the application.
Initializing all class members.
I used to explicitly initialize every class member with something, usually NULL. I have come to realize that this:
normally means that every variable is initialized twice before ever being read
is silly because in most languages automatically initialize variables to NULL.
actually enforces a slight performance hit in most languages
can bloat code on larger projects
Like you, I also have embraced IoC patterns in reducing coupling between various components of my apps. It makes maintenance and parts-swapping much simpler, as long as I can keep each component as independent as possible. I'm also utilizing more object-relational frameworks such as NHibernate to simplify database management chores.
In a nutshell, I'm using "mini" frameworks to aid in building software more quickly and efficiently. These mini-frameworks save lots of time, and if done right can make an application super simple to maintain down the road. Plug 'n Play for the win!

What makes code legacy?

I have heard many developers refer to code as "legacy". Most of the time it is code that has been written by someone who no longer works on the project. What is it that makes code, legacy code?
Update in response to:
"Something handed down from an ancestor or a predecessor or from the past" http://www.thefreedictionary.com/legacy. Clearly you wanted to know something else. Could you clarify or expand your question? S.Lott
I am looking for the symptoms of legacy code that make it unusable or a nightmare to work with. When is it better to throw it away? It is my opinion that code should be thrown away more often and that reinventing the wheel is valuable part of development. The academic ideal of not reinventing the wheel is a nice one but it is not very practical.
On the other hand there is obviously legacy code worth keeping.
By using hardware, software, APIs, languages, technologies or features that are either no longer supported or have been superceded, typically combined with little to no possibility of ever replacing that code, instead using it til it or the system dies.
What is it that makes code, legacy code?
As with plain legacy, when the author is dead or missing, you as a heir get all or some of his code.
You shed some tears and try to figure out what to do with all this rubbish.
Michael Feathers has an interesting definition in his book Working Effectively with Legacy Code. According to him legacy code is code without automated tests.
It is a very general (and oft abused term) but any of the following would be legitimate reasons to call an app legacy:
The code base is based on a language/platform which is entirely unsupported by the manufacturer of the original product (often said manufacturer has gone out of business).
(really 1a) The code base or platform on which it is built is so old that getting qualified or experienced developers for the system is both hard and expensive.
The application supports some aspect of the business which is no longer actively grown and for which alterations are extremely rare, normally to fix it if something entirely unexpected changes around it (the canonical example being the Y2K issue) or if some regulation/external pressure forces it. Since both reasons are pressing and normally unavoidable but no significant development has occurred on the project it is likely that those people assigned to deal with this will be unfamiliar with the system (and it's accumulated behaviours and intricacies). In these cases this would often be reason to increase the perceived and planned for risk associated with the project.
The system has/or is being replaced with another. As such the system may be used for much less than originally intended, or perhaps only as a means of viewing historical data.
Legacy generally refers to code that is no longer being developed - meaning that if you use it, you have to use it on its original terms - you cannot just edit it to support the way the world looks today. For example, legacy code has to run on hardware that may not exist today - or is no longer supported.
According to Michael Feathers, the author of the excellent Working Effectively with Legacy Code, legacy code is a code which has no tests. When there is no way to know what breaks when this code changes.
The main thing that distinguishes
legacy code from non-legacy code is
tests, or rather a lack of tests. We
can get a sense of this with a little
thought experiment: how easy would it
be to modify your code base if it
could bite back, if it could tell you
when you made a mistake? It would be
pretty easy, wouldn't it? Most of the
fear involved in making changes to
large code bases is fear of
introducing subtle bugs; fear of
changing things inadvertently. With
tests, you can make things better with
impunity. To me, the difference is so
critical, it overwhelms any other
distinction. With tests, you can make
things better. Without them, you just
don’t know whether things are getting
better or worse.
Nobody is gonna read this, but I feel the other answers don't get it quite right:
It has value, if it wasn't useful it would've been thrown away long ago
Its hard to reason about because either of
Lack of documentation,
Original author cannot be found or forgot (yes 2 months later your code can be legacy code too!!),
Lack of tests or typesystem
Doesn't follow modern practices (ie no context to hold on too)
There is a requirement to change or extend it.
If there isn't a requirement to change it, it isn't legacy code
since nobody cares about it. It does its thing and there is nobody
around to call it legacy code.
A colleague once told me that legacy code was any code that you hadn't written yourself.
Arguably, it's just a pejorative term for code that we don't like any more for whatever reason (typically because it's not cool or fashionable but it works).
The TDD brigade might suggest that any code without tests is legacy code.
Legacy code is source code that relates to a no-longer supported or manufactured operating system or other computer technology.
http://en.wikipedia.org/wiki/Legacy_code
"Legacy code is source code that relates to a no-longer supported or manufactured "
Any code with support (or documentation) missing. Be it:
inline comments
technical documentation
spoken documentation (the person who wrote it)
unit tests documenting the workings of the code
For me legacy code is code that was written prior to some paradigm shift.
It may still be very much in use but it is in the process of being refactored to bring it into line.
e.g. Old procedural code hanging around in an otherwise OO system.
Code (or anything else, really) becomes "legacy" when it has been replaced by something newer/better, and yet despite this it's still used and kept alive "in the wild".
Preserving legacy code is not so much an academic ideal as it is keeping code that works, no matter how poorly. In many conservative enterprise situations, that would be considered more practical than throwing it away and starting again from scratch. Better the devil you know...
Legacy code is code that is painful/expensive to keep current with changing requirements.
There are two ways that this can happen:
The code is unsuitable for change
The semantics of the code have been swapped out to silicon
1) is the easier of the two to recognize. It is software that has fundamental limits making it unable to keep up with the ecosystem around it. For example, a system built around O(n^2) algorithm won't scale beyond a certain point and must be re-written if requirements move in that direction. Another example is code using libraries that are not supported on the latest OS versions.
2) Is harder to recognize, but all code of this kind shares the characteristic that people are afraid to change it. This could be because it was badly written/documented to begin with, because it is untested, or because it is non-trivial and the original authors who understood it left the team.
The ASCII/Unicode chars that comprise living code have semantic meaning, the "why's", "what's" and to some degree the "how's", in the minds of people associated with it. Legacy code is either un-owned or the owners do not have meaning associated with large portions of it. Once this happens (and it could happen the next day with really poorly-written code), to change this code, someone must learn it and understand it. This process is a significant fraction of the time it takes to write it in the first place.
The day you're afraid to refactor your code is the day when your code has become legacy.
I consider code "legacy" if any or all of the following conditions apply:
It was written using a language or methodology that is a generation behind current standards
The code is a complete mess with no planning or design behind it
It is written in outdated languages and in an outdated, non object-oriented style
It is difficult to find developers who know the language because it is so old
Unlike some of the other opinions here, I've seen plenty of modern applications that work decently without unit tests. Unit testing still has not caught on with everyone. Perhaps ten years from now the next generation of programmers will look at our current applications and consider them "legacy" for not containing unit tests, just as I consider non object-oriented applications to be legacy.
If few changes need to be made to a legacy codebase, it's better to simply leave it as-is and go with the flow. If the application needs drastic functionality changes, a GUI overhaul, and/or you can't find anyone who knows the programming language, it's time to throw away and start over. A word of warning, however: rewriting from scratch can be very time-consuming, and it's difficult to know if you've replicated all functionality. You'll probably want to have test cases and unit tests written for the legacy application and the new application.
Quite honestly legacy code is any code, framework, api, of other software construct thta's not "cool" anymore. For example COBOL is unanimously regarded as legacy while APL is not. Now one can also make the case that COBOL is consideed legacy and APL not because it has about 1m times the install base as APL. However, if you say that you need to work on APL code the reply would not be "oh no, that legacy stuff" but rather "oh my god, guess you won't be doing anything for the next century" see the difference?
This is a general term thrown around quite often (and quite generically) in the software ecosystem.
Well, I like to think of legacy code as inherited code. This is simply code that was written in the past. In most cases, legacy code do not follow new/current practices and is often considered archaic.
Legacy code is anything written more than a month ago :-)
It's often any code that isn't written in the trendy scripting language du jour, and I'm only half joking.

What's the best way to become familiar with a large codebase? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Joining an existing team with a large codebase already in place can be daunting. What's the best approach;
Broad; try to get a general overview of how everything links together, from the code
Narrow; focus on small sections of code at a time, understanding how they work fully
Pick a feature to develop and learn as you go along
Try to gain insight from class diagrams and uml, if available (and up to date)
Something else entirely?
I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?
--
Summary of answers:
Step through code in debug mode to see how it works
Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
Write unit tests. Start with an assertion of how you think code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and or an enquiry to make. (Thanks Donal, this is a great answer)
Go through existing unit tests for functional code, in a similar fashion to above
Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
Make small edits or bug fixes, then gradually build up
Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.
this post is a partial duplicate of the-best-way-to-familiarize-yourself-with-an-inherited-codebase
Start with some small task if possible, debug the code around your problem.
Stepping through code in debug mode is the easiest way to learn how something works.
Another option is to write tests for the features you're interested in. Setting up the test harness is a good way of establishing what dependencies the system has and where its state resides. Each test starts with an assertion about the way you think the system should work. If it turns out to work that way, you've achieved something and you've got some working sample code to reproduce it. If it doesn't work that way, you've got a puzzle to solve and a line of enquiry to follow.
One thing that I usually suggest to people that has not yet been mentioned is that it is important to become a competent user of the existing code base before you can be a developer. When new developers come into our large software project, I suggest that they spend time becoming expert users before diving in trying to work on the code.
Maybe that's obvious, but I have seen a lot of people try to jump into the code too quickly because they are eager to start making progress.
This is quite dependent on what sort of learner and what sort of programmer you are, but:
Broad first - you need an idea of scope and size. This might include skimming docs/uml if they're good. If it's a long term project and you're going to need a full understanding of everything, I might actually read the docs properly. Again, if they're good.
Narrow - pick something manageable and try to understand it. Get a "taste" for the code.
Pick a feature - possibly a different one to the one you just looked at if you're feeling confident, and start making some small changes.
Iterate - assess how well things have gone and see if you could benefit from repeating an early step in more depth.
Pairing with strict rotation.
If possible, while going through the documentation/codebase, try to employ pairing with strict rotation. Meaning, two of you sit together for a fixed period of time (say, a 2 hour session), then you switch pairs, one person will continue working on that task while the other moves to another task with another partner.
In pairs you'll both pick up a piece of knowledge, which can then be fed to other members of the team when the rotation occurs. What's good about this also, is that when a new pair is brought together, the one who worked on the task (in this case, investigating the code) can then summarise and explain the concepts in a more easily understood way. As time progresses everyone should be at a similar level of understanding, and hopefully avoid the "Oh, only John knows that bit of the code" syndrome.
From what I can tell about your scenario, you have a good number for this (3 pairs), however, if you're distributed, or not working to the same timescale, it's unlikely to be possible.
I would suggest running Doxygen on it to get an up-to-date class diagram, then going broad-in for a while. This gives you a quickie big picture that you can use as you get up close and dirty with the code.
I agree that it depends entirely on what type of learner you are. Having said that, I've been at two companies which had very large code-bases to begin with. Typically, I work like this:
If possible, before looking at any of the functional code, I go through unit tests that are already written. These can generally help out quite a lot. If they aren't available, then I do the following.
First, I largely ignore implementation and look only at header files, or just the class interfaces. I try to get an idea of what the purpose of each class is. Second, I go one level deep into the implementation starting with what seems to be the area of most importance. This is hard to gauge, so occasionally I just start at the top and work my way down in the file list. I call this breadth-first learning. After this initial step, I generally go depth-wise through the rest of the code. The initial breadth-first look helps to solidify/fix any ideas I got from the interface level, and then the depth-wise look shows me the patterns that have been used to implement the system, as well as the different design ideas. By depth-first, I mean you basically step through the program using the debugger, stepping into each function to see how it works, and so on. This obviously isn't possible with really large systems, but 20k LOC is not that many. :)
Work with another programmer who is more familiar with the system to develop a new feature or to fix a bug. This is the method that I've seen work out the best.
I think you need to tie this to a particular task. When you have time on your hands, go for whichever approach you are in the mood for.
When you have something that needs to get done, give yourself a narrow focus and get it done.
Get the team to put you on bug fixing for two weeks (if you have two weeks). They'll be happy to get someone to take responsibility for that, and by the end of the period you will have spent so much time problem-solving with the library that you'll probably know it pretty well.
If it has unit tests (I'm betting it doesn't). Start small and make sure the unit tests don't fail. If you stare at the entire codebase at once your eyes will glaze over and you will feel overwhelmed.
If there are no unit tests, you need to focus on the feature that you want. Run the app and look at the results of things that your feature should affect. Then start looking through the code trying to figure out how the app creates the things you want to change. Finally change it and check that the results come out the way you want.
You mentioned it is an app and a library. First change the app and stick to using the library as a user. Then after you learn the library it will be easier to change.
From a top down approach, the app probably has a main loop or a main gui that controls all the action. It is worth understanding the main control flow of the application. It is worth reading the code to give yourself a broad overview of the main flow of the app. If it is a GUI app, creating a paper that shows which screens there are and how to get from one screen to another. If it is a command line app, how the processing is done.
Even in companies it is not unusual to have this approach. Often no one fully understands how an application works. And people don't have time to show you around. They prefer specific questions about specific things so you have to dig in and experiment on your own. Then once you get your specific question you can try to isolate the source of knowledge for that piece of the application and ask it.
Start by understanding the 'problem domain' (is it a payroll system? inventory? real time control or whatever). If you don't understand the jargon the users use, you'll never understand the code.
Then look at the object model; there might already be a diagram or you might have to reverse engineer one (either manually or using a tool as suggested by Doug). At this stage you could also investigate the database (if any), if should follow the object model but it may not, and that's important to know.
Have a look at the change history or bug database, if there's an area that comes up a lot, look into that bit first. This doesn't mean that it's badly written, but that it's the bit everyone uses.
Lastly, keep some notes (I prefer a wiki).
The existing guys can use it to sanity check your assumptions and help you out.
You will need to refer back to it later.
The next new guy on the team will really thank you.
I had a similar situation. I'd say you go like this:
If its a database driven application, start from the database and try to make sense of each table, its fields and then its relation to the other tables.
Once fine with the underlying store, move up to the ORM layer. Those table must have some kind of representation in code.
Once done with that then move on to how and where from these objects are coming from. Interface? what interface? Any validations? What preprocessing takes place on them before they go to the datastore?
This would familiarize you better with the system. Remember that trying to write or understand unit tests is only possible when you know very well what is being tested and why it needs to be tested in only that way.
And in case of a large application that is not driven towards databases, I'd recommend an other approach:
What the main goal of the system?
What are the major components of the system then to solve this problem?
What interactions each of the component has among them? Make a graph that depicts component dependencies. Ask someone already working on it. These componentns must be exchanging something among each other so try to figure out those as well (like IO might be returning File object back to GUI and like)
Once comfortable to this, dive into component that is least dependent among others. Now study how that component is further divided into classes and how they interact wtih each other. This way you've got a hang of a single component in total
Move to the next least dependent component
To the very end, move to the core component that typically would have dependencies on many of the other components which you've already tackled
While looking at the core component, you might be referring back to the components you examined earlier, so dont worry keep working hard!
For the first strategy:
Take the example of this stackoverflow site for instance. Examine the datastore, what is being stored, how being stored, what representations those items have in the code, how an where those are presented on the UI. Where from do they come and what processing takes place on them once they're going back to the datastore.
For the second one
Take the example of a word processor for example. What components are there? IO, UI, Page and like. How these are interacting with each other? Move along as you learn further.
Be relaxed. Written code is someone's mindset, froze logic and thinking style and it would take time to read that mind.
First, if you have team members available who have experience with the code you should arrange for them to do an overview of the code with you. Each team member should provide you with information on their area of expertise. It is usually valuable to get multiple people explaining things, because some will be better at explaining than others and some will have a better understanding than others.
Then, you need to start reading the code for a while without any pressure (a couple of days or a week if your boss will provide that). It often helps to compile/build the project yourself and be able to run the project in debug mode so you can step through the code. Then, start getting your feet wet, fixing small bugs and making small enhancements. You will hopefully soon be ready for a medium-sized project, and later, a big project. Continue to lean on your team-mates as you go - often you can find one in particular who is willing to mentor you.
Don't be too hard on yourself if you struggle - that's normal. It can take a long time, maybe years, to understand a large code base. Actually, it's often the case that even after years there are still some parts of the code that are still a bit scary and opaque. When you get downtime between projects you can dig in to those areas and you'll often find that after a few tries you can figure even those parts out.
Good luck!
You may want to consider looking at source code reverse engineering tools. There are two tools that I know of:
SWAG Kit (Linux only) link
Bauhaus academic commercial
Both tools offer similar feature sets that include static analysis that produces graphs of the relations between modules in the software.
This mostly consists of call graphs and type/class decencies. Viewing this information should give you a good picture of how the parts of the code relate to one another. Using this information, you can dig into the actual source for the parts that you are most interested in and that you need to understand/modify first.
I find that just jumping in to code can be a a bit overwhelming. Try to read as much documentation on the design as possible. This will hopefully explain the purpose and structure of each component. Its best if an existing developer can take you through it but that isn't always possible.
Once you are comfortable with the high level structure of the code, try to fix a bug or two. this will help you get to grips with the actual code.
I like all the answers that say you should use a tool like Doxygen to get a class diagram, and first try to understand the big picture. I totally agree with this.
That said, this largely depends on how well factored the code is to begin with. If its a gigantic mess, it's going to be hard to learn. If its clean, and organized properly, it shouldn't be that bad.
See this answer on how to use test coverage tools to locate the code for a feature of interest, without knowing anything about where that feature is, or how it is spread across many modules.
(shameless marketing ahead)
You should check out nWire. It is an Eclipse plugin for navigating and visualizing large codebases. Many of our customers use it to break-in new developers by printing out visualizations of the major flows.

The best way to familiarize yourself with an inherited codebase

Stacker Nobody asked about the most shocking thing new programmers find as they enter the field.
Very high on the list, is the impact of inheriting a codebase with which one must rapidly become acquainted. It can be quite a shock to suddenly find yourself charged with maintaining N lines of code that has been clobbered together for who knows how long, and to have a short time in which to start contributing to it.
How do you efficiently absorb all this new data? What eases this transition? Is the only real solution to have already contributed to enough open-source projects that the shock wears off?
This also applies to veteran programmers. What techniques do you use to ease the transition into a new codebase?
I added the Community-Building tag to this because I'd also like to hear some war-stories about these transitions. Feel free to share how you handled a particularly stressful learning curve.
Pencil & Notebook ( don't get distracted trying to create a unrequested solution)
Make notes as you go and take an hour every monday to read thru and arrange the notes from previous weeks
with large codebases first impressions can be deceiving and issues tend to rearrange themselves rapidly while you are familiarizing yourself.
Remember the issues from your last work environment aren't necessarily valid or germane in your new environment. Beware of preconceived notions.
The notes/observations you make will help you learn quickly what questions to ask and of whom.
Hopefully you've been gathering the names of all the official (and unofficial) stakeholders.
One of the best ways to familiarize yourself with inherited code is to get your hands dirty. Start with fixing a few simple bugs and work your way into more complex ones. That will warm you up to the code better than trying to systematically review the code.
If there's a requirements or functional specification document (which is hopefully up-to-date), you must read it.
If there's a high-level or detailed design document (which is hopefully up-to-date), you probably should read it.
Another good way is to arrange a "transfer of information" session with the people who are familiar with the code, where they provide a presentation of the high level design and also do a walk-through of important/tricky parts of the code.
Write unit tests. You'll find the warts quicker, and you'll be more confident when the time comes to change the code.
Try to understand the business logic behind the code. Once you know why the code was written in the first place and what it is supposed to do, you can start reading through it, or as someone said, prolly fixing a few bugs here and there
My steps would be:
1.) Setup a source insight( or any good source code browser you use) workspace/project with all the source, header files, in the code base. Browsly at a higher level from the top most function(main) to lowermost function. During this code browsing, keep making notes on a paper/or a word document tracing the flow of the function calls. Do not get into function implementation nitti-gritties in this step, keep that for a later iterations. In this step keep track of what arguments are passed on to functions, return values, how the arguments that are passed to functions are initialized how the value of those arguments set modified, how the return values are used ?
2.) After one iteration of step 1.) after which you have some level of code and data structures used in the code base, setup a MSVC (or any other relevant compiler project according to the programming language of the code base), compile the code, execute with a valid test case, and single step through the code again from main till the last level of function. In between the function calls keep moting the values of variables passed, returned, various code paths taken, various code paths avoided, etc.
3.) Keep repeating 1.) and 2.) in iteratively till you are comfortable up to a point that you can change some code/add some code/find a bug in exisitng code/fix the bug!
-AD
I don't know about this being "the best way", but something I did at a recent job was to write a code spider/parser (in Ruby) that went through and built a call tree (and a reverse call tree) which I could later query. This was slightly non-trivial because we had PHP which called Perl which called SQL functions/procedures. Any other code-crawling tools would help in a similar fashion (i.e. javadoc, rdoc, perldoc, Doxygen etc.).
Reading any unit tests or specs can be quite enlightening.
Documenting things helps (either for yourself, or for other teammates, current and future). Read any existing documentation.
Of course, don't underestimate the power of simply asking a fellow teammate (or your boss!) questions. Early on, I asked as often as necessary "do we have a function/script/foo that does X?"
Go over the core libraries and read the function declarations. If it's C/C++, this means only the headers. Document whatever you don't understand.
The last time I did this, one of the comments I inserted was "This class is never used".
Do try to understand the code by fixing bugs in it. Do correct or maintain documentation. Don't modify comments in the code itself, that risks introducing new bugs.
In our line of work, generally speaking we do no changes to production code without good reason. This includes cosmetic changes; even these can introduce bugs.
No matter how disgusting a section of code seems, don't be tempted to rewrite it unless you have a bugfix or other change to do. If you spot a bug (or possible bug) when reading the code trying to learn it, record the bug for later triage, but don't attempt to fix it.
Another Procedure...
After reading Andy Hunt's "Pragmatic Thinking and Learning - Refactor Your Wetware" (which doesn't address this directly), I picked up a few tips that may be worth mentioning:
Observe Behavior:
If there's a UI, all the better. Use the app and get a mental map of relationships (e.g. links, modals, etc). Look at HTTP request if it helps, but don't put too much emphasis on it -- you just want a light, friendly acquaintance with app.
Acknowledge the Folder Structure:
Once again, this is light. Just see what belongs where, and hope that the structure is semantic enough -- you can always get some top-level information from here.
Analyze Call-Stacks, Top-Down:
Go through and list on paper or some other medium, but try not to type it -- this gets different parts of your brain engaged (build it out of Legos if you have to) -- function-calls, Objects, and variables that are closest to top-level first. Look at constants and modules, make sure you don't dive into fine-grained features if you can help it.
MindMap It!:
Maybe the most important step. Create a very rough draft mapping of your current understanding of the code. Make sure you run through the mindmap quickly. This allows an even spread of different parts of your brain to (mostly R-Mode) to have a say in the map.
Create clouds, boxes, etc. Wherever you initially think they should go on the paper. Feel free to denote boxes with syntactic symbols (e.g. 'F'-Function, 'f'-closure, 'C'-Constant, 'V'-Global Var, 'v'-low-level var, etc). Use arrows: Incoming array for arguments, Outgoing for returns, or what comes more naturally to you.
Start drawing connections to denote relationships. Its ok if it looks messy - this is a first draft.
Make a quick rough revision. Its its too hard to read, do another quick organization of it, but don't do more than one revision.
Open the Debugger:
Validate or invalidate any notions you had after the mapping. Track variables, arguments, returns, etc.
Track HTTP requests etc to get an idea of where the data is coming from. Look at the headers themselves but don't dive into the details of the request body.
MindMap Again!:
Now you should have a decent idea of most of the top-level functionality.
Create a new MindMap that has anything you missed in the first one. You can take more time with this one and even add some relatively small details -- but don't be afraid of what previous notions they may conflict with.
Compare this map with your last one and eliminate any question you had before, jot down new questions, and jot down conflicting perspectives.
Revise this map if its too hazy. Revise as much as you want, but keep revisions to a minimum.
Pretend Its Not Code:
If you can put it into mechanical terms, do so. The most important part of this is to come up with a metaphor for the app's behavior and/or smaller parts of the code. Think of ridiculous things, seriously. If it was an animal, a monster, a star, a robot. What kind would it be. If it was in Star Trek, what would they use it for. Think of many things to weigh it against.
Synthesis over Analysis:
Now you want to see not 'what' but 'how'. Any low-level parts that through you for a loop could be taken out and put into a sterile environment (you control its inputs). What sort of outputs are you getting. Is the system more complex than you originally thought? Simpler? Does it need improvements?
Contribute Something, Dude!:
Write a test, fix a bug, comment it, abstract it. You should have enough ability to start making minor contributions and FAILING IS OK :)! Note on any changes you made in commits, chat, email. If you did something dastardly, you guys can catch it before it goes to production -- if something is wrong, its a great way to get a teammate to clear things up for you. Usually listening to a teammate talk will clear a lot up that made your MindMaps clash.
In a nutshell, the most important thing to do is use a top-down fashion of getting as many different parts of your brain engaged as possible. It may even help to close your laptop and face your seat out the window if possible. Studies have shown that enforcing a deadline creates a "Pressure Hangover" for ~2.5 days after the deadline, which is why deadlines are often best to have on a Friday. So, BE RELAXED, THERE'S NO TIMECRUNCH, AND NOW PROVIDE YOURSELF WITH AN ENVIRONMENT THAT'S SAFE TO FAIL IN. Most of this can be fairly rushed through until you get down to details. Make sure that you don't bypass understanding of high-level topics.
Hope this helps you as well :)
All really good answers here. Just wanted to add few more things:
One can pair architectural understanding with flash cards and re-visiting those can solidify understanding. I find questions such as "Which part of code does X functionality ?", where X could be a useful functionality in your code base.
I also like to open a buffer in emacs and start re-writing some parts of the code base that I want to familiarize myself with and add my own comments etc.
One thing vi and emacs users can do is use tags. Tags are contained in a file ( usually called TAGS ). You generate one or more tags files by a command ( etags for emacs vtags for vi ). Then we you edit source code and you see a confusing function or variable you load the tags file and it will take you to where the function is declared ( not perfect by good enough ). I've actually written some macros that let you navigate source using Alt-cursor,
sort of like popd and pushd in many flavors of UNIX.
BubbaT
The first thing I do before going down into code is to use the application (as several different users, if necessary) to understand all the functionalities and see how they connect (how information flows inside the application).
After that I examine the framework in which the application was built, so that I can make a direct relationship between all the interfaces I have just seen with some View or UI code.
Then I look at the database and any database commands handling layer (if applicable), to understand how that information (which users manipulate) is stored and how it goes to and comes from the application
Finally, after learning where data comes from and how it is displayed I look at the business logic layer to see how data gets transformed.
I believe every application architecture can de divided like this and knowning the overall function (a who is who in your application) might be beneficial before really debugging it or adding new stuff - that is, if you have enough time to do so.
And yes, it also helps a lot to talk with someone who developed the current version of the software. However, if he/she is going to leave the company soon, keep a note on his/her wish list (what they wanted to do for the project but were unable to because of budget contraints).
create documentation for each thing you figured out from the codebase.
find out how it works by exprimentation - changing a few lines here and there and see what happens.
use geany as it speeds up the searching of commonly used variables and functions in the program and adds it to autocomplete.
find out if you can contact the orignal developers of the code base, through facebook or through googling for them.
find out the original purpose of the code and see if the code still fits that purpose or should be rewritten from scratch, in fulfillment of the intended purpose.
find out what frameworks did the code use, what editors did they use to produce the code.
the easiest way to deduce how a code works is by actually replicating how a certain part would have been done by you and rechecking the code if there is such a part.
it's reverse engineering - figuring out something by just trying to reengineer the solution.
most computer programmers have experience in coding, and there are certain patterns that you could look up if that's present in the code.
there are two types of code, object oriented and structurally oriented.
if you know how to do both, you're good to go, but if you aren't familiar with one or the other, you'd have to relearn how to program in that fashion to understand why it was coded that way.
in objected oriented code, you can easily create diagrams documenting the behaviors and methods of each object class.
if it's structurally oriented, meaning by function, create a functions list documenting what each function does and where it appears in the code..
i haven't done either of the above myself, as i'm a web developer it is relatively easy to figure out starting from index.php to the rest of the other pages how something works.
goodluck.