How do I find methods? - language-agnostic

Here's a somewhat general computer question. I've always been able to follow the LOGIC of programming, but when I go to code something, I always find that I don't know some method or another to get what I need to get done. When I see it, I always think, "OF COURSE!".
How do you go about finding relevant methods for your programming needs that are "built-in?" I don't enjoy re-inventing the wheel, but I find it difficult to find what I need to do what I want to do.

First try Google:
You can use google to search your required method. For example If I want to search a value in array in PHP then I go to Google and type "Search values in array in PHP". I find my required function at first place.
Then try Standard Documentation:
Try standard documentation to search for your required method. For example if my problem is related to strings in PHP then I go to String Functions documentation and find the required function.
Finally try Stackoverflow:
Otherwise you can ask your problem at Stackoverflow for your required methods and libraries. You will always get a shortest way.

What you are asking here is for the best way to do research. Well, that's hard skill to explain, even more so to teach.
Nevertheless here are some tips:
Go to a search engine. It makes no
sense to start in a place like MSDN,
since all of its content is indexed
by the search engines anyway.
Phrase your question several
different ways.
As you learn more
about the issue you will learn new
vocabulary about it. Use that new
vocabulary to do even more searches.
If the searches turn out empty,
switch to browsing a specific
section of the official
documentation that you think is the
most related to what you are doing. If nothing else, it will expand your horizons around the issue and give you more vocabulary to do more searches.
Finally, if all else fails ask a question on StackOverflow explaining what you want to do as clearly as possible.
Note that if there's a simple API that does what you need, you will rarely reach step 4.

You say:
It's very frustrating to suddenly find
an "easy" button mid-way through.
Try to see it differently. Think of these moments as blessings. You've just learned something. You invested a lot of effort - and instead of seeing that effort as wasted, see it as critical to proper learning. You - better than the guy who just happened across the magic method - really understand what it's for and something about how it works. And you really, really, understand why you need it, and you properly appreciate its value. You're never going to forget that method.
So it was costly, but you learned something important. Celebrate, and move on.

It is usually included in some form of documentation. Most IDEs support the documentation format and gives you auto-complete functionality.

if you are using MVS so MSDN is really good for it

In addition to this and this answer above, google's basic and advanced searching tips prove very helpful.
In addition to above, changing the order of keywords in search criteria also sorts the list in different orders.
In essence I believe that searching is still an art rather than a science, and is best learnt - quoting from David Reis' answer above: "2. As you learn more about the issue you will learn new vocabulary about it. Use that new vocabulary to do even more searches."

Search in the API documentation. But the best way to (I found so) is to search on the internet for multiple solutions and then choose the one that you think is best. Make your search as narrow as possible. For example you want to implement random number generation function, then search like this, "How to generate random numbers in Java?".

Namespaces, namingconventions, Autocomplete/Intellisence
I assume that you are trying to find some kind of Object-Oriented-apis . I use .net in my example.
First try to find a class that might be responsable for the method you are looking for.
Example: If you want to "Make a new Directory in the Filesystem" you must know (or learn) that (in dotnet) these classes are in the namespace System.IO:
This namespace contains subnamespaces like Compresseion and Classes like File, Path, Directory, ...
Second you sould know NamingConventions. There are common Naming-Prefixes for methods like Get, Set, Insert, Create. In the documentation for class Directory you will find a CreateDirectory-Method.
If you have an intelligent editor that knows your programming language and the classes and namespaces learning is much easier. In the dotnet-world this feature is called Autocomplete/Intellisence

Related

First write code using API, then actual API - does this approach have a name and is valid for API design process?

Standard way of working on new API (library, class, whatever) usually looks like this:
you think about what methods would API user need
you implement API that you suspect user will need
So basically you trying to guess what your API should look like. It very often leads to over engineering stuff, huge APIs that you think user will need and it is very possible that great part of your code won't be used at all.
Some time ago, maybe few years even, I read some article that promoted writing client code first. I don't remember where I found it but author pointed out several advantages like better understanding how API will be used, what it should provide and what is basically obsolete. I think idea was that it goes along with SCRUM methodology and user stories but on implementation level.
Just out of curiosity for my latest private project I started not with actual API (some kind of toolkit library) but with client code that would use this API. Of course my code is all in red because classes, methods and properties does not exist and I can forget about help from intellisense but what I noticed is that after few days of coding my application "has" all basic functionalities and my library API "is" a lot smaller than I imagined when starting a project.
I don't say that if somebody took my library and started using it it wouldn't lack some features but I think it helped me to realize that my idea of this API was somewhat flawed because I usually try to cover all bases and provide methods "just in case". And sometimes it bites me badly because I made some stupid mistake in basic functions being more focused on code that somebody maybe would need.
So what I would like to ask you do you ever tried this approach when needed to create a new API and did it helped you? Is it some recognized technique that has a name?
So basically you're trying to guess what your API should look like.
And that's the biggest problem with designing anything this way: there should be no (well, minimal) guesswork in software design. Designing an API based on assumptions rather than actual information is dangerous, for several reasons:
It's directly counter to the principle of YAGNI: in order to get anything done, you have to assume what the user is going to need, with no information to back up those assumptions.
When you're done, and you finally get around to using your API, you'll invariably find that it sucks to use (poor user experience), because you weren't thinking about how the library is used (UX), you were thinking about what the library must do (features).
An API, by definition, is an interface for users (i.e., developers). Designing as anything else just makes for a bad design, without fail.
Writing sample code is like designing a GUI before writing the backend: a Good Thing. It forces you to think about user experience and practical effects of design decisions without getting bogged down in useless theorising and assumption.
And contrary to Gabriel's answer, this is not bottom-up design: it's top-down. Rather than design the concrete backend of your library and then force an abstract interface on top of it, you first design the interface and then worry about the implementation.
Generally speaking, the idea of designing the concrete first and abstracting from that afterwards is called bottom-up design. Test Driven Development uses similar principle to what you describe to support better design. Firstly you write a test, which is an use of code you are going to write afterwards. It is important to proceed stepwise, because you have to proove the API is implementable. IMportant part of each part is refactoring - this allows you design more concise API and reuse parts of your code.

Commenting practices?

As a student in computer engineering I have been pressured to type up very detailed comments for everything I do. I can see this being very useful for group projects or in the work place but when you work on your own projects do you spend as much time commenting?
As a personal project I am working on grows more and more complicated I sometimes feel as though I should be commenting more but I also feel as though it's a waste of time since I will probably be the only one working on it. Is it worth the time and cluttered code?
Thoughts?
EDIT: This has given me a lot to think about. Thanks for all your input! I never expected this large of a response.
Well considered comments illuminate when the code cannot. Well considered function and variable names eliminate the need for copious comments. If you find it necessary to comment everything, consider simplifying your code.
If you ever look at code you wrote 6 months before, you will be wondering why you did not comment better.
If the code is well written, with short methods (see Composed Method pattern) and meaningful names, then the code needs very little comments. Only comments which explain the "why" are useful - the comments explaining "what" should be replaced by improving the code so much that it's obvious what it does. Comments should not be used as an excuse for writing bad code.
Public APIs, especially in closed-source apps, are perhaps the only place where thorough javadocs are recommended - and then you need to take the effort to maintain them and keep them always accurate and up-to-date. A misleading or outdated comment is worse than no comment. It's better to document what the code does by writing tests for it and using good names for the tests.
The book Clean Code has a good chapter about comments.
Unit tests and the like are the best forms of code documentation. Some testing frameworks write out a spec of what the class under test should do, giving people a great introduction to how a piece of code works in pure english while also providing very clean way to implement the tests itself.
Examples of that are Scala's ScalaTest or RSpec for Ruby.
I find that unless some weird hacky thing is required by the code in question, it is usually not beneficial to comment it. Also, it adds a lot of overhead because you have to maintain the comments... and maintaining the code and tests is already enough work.
Remember, code with out-of-date comments is worse than no comments at all!
A lot of the time, comments just says what the code does anyway, which is a waste of human effort. And if it doesn't, your code probably sucks and you should refactor it.
Just use testing frameworks.
Comment -- or better yet, recode -- anything that is non-obvious now. Later it will be completely non-obvious. You might think, "but it's going to be me," but if your way of thinking (and ways of coding) changes as you grow what's obvious to you now might not be obvious to you later.
Have you read Code Complete yet? Recommended as a very good read, and a great way to figure out some of the things CS profs drill down your throat.
Code comments come in two variety:
Comments to explain logic, making
sure that the code matches the
intent. Often people will write high
level pseudocode and will use that
in comment form to fill in the
actual code of what the module will
do. Then they leave the comments as
a read-along which can be used
during later review.
Comments to
explain usage of a module. Think
javadocs. Your intent here is for
the consumers to understand why your
code is important. One use of
javadocs is in the Visual Studio
Intellisense (since I don't use
Eclipse idk). It shows the comments
from the javadoc in the intellisense
hover. Becomes VERY handy later on.
When professors ask you to document everything in your code, I have found the usage of psuedocode translated to actual code to be sufficient. However, in practice I've not found that many devs need it, as usually the code is sufficient to explain itself (when simple, well written, not relying on tricks, and when using descriptive variable names).
But I still put in comments about intent if I think it's the least bit unclear. This is just a bit of best practice.
But I definitely say read that book. ;)
If you ever decide to open-source your personal project, people will thank you for your comments (unless they're terrible). If you hit upon a spectacularly great idea and your personal project turns into a business, then you'll be hiring more developers, and again your comments will be valuable. If you suffer a mild head injury, then when you return to work you'll be thankful for your comments.
Some people treat comments as a code smell, a sign that the code could use more descriptive names and a better structure. They will fix the code so it does not need comments.
This works in a lot of cases. However one type of comment that is useful is 'why' something is being done. Sometimes fixes are made for obscure reasons that would not be obvious when reviewing the code later. The comments should not express what the code does (that should be covered by naming) or how it does that (again, the code tells you that), so save your comments for 'why'.
I find that nothing serves as better documentation as to how something works then unit tests.
Whenever i'm doing something that isn't self-documenting, i'll put a comment. I will forget what i was doing unless i do. But i prefer to write code that's so obvious that comments don't help much. Wherever possible, the code should be clear enough that thousands of lines of comments would be unnecessary.
Whatever you do, do NOT write comments like this...
// add 1 to i
++i;
That's noise. You're actually worse off with comments like that than with none at all.
A hard-core stance is: "if you have to write a comment for your code, your code is broken". Rather than writing explanatory comments, refactor your code so that the comments become less necessary. This applies especially to function names (including their parameters), since they tend to be modified the most, and the comments seldom are updated to match.
Instead of:
// Compute average for the two times
int a = t1 + (t2 - t1) / 2;
write
int averageTime = AverageOfTimes(t1, t2);
int AverageOfTimes(int t1, int t2) {
return t1 + (t2-t1);
}
Stale comments are one of the leading causes of WTF's when I'm reading other people's code.
Overcommenting has been cited as a "code smell" by several authors, including the authors of "Clean Code".
Personally, I write an explanatory comment for each class (I code in C# and C++ mostly), and occasionally when I am using an algorithm I want to refer to.
To be honest, if code is clear its not necessary, but comments are best when a specific logic breaks given certain data (which may not be obvious). Leaving comments of issues that may occur is a great way to help prevent accidental bugs due to misunderstandings of what data to expect (or specifically not).
I used to be in the exact same situation as you. When I first started I never commented anything because everything was extremely small and I always knew how it worked. But as I continued to expand my code and everything started pointing to each other, I found myself not knowing what certain things did anymore and got lost. I had to rewrite a lot of things so I knew what they did again, and I started commenting everything so I knew exactly how it worked and how to use it in the future. You may think you know how everything works now, but in the future you'll look back at something and say 'HUH?' It's better to comment things now and save yourself the trouble later.
The way I comment things:
Always add this at the top of any function, so you know what it does.
/**
* What the function is supposed to do, a brief description of it.
*
* #param paramter_name Object-type A description of it, do for each parameter.
*
* #return Object-type - A brief description of what is being returned.
**/
Then throughout your code make sure you comment things that look complicated. When you run checks, put a quick comment like 'make sure this is valid'. For any long lines of code or large blocks of code, add a comment of what that specific section does, to easily find it later on.
Commenting is useful for individual project as well as group projects especially when you will need to maintain or enhance the code over an extended period of time. This scenario may not be applicable for school projects, but in the workplace it is definitely applicable. If you have ever had to look at code that you wrote 6 months in the past then it might as well have been written by somebody else.
I have found that--as everyone will tell you--if you are coming back to code after several months you will have forgotten everything. That said, I hardly comment my own code except for comments like // ew.. hacked because X-beyond-my-control doesn't do Y correctly. But this is because the code itself is written very cleanly and is very easy to read. For instance, all variable and function names are completely descriptive. I don't use abbreviations except for rather long words which are obvious such as 'info'. All functions are extremely brief. All functions do one thing if possible. Nesting is avoided if possible. Logically related functions are grouped via classes or modules.
When I read other people's code, I don't read the comments. I read the code. And the difference between clear and well written code and spaghetti is far more important than any comments. Usually I don't care what their intent is/was; I want to change the code. And to do that it easily it needs to be well organized. So the comments are peripheral. But perhaps this is just me.
Technically speaking the code is perfectly self documenting. I like to comment anything that is non-obvious or especially complex. I tend to like to have at minimum summary on a class, and maybe a short blurb for each member. When you have to write a huge amount of doc on a very large and complex method that is usually a good sign that it needs to be looked at for a refactor.
foreach(var a in b.Items) //Enumerate the items in "b"
{
if(a.FirstName == "Jones") //See if first name is Jones
....
You want something in the middle of the above and no commenting whatsoever.
commenting is always useful!And I dont thik that commenting is a waste of time!It helps other developeres understand your code and it helps you when you haven't work on a project for months.
University software courses often push people towards excessive commenting as a way of forcing students to think twice about what they have typed. A nasty rule-of-thumb that was put my way when I was marking undergrad coursework a few years ago suggested a comment every 5-10 lines. Think most Java courses tell you to limit methods to circa 30 lines, so lots of comments within a function is a sign you should break it up.
My personal preference is to limit comments to documenting function purpose/inputs/outputs and data structures. Otherwise comments ought to be limited to things that require particular attention (eg things that looks out of place, perhaps bugs, at first glance).
The first comments are the variables and methods names. A lot of attention shall be paid to choose them. Do not hesitate to rename them if you can think of a better name.
Consistency and convention also help. Try to avoid NumberOfStuff, ThingCount, FooSize, always use the same form, possibly something in line with the language (does it have Array.length, or Vector.size). Always name similar things with similar names.
"Code never lies. Comments sometimes do". One day or the other, someone will modify the code and not the associated comment. Better spenf more time writing self-explanatory code complemented with some clever comment, than splitting out code and then explaining things with a lot of comments.
Rule #1
Comments should indicate WHY not 'what' (the code already tells you 'what' is happening)
Rule #2
After writing the comment - rewrite your code to read like the comment (then remove the comment)
While good variable and function names etc. may lessen the need for comments, any sufficiently complex program is going to be hard to understand merely by looking at the code alone, no matter how carefully it was written.
In general, the larger the body of code and the longer you expect it to be in use, the more important it will be to write detailed comments. It is perfectly reasonable for teachers to demand detailed comments because they don't want to have to spend a lot of time trying to understand large amounts of code from different students, but you don't necessarily have to maintain that same standard outside the classroom.
Regarding commenting functions & classes and the like, I find that only the most trivial functions are so self explanatory that a comment won't save the reader some time. Also, when you are using a smart IDE for a language such as Java or Curl writing properly formatted documentation comments will allow developers to see documentation for functions and methods simply by hovering over a call signature. This can save you a lot of time when working with large systems.

What are some authoritative/respected "best known implementation" websites/resources?

EDIT:
Wow, the initial response to this question was quite negative. I think I might have triggered some pretty strong emotions by using the word "best"; it seems like a few people latched onto that word and decided to dismiss my question right away.
Obviously, there are many, many situations in which no single approach is "best", or at least, what ends up being the best solution to one problem will often not be the best solution for other, even similar, problems. I get that. But now let me try to elaborate on the reasoning behind what I'm actually asking.
I tend to find it easiest to explain myself using analogies, so here goes. In my current job I work almost exclusively in .NET. .NET has a lot of functionality built into the framework. A prime example is the System.Collections.Generic namespace, which has a bunch of collection classes that (almost) no .NET developer in his/her right mind would bother re-developing from scratch, because very good implementations are already there. If I am working on a problem that requires a doubly linked list, I'm not going to decide, "Okay, time to write a doubly linked list class"; I'm just going to use the LinkedList<T> that's already there, or, at most, extend it or wrap it with my own class that adds some extra functionality.
Am I saying the "best" version of a doubly linked list is LinkedList<T> from .NET? Of course not. That would be absurd. But I highly doubt .NET's implementation of LinkedList<T> is drastically different from most other established libraries' implementations of collections that are intended to serve the same purpose (that of a doubly linked list). On the other hand, I am relatively confident that if I were to write my own implementation from scratch, there'd be a considerable number of issues with it, in terms of robustness, performance, flexibility, etc. for one simple reason: not that I'm stupid, or lazy, or don't care about good code--simply that I'm one person, and I'm not an expert on linked lists, and I haven't thought of everything that needs to be taken into consideration when designing one.
But I happen to be a developer who does take an interest in how things are implemented internally. And so it would be nice if I could check out a page where some variant of a well thought-out design for a linked list--or for any fairly established concept for which robust, efficient implementations have been written--were available to view. (By the way, yes I am aware that the source code for .NET's LinkedList<T> is available. I'm just using that as an example; really I am talking about all problems with solutions for which good, working implementations exist.)
Now, I talked about this being something that is open; let me elaborate on that. I am not talking about sites like SourceForge.net, or CodePlex, or Google Code. These are all sites for hosting projects, i.e., applications or libraries tailored for some specific industry or field or otherwise categorizable purpose. What I'm talking about is something like this:
http://en.wikibooks.org/wiki/Category:Algorithms_and_data_structures
Maybe I should have just provided that link in the first place, as it probably illustrates what I'm getting at better than anything I've written so far. But I think the main point that differentiates what I'm asking about from any other site I've seen is that I was specifically wondering if there could be some way to work on a new problem--so, something for which there aren't necessarily any well-known, established implementations, again as in my linked list example--collaboratively, in a wiki-esque fashion, but not tied to any specific open-source project.
So, as a conclusion of sorts, I was kind of envisioning a situation like the following: I find myself faced with a new problem. Maybe it isn't common enough to be something that is addressed in a framework like .NET. But it's common enough that some developers here and there are independently working on it. If a website exists like what I'm imagining, maybe at some point one of those developers working on the problem could post an idea on that website, and over time others might discover it and suggest improvements/modifications, and given enough time and participation, a pretty darn good implementation might result from all this collaboration. And from there, eventually, something like this implementation might be considered fairly "standard", just like a linked list implementation, or a quicksort implementation, or, I don't know, some well-known pseudo-random number generator.
Does this make any more sense to anyone now? I feel quite confident that what I'm talking about is not absurd, but hey, if that's what people think, then maybe it is.
Open source projects are very popular. Some of these are libraries suited for specific purposes, the best of which include some very well-written code.
However, if you're interested in contributing to an open source project, finding a project that is well-suited to your skills can be quite a task. At the same time, if you're interesting in using an open source project in your own work, finding a project that is well-suited to your needs can also be difficult, especially when, for example, open-source library X has a lot of functionality you could use, as does library Y, and these two libraries' capabilities overlap so that integrating both into your code could be messy.
We've all seen questions, here on Stack Overflow and elsewhere on the web, posted by one developer: "How would I implement this idea?" and answered by others, often accompanied by a plethora of example code. Sometimes these answers link to an open source project/library that provides functionality similar to what the poster is asking about.
My question is: are there any well-known websites or other sources that are open in nature and provide "best-known implementations" for common (or even not-so-common) programming problems, but not associated with any particular open source project?
As a generic example, suppose I have a need for some algorithm that does X. I post a question on SO or some other site requesting ideas, asking for suggestions on how best to implement it. One person points me to project P1, which contains some code that performs something very similar to this algorithm. Another person points me to project P2. Someone else writes some sample code and says, "maybe you could do it like this."
It seems to me, if there are all these different versions of this idea floating around out in the world, it would make sense for there to be a site, somewhat in the vein of Wikipedia, where a quasi-"official" implementation ("official" is not the right word; I'm just having trouble thinking of a better one right now) could be published and modified as improvements are developed/discovered.
I feel like I have stumbled across a few different sites like this in the past, but I'm interested to know if anyone else has found any resources like what I'm describing.
The very idea is absurd. It means that there's one, single opinion on "best-known implementations" with no changes based on other people having better ideas.
It implies a that best practices are static and can be accumulated into a single repository.
If they could be collected, then Google would have them and would simply charge for access.
Interestingly, they don't have all the best practices. Interestingly, they have to expend mountains of computing power looking for more information. Then people (like you) have to read and think and judge and decide.
The read-think-judge-decide is really hard to eliminate. Unless, of course, you want someone to think for you. In which case, there are many companies who have a single solution that requires less thinking. Call Microsoft or Oracle or IBM. They have solutions that are all in one place, unified best practices, no reading, no thinking, no judging, no deciding required.
Open -- by definition -- means it's impossible to have a single authoritative source.
Here is something, maybe not the best implementations. But a book called Design Patterns contains what is considered by many programmers some of the best patterns to follow!

Referencing other peoples code

I'm currently working on my Bachelors dissertation. This involves developing a software product and a 12000 word write-up, mainly covering research, design and development. Now where I am quoting other peoples written work, I am obviously referencing this, but what about code? There have been plenty of times where I've looked for a solution to a problem I was unsure of and found someone who had solved the problem. Most of the time I took their code, worked to understand what they were doing, and then wrote my own version in my application, so should it be referenced somehow?
What would you guys do, put a comment in the code referencing the original author, add a reference in the write up or my bibliography, or nothing at all?
Where its a significant or interesting piece of code used, I will probably refer to it in my write-up, but for solutions that don't warrant this, I’m trying to come up with a good solution.
If you were the author of some code I had either used, or been inspired by, what would make you happy that I wasn't plagiarizing you?
To take this a bit further, there's really 2 different things here. If I go to MSDN to lookup how to use a particular part of the .net framework, is that something that should be referenced, or is it fair use of the framework.
Where as if I've used an algorithm that someone clearly developed and put a lot of time into, that's something I would definitely reference.
It all depends on context. Many algorithms are so well known that they are generally considered public domain and as long as you reference a well known source on the subject then you shouldn't have any worries (Sorting, Searching)
When dealing with specific problems, especially in other people code, you have to read really carefully. If its published (book, journal, web, etc..) then you must always reference the original, at some point in your dissertation (technically once in then write up and then a comment in the source)
If it's other peoples work they deserve proper credit. Anything else is plagiarism
There's two aspects to this:
The citation requirements of your academic institution. You should make sure you comply with this because if you're found to have plagiarised another's work you can be guilty of academic misconduct and you don't want that; and
The ethics of using another's work. Barring "fair use" provisions (meaning there's only so much of someone's work you can reproduce before what you're doing is no longer "fair use") and the like, if you reproduce someone else's code, that you should credit. If you simply take the idea, that's possibly a bit different and is a judgement call. It depends on the significance of that idea and it's contribution to your work.
Outside of academic work, I make it a habit to leave a comment in my source code if I reference someone else to solve a particular problem. It's for my benefit as well, I might want to go back and take another look at their code months later and have forgotten where I originally found it. Of course, take a look at the included license when you reference someone's code, it should be pretty clear what you can't do with it.
There's no shame in borrowing working code if it's the best tool for the job, provided the author's licence conditions allow it - real programmers do it all the time. But for academic purposes there is a problem if you could be percieved as being at all secretive or deceptive about it. To avoid that, I'd reference it both in the code with a big, clear comment, and in the report. Full disclosure means you can't be accused of doing anything wrong.
I have the same issue for my dissertation as well:
I opt for referencing without overdoing it. Adding a reference to the referenced section, means that you've searched, found something, thought of using it and accepted it, while having nothing means that you made it up. This increases the value of your work in academical context (the more you base on previous work, the better).
Also your supervisor will be able to trace license and algorithmic issues on the referenced site.
At last you do not know where your work will get to. Suppose in 3 years after today someone tries to use it, and jumps into a patent violation? How can you help him - by a reference...

The best way to familiarize yourself with an inherited codebase

Stacker Nobody asked about the most shocking thing new programmers find as they enter the field.
Very high on the list, is the impact of inheriting a codebase with which one must rapidly become acquainted. It can be quite a shock to suddenly find yourself charged with maintaining N lines of code that has been clobbered together for who knows how long, and to have a short time in which to start contributing to it.
How do you efficiently absorb all this new data? What eases this transition? Is the only real solution to have already contributed to enough open-source projects that the shock wears off?
This also applies to veteran programmers. What techniques do you use to ease the transition into a new codebase?
I added the Community-Building tag to this because I'd also like to hear some war-stories about these transitions. Feel free to share how you handled a particularly stressful learning curve.
Pencil & Notebook ( don't get distracted trying to create a unrequested solution)
Make notes as you go and take an hour every monday to read thru and arrange the notes from previous weeks
with large codebases first impressions can be deceiving and issues tend to rearrange themselves rapidly while you are familiarizing yourself.
Remember the issues from your last work environment aren't necessarily valid or germane in your new environment. Beware of preconceived notions.
The notes/observations you make will help you learn quickly what questions to ask and of whom.
Hopefully you've been gathering the names of all the official (and unofficial) stakeholders.
One of the best ways to familiarize yourself with inherited code is to get your hands dirty. Start with fixing a few simple bugs and work your way into more complex ones. That will warm you up to the code better than trying to systematically review the code.
If there's a requirements or functional specification document (which is hopefully up-to-date), you must read it.
If there's a high-level or detailed design document (which is hopefully up-to-date), you probably should read it.
Another good way is to arrange a "transfer of information" session with the people who are familiar with the code, where they provide a presentation of the high level design and also do a walk-through of important/tricky parts of the code.
Write unit tests. You'll find the warts quicker, and you'll be more confident when the time comes to change the code.
Try to understand the business logic behind the code. Once you know why the code was written in the first place and what it is supposed to do, you can start reading through it, or as someone said, prolly fixing a few bugs here and there
My steps would be:
1.) Setup a source insight( or any good source code browser you use) workspace/project with all the source, header files, in the code base. Browsly at a higher level from the top most function(main) to lowermost function. During this code browsing, keep making notes on a paper/or a word document tracing the flow of the function calls. Do not get into function implementation nitti-gritties in this step, keep that for a later iterations. In this step keep track of what arguments are passed on to functions, return values, how the arguments that are passed to functions are initialized how the value of those arguments set modified, how the return values are used ?
2.) After one iteration of step 1.) after which you have some level of code and data structures used in the code base, setup a MSVC (or any other relevant compiler project according to the programming language of the code base), compile the code, execute with a valid test case, and single step through the code again from main till the last level of function. In between the function calls keep moting the values of variables passed, returned, various code paths taken, various code paths avoided, etc.
3.) Keep repeating 1.) and 2.) in iteratively till you are comfortable up to a point that you can change some code/add some code/find a bug in exisitng code/fix the bug!
-AD
I don't know about this being "the best way", but something I did at a recent job was to write a code spider/parser (in Ruby) that went through and built a call tree (and a reverse call tree) which I could later query. This was slightly non-trivial because we had PHP which called Perl which called SQL functions/procedures. Any other code-crawling tools would help in a similar fashion (i.e. javadoc, rdoc, perldoc, Doxygen etc.).
Reading any unit tests or specs can be quite enlightening.
Documenting things helps (either for yourself, or for other teammates, current and future). Read any existing documentation.
Of course, don't underestimate the power of simply asking a fellow teammate (or your boss!) questions. Early on, I asked as often as necessary "do we have a function/script/foo that does X?"
Go over the core libraries and read the function declarations. If it's C/C++, this means only the headers. Document whatever you don't understand.
The last time I did this, one of the comments I inserted was "This class is never used".
Do try to understand the code by fixing bugs in it. Do correct or maintain documentation. Don't modify comments in the code itself, that risks introducing new bugs.
In our line of work, generally speaking we do no changes to production code without good reason. This includes cosmetic changes; even these can introduce bugs.
No matter how disgusting a section of code seems, don't be tempted to rewrite it unless you have a bugfix or other change to do. If you spot a bug (or possible bug) when reading the code trying to learn it, record the bug for later triage, but don't attempt to fix it.
Another Procedure...
After reading Andy Hunt's "Pragmatic Thinking and Learning - Refactor Your Wetware" (which doesn't address this directly), I picked up a few tips that may be worth mentioning:
Observe Behavior:
If there's a UI, all the better. Use the app and get a mental map of relationships (e.g. links, modals, etc). Look at HTTP request if it helps, but don't put too much emphasis on it -- you just want a light, friendly acquaintance with app.
Acknowledge the Folder Structure:
Once again, this is light. Just see what belongs where, and hope that the structure is semantic enough -- you can always get some top-level information from here.
Analyze Call-Stacks, Top-Down:
Go through and list on paper or some other medium, but try not to type it -- this gets different parts of your brain engaged (build it out of Legos if you have to) -- function-calls, Objects, and variables that are closest to top-level first. Look at constants and modules, make sure you don't dive into fine-grained features if you can help it.
MindMap It!:
Maybe the most important step. Create a very rough draft mapping of your current understanding of the code. Make sure you run through the mindmap quickly. This allows an even spread of different parts of your brain to (mostly R-Mode) to have a say in the map.
Create clouds, boxes, etc. Wherever you initially think they should go on the paper. Feel free to denote boxes with syntactic symbols (e.g. 'F'-Function, 'f'-closure, 'C'-Constant, 'V'-Global Var, 'v'-low-level var, etc). Use arrows: Incoming array for arguments, Outgoing for returns, or what comes more naturally to you.
Start drawing connections to denote relationships. Its ok if it looks messy - this is a first draft.
Make a quick rough revision. Its its too hard to read, do another quick organization of it, but don't do more than one revision.
Open the Debugger:
Validate or invalidate any notions you had after the mapping. Track variables, arguments, returns, etc.
Track HTTP requests etc to get an idea of where the data is coming from. Look at the headers themselves but don't dive into the details of the request body.
MindMap Again!:
Now you should have a decent idea of most of the top-level functionality.
Create a new MindMap that has anything you missed in the first one. You can take more time with this one and even add some relatively small details -- but don't be afraid of what previous notions they may conflict with.
Compare this map with your last one and eliminate any question you had before, jot down new questions, and jot down conflicting perspectives.
Revise this map if its too hazy. Revise as much as you want, but keep revisions to a minimum.
Pretend Its Not Code:
If you can put it into mechanical terms, do so. The most important part of this is to come up with a metaphor for the app's behavior and/or smaller parts of the code. Think of ridiculous things, seriously. If it was an animal, a monster, a star, a robot. What kind would it be. If it was in Star Trek, what would they use it for. Think of many things to weigh it against.
Synthesis over Analysis:
Now you want to see not 'what' but 'how'. Any low-level parts that through you for a loop could be taken out and put into a sterile environment (you control its inputs). What sort of outputs are you getting. Is the system more complex than you originally thought? Simpler? Does it need improvements?
Contribute Something, Dude!:
Write a test, fix a bug, comment it, abstract it. You should have enough ability to start making minor contributions and FAILING IS OK :)! Note on any changes you made in commits, chat, email. If you did something dastardly, you guys can catch it before it goes to production -- if something is wrong, its a great way to get a teammate to clear things up for you. Usually listening to a teammate talk will clear a lot up that made your MindMaps clash.
In a nutshell, the most important thing to do is use a top-down fashion of getting as many different parts of your brain engaged as possible. It may even help to close your laptop and face your seat out the window if possible. Studies have shown that enforcing a deadline creates a "Pressure Hangover" for ~2.5 days after the deadline, which is why deadlines are often best to have on a Friday. So, BE RELAXED, THERE'S NO TIMECRUNCH, AND NOW PROVIDE YOURSELF WITH AN ENVIRONMENT THAT'S SAFE TO FAIL IN. Most of this can be fairly rushed through until you get down to details. Make sure that you don't bypass understanding of high-level topics.
Hope this helps you as well :)
All really good answers here. Just wanted to add few more things:
One can pair architectural understanding with flash cards and re-visiting those can solidify understanding. I find questions such as "Which part of code does X functionality ?", where X could be a useful functionality in your code base.
I also like to open a buffer in emacs and start re-writing some parts of the code base that I want to familiarize myself with and add my own comments etc.
One thing vi and emacs users can do is use tags. Tags are contained in a file ( usually called TAGS ). You generate one or more tags files by a command ( etags for emacs vtags for vi ). Then we you edit source code and you see a confusing function or variable you load the tags file and it will take you to where the function is declared ( not perfect by good enough ). I've actually written some macros that let you navigate source using Alt-cursor,
sort of like popd and pushd in many flavors of UNIX.
BubbaT
The first thing I do before going down into code is to use the application (as several different users, if necessary) to understand all the functionalities and see how they connect (how information flows inside the application).
After that I examine the framework in which the application was built, so that I can make a direct relationship between all the interfaces I have just seen with some View or UI code.
Then I look at the database and any database commands handling layer (if applicable), to understand how that information (which users manipulate) is stored and how it goes to and comes from the application
Finally, after learning where data comes from and how it is displayed I look at the business logic layer to see how data gets transformed.
I believe every application architecture can de divided like this and knowning the overall function (a who is who in your application) might be beneficial before really debugging it or adding new stuff - that is, if you have enough time to do so.
And yes, it also helps a lot to talk with someone who developed the current version of the software. However, if he/she is going to leave the company soon, keep a note on his/her wish list (what they wanted to do for the project but were unable to because of budget contraints).
create documentation for each thing you figured out from the codebase.
find out how it works by exprimentation - changing a few lines here and there and see what happens.
use geany as it speeds up the searching of commonly used variables and functions in the program and adds it to autocomplete.
find out if you can contact the orignal developers of the code base, through facebook or through googling for them.
find out the original purpose of the code and see if the code still fits that purpose or should be rewritten from scratch, in fulfillment of the intended purpose.
find out what frameworks did the code use, what editors did they use to produce the code.
the easiest way to deduce how a code works is by actually replicating how a certain part would have been done by you and rechecking the code if there is such a part.
it's reverse engineering - figuring out something by just trying to reengineer the solution.
most computer programmers have experience in coding, and there are certain patterns that you could look up if that's present in the code.
there are two types of code, object oriented and structurally oriented.
if you know how to do both, you're good to go, but if you aren't familiar with one or the other, you'd have to relearn how to program in that fashion to understand why it was coded that way.
in objected oriented code, you can easily create diagrams documenting the behaviors and methods of each object class.
if it's structurally oriented, meaning by function, create a functions list documenting what each function does and where it appears in the code..
i haven't done either of the above myself, as i'm a web developer it is relatively easy to figure out starting from index.php to the rest of the other pages how something works.
goodluck.