When to make a class, and when to make a function?

I am a beginner to programming. When I start to code, I just start writing and solve the problem: I write the whole program in a single main function.
I don't know when to make a class or a function.
What are some good books I could read to learn these concepts?

A very general question, so just a few rules of thumb:
code reuse: when you have the same or a very similar piece of code in two places, it should be moved into a function (see the sketch after this list)
readability: if a function spans more than a single page on screen, you may want to break it apart into several functions
focus: every class or function should do only one specific task; everything that is not core to this purpose should be delegated to other classes/functions.
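A minimal sketch of the first rule in Java (the names are invented for illustration): the price-formatting logic that would otherwise be duplicated at both call sites lives in one function.
public class ReceiptPrinter {

    // One shared function instead of the same formatting code in two places.
    static String formatPrice(long cents) {
        return String.format("$%d.%02d", cents / 100, cents % 100);
    }

    static void printLineItem(String name, long cents) {
        System.out.println(name + ": " + formatPrice(cents));   // call site 1
    }

    static void printTotal(long cents) {
        System.out.println("Total: " + formatPrice(cents));     // call site 2
    }

    public static void main(String[] args) {
        printLineItem("Coffee", 350);
        printLineItem("Bagel", 275);
        printTotal(625);
    }
}
If the formatting ever changes, there is now exactly one place to change it.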

I think the canonical answer here is that you should organize your code so that it's readable and maintainable. With that said, it's also important to consider the cost of organizing your code, and how long you expect your code to live.
More directly in response to your question: functions should be used to replace repetitive or otherwise well-contained pieces of code. If you apply the same 10 operations over and over again on the same kinds of elements/data, you might want to think about collecting those operations into a single concise, clearly named function. In general, a function needs well-defined inputs and outputs.
Classes, in essence, collect functions and data together. Much as you should use a function to collect operations into concise, well-defined units, classes should organize functions together with the data they operate on. That is, if you have a bunch of functions that operate on things like a steering wheel, brakes, accelerators, etc., you should think about having a Vehicle class to organize those related functions and data/objects (a sketch follows below).
Beyond acting as an organizational element, classes should be used to enable easy reuse and creation of multiple "things" - suppose you wanted a collection of those Vehicles. Classes allow you to tie meaning, or at least some semantics, to your program.
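As a sketch of that idea (all names here are hypothetical, made up for illustration): the brake/accelerator functions and the data they act on live together in one class.
class Vehicle {
    // Data and the functions that operate on it, grouped together.
    private double speedMph;
    private double fuelGallons;

    void addGas(double gallons) {
        fuelGallons += gallons;
    }

    void accelerate(double deltaMph) {
        if (fuelGallons > 0) {   // can't speed up on an empty tank
            speedMph += deltaMph;
        }
    }

    void brake() {
        speedMph = 0;
    }

    double speed() {
        return speedMph;
    }
}
And the collection of Vehicles mentioned above is then just a List<Vehicle>.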
The point of all this, though, is to make your life and the lives of others easier when it comes to authoring and maintaining your program. So, by all means, when you need a solution to a problem in less than ten minutes and you think it's a one-time-use program, ignore all this if you think it'll let you accomplish what you need faster. Bear in mind, all this organization, semantics and ease of repetitive operation exists to make it easier to accomplish your objectives.

This is a stylistic and preference question, and depending on how formal a place you work at, it could be a matter of standards. I follow a couple of rules.
Classes
Sets of related data belong together in a class
Functions that operate on that data should be in that class too
The classic example is a Car class, where the functions would be things like Drive and AddGas
Functions
If you are going to use it more than once, it should be in a function
Most functions should be no more than one screen of code
Functions should do one thing well, not a bunch of things poorly
There are a ton of opinions, but over time you must develop your own style.

It's actually very simple, nicky!
The purpose of splitting code into methods is simply to allow its reuse. When you create a method you allow your program to invoke it at any time from several places instead of repeating the code again and again.
So every time you write lines and think... 'hey, I might need this functionality again somewhere in my program', then you need to put it in a method.
As for classes, you will try to group similar functionalities together. And try to keep classes short and simple. If you need several classes, you'll also group them in packages and so on.
When I write code, I usually have a pretty good idea what I'll be using again. But often I will start to write a few lines of code and realize that I wrote something quite similar in the past. So I'll find it and put it in a method then the two or more locations can now just invoke it. That is reuse at its best!
You can often use analyzers to find various metrics which will "put a grade" on your reuse and code duplication.
Happy learning!

Have a look at Procedure, subroutine or function? and Object-oriented programming:

An object is actually a discrete bundle of functions and procedures, all relating to a particular real-world concept such as a bank account holder or hockey player in a computer game. Other pieces of software can access the object only by calling its functions and procedures that have been allowed to be called by outsiders.
Basically, you use a function/procedure/method to encapsulate a specific section of code that does a specific job, or for reusability.
Classes are used to encapsulate/represent an object, possibly with its own data and the specific functions/procedures/methods that make sense to use with that object.
In some languages, classes can be made static, with static functions/procedures/methods which can then be used as helper functions (see the sketch below).
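For instance, a sketch of such a static helper class in Java (the methods are invented examples):
final class StringHelpers {
    private StringHelpers() {}   // no instances: the class exists only to group helpers

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static String capitalize(String s) {
        if (isBlank(s)) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
Callers just write StringHelpers.isBlank(name) without ever creating an object.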

Just FYI, it'll become more evident when and why functions are useful as you progress to larger projects. I was a bit confused by their use when I first started too: when your entire program is only 20-50 lines of code following a very linear path, they don't make much sense. But when you start re-using tidbits of code, it makes sense to put them in functions. It also makes it easier to read and follow the logic of your program if you only have to read descriptive function names, rather than deciphering what the next 5 lines of code are supposed to do.

I found myself asking this same question, and it led me to this post.
I think that one of the most confusing things about how OOP is explained to beginners is the idea that classes represent exactly what they sound like: classes of things, like Computer, Dog, Car, etc.
This is fine as far as it goes, but it's not strictly true, and the reality is much more abstract. Sometimes, classes don't really represent anything that could be considered a clearly defined abstraction of a group of things. Sometimes, they just organize stuff.
For this reason, I think "class" is really a misnomer, or at least misleading. A more relatable way to think about what a class is might be to simply think of it as a "group" or a "logical grouping."

What programming practice that you once liked have you since changed your mind about?

As we program, we all develop practices and patterns that we use and rely on. However, over time, as our understanding, maturity, and even technology usage changes, we come to realize that some practices that we once thought were great are not (or no longer apply).
An example of a practice I once used quite often, but have in recent years changed, is the use of the Singleton object pattern.
Through my own experience and long debates with colleagues, I've come to realize that singletons are not always desirable - they can make testing more difficult (by inhibiting techniques like mocking) and can create undesirable coupling between parts of a system. Instead, I now use object factories (typically with an IoC container) that hide the nature and existence of singletons from the parts of the system that don't care or need to know; those parts rely on a factory (or service locator) to acquire access to such objects.
My questions to the community, in the spirit of self-improvement, are:
What programming patterns or practices have you reconsidered recently, and now try to avoid?
What did you decide to replace them with?
//Coming out of university, we were taught to ensure we always had an abundance
//of commenting around our code. But applying that in the real world made it
//clear that over-commenting not only has the potential to confuse/complicate
//things but can make the code hard to follow. Now I spend more time on
//improving the simplicity and readability of the code and inserting fewer yet
//relevant comments, instead of spending that time writing overly-descriptive
//commentaries all throughout the code.
Single return points.
I once preferred a single return point for each method, because with that I could ensure that any cleanup needed by the routine was not overlooked.
Since then, I've moved to much smaller routines - so the likelihood of overlooking cleanup is reduced and in fact the need for cleanup is reduced - and find that early returns reduce the apparent complexity (the nesting level) of the code. Artifacts of the single return point - keeping "result" variables around, keeping flag variables, conditional clauses for not-already-done situations - make the code appear much more complex than it actually is, make it harder to read and maintain. Early exits, and smaller methods, are the way to go.
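A small invented example of the difference:
// Single exit point: a result variable and extra nesting.
static String classify(int age) {
    String result;
    if (age < 0) {
        result = "invalid";
    } else {
        if (age < 18) {
            result = "minor";
        } else {
            result = "adult";
        }
    }
    return result;
}

// Early returns: the same logic, flatter and with nothing to keep track of.
static String classifyWithEarlyReturns(int age) {
    if (age < 0) return "invalid";
    if (age < 18) return "minor";
    return "adult";
}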
Trying to code things perfectly on the first try.
Trying to create the perfect OO model before coding.
Designing everything for flexibility and future improvements.
In one word: overengineering.
Hungarian notation (both Forms and Systems).
I used to prefix everything: strSomeString or txtFoo.
Now I use someString and textBoxFoo. It's far more readable and easier for someone new to come along and pick up. As an added bonus, it's trivial to keep it consistent -- camelCase the control and append a useful/descriptive name. Forms Hungarian has the drawback of not always being consistent, and Systems Hungarian doesn't really gain you much. Chunking all your variables together isn't really that useful -- especially with modern IDEs.
The "perfect" architecture
I came up with THE architecture a couple of years ago. Pushed myself technically as far as I could so there were 100% loosely coupled layers, extensive use of delegates, and lightweight objects. It was technical heaven.
And it was crap. The technical purity of the architecture just slowed my dev team down, aiming for perfection over results, and I almost achieved complete failure.
We now have much simpler less technically perfect architecture and our delivery rate has skyrocketed.
The use of caffeine. It once kept me awake and in a glorious programming mood, where the code flew from my fingers with feverish fluidity. Now it does nothing, and if I don't have it I get a headache.
Commenting out code. I used to think that code was precious and that you can't just delete those beautiful gems that you crafted. I now delete any commented-out code I come across unless there's a TODO or NOTE attached because it's too perilous to leave it in. To wit, I've come across old classes with huge commented-out portions and it really confused me why they were there: were they recently commented out? is this a dev environment change? why does it do this unrelated block?
Seriously consider not commenting out code and just deleting it instead. If you need it, it's still in source control. YAGNI though.
The overuse / abuse of #region directives. It's just a little thing, but in C#, I previously would use #region directives all over the place, to organize my classes. For example, I'd group all class properties together in a region.
Now I look back at old code and mostly just get annoyed by them. I don't think it really makes things clearer most of the time, and sometimes they just plain slow you down.
So I have now changed my mind and feel that well laid out classes are mostly cleaner without region directives.
Waterfall development in general, and in specific, the practice of writing complete and comprehensive functional and design specifications that are somehow expected to be canonical and then expecting an implementation of those to be correct and acceptable. I've seen it replaced with Scrum, and good riddance to it, I say. The simple fact is that the changing nature of customer needs and desires makes any fixed specification effectively useless; the only way to really properly approach the problem is with an iterative approach. Not that Scrum is a silver bullet, of course; I've seen it misused and abused many, many times. But it beats waterfall.
Never crashing.
It seems like such a good idea, doesn't it? Users don't like programs that crash, so let's write programs that don't crash, and users should like the program, right? That's how I started out.
Nowadays, I'm more inclined to think that if it doesn't work, it shouldn't pretend it's working. Fail as soon as you can, with a good error message. If you don't, your program is going to crash even harder just a few instructions later, but with some nondescript null-pointer error that'll take you an hour to debug.
My favorite "don't crash" pattern is this:
public User readUserFromDb(int id) {
    User u = null;
    try {
        // Note: concatenating the id into the SQL is also an injection risk.
        ResultSet rs = connection.execute("SELECT * FROM user WHERE id = " + id);
        if (rs.moveNext()) {
            u = new User();
            u.setFirstName(rs.get("fname"));
            u.setSurname(rs.get("sname"));
            // etc
        }
    } catch (Exception e) {
        // Swallow everything; the only trace of failure is a log entry.
        log.info(e);
    }
    if (u == null) {
        // "Never crash": fabricate a User instead of reporting the failure.
        u = new User();
        u.setFirstName("error communicating with database");
        u.setSurname("error communicating with database");
        // etc
    }
    u.setId(id);
    return u;
}
Now, instead of asking your users to copy/paste the error message and sending it to you, you'll have to dive into the logs trying to find the log entry. (And since they entered an invalid user ID, there'll be no log entry.)
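For contrast, a sketch of the fail-fast version (assuming connection is a java.sql.Connection; UserNotFoundException is an invented unchecked exception):
public User readUserFromDb(int id) throws SQLException {
    String sql = "SELECT fname, sname FROM user WHERE id = ?";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setInt(1, id);   // parameterized, which also removes the injection risk
        try (ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) {
                // Fail loudly and specifically instead of fabricating a User.
                throw new UserNotFoundException("no user with id " + id);
            }
            User u = new User();
            u.setId(id);
            u.setFirstName(rs.getString("fname"));
            u.setSurname(rs.getString("sname"));
            return u;
        }
    }
    // A real database failure propagates as SQLException with its original
    // message, at the point where it happened - not an hour of debugging later.
}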
I thought it made sense to apply design patterns whenever I recognised them.
Little did I know that I was actually copying styles from foreign programming languages, while the language I was working with allowed for far more elegant or easier solutions.
Using multiple (very) different languages opened my eyes and made me realise that I don't have to mis-apply other people's solutions to problems that aren't mine. Now I shudder when I see the factory pattern applied in a language like Ruby.
Obsessive testing. I used to be a rabid proponent of test-first development. For some projects it makes a lot of sense, but I've come to realize that it is not only unfeasible, but rather detrimental to many projects to slavishly adhere to a doctrine of writing unit tests for every single piece of functionality.
Really, slavishly adhering to anything can be detrimental.
This is a small thing, but: Caring about where the braces go (on the same line or next line?), suggested maximum line lengths of code, naming conventions for variables, and other elements of style. I've found that everyone seems to care more about this than I do, so I just go with the flow of whoever I'm working with nowadays.
Edit: The exception to this being, of course, when I'm the one who cares the most (or is the one in a position to set the style for a group). In that case, I do what I want!
(Note that this is not the same as having no consistent style. I think a consistent style in a codebase is very important for readability.)
Perhaps the most important "programming practice" I have since changed my mind about, is the idea that my code is better than everyone else's. This is common for programmers (especially newbies).
Utility libraries. I used to carry around an assembly with a variety of helper methods and classes with the theory that I could use them somewhere else someday.
In reality, I just created a huge namespace with a lot of poorly organized bits of functionality.
Now, I just leave them in the project I created them in. In all probability I'm not going to need it, and if I do, I can always refactor them into something reusable later. Sometimes I will flag them with a //TODO for possible extraction into a common assembly.
Designing more than I coded.
After a while, it turns into analysis paralysis.
The use of a DataSet to perform business logic. This binds the code too tightly to the database; the DataSet is also usually created from SQL, which makes things even more fragile. If the SQL or the database changes, the breakage tends to trickle down to everything the DataSet touches.
Performing any business logic inside an object constructor. Combined with inheritance and the ability to create overloaded constructors, this tends to make maintenance difficult.
Abbreviating variable/method/table/... names.
I used to do this all of the time, even when working in languages with no enforced limits on the lengths of names (well, they were probably 255 or something). One of the side effects was a lot of comments littered throughout the code explaining the (non-standard) abbreviations. And of course, if the names were changed for any reason...
Now I much prefer to call things what they really are, with good descriptive names, including only standard abbreviations. There's no need for useless comments, and the code is far more readable and understandable.
Wrapping existing data access components, like the Enterprise Library, with a custom layer of helper methods.
It doesn't make anybody's life easier
It's more code that can have bugs in it
A lot of people know how to use the EntLib data access components; no one but the local team knows how to use the in-house data access solution
I first heard about object-oriented programming while reading about Smalltalk in 1984, but I didn't have access to an o-o language until I used the cfront C++ compiler in 1992. I finally got to use Smalltalk in 1995. I had eagerly anticipated o-o technology, and bought into the idea that it would save software development.
Now, I just see o-o as one technique that has some advantages, but it's just one tool in the toolbox. I do most of my work in Python, and I often write standalone functions that are not class members, and I often collect groups of data in tuples or lists where in the past I would have created a class. I still create classes when the data structure is complicated, or I need behavior associated with the data, but I tend to resist it.
I'm actually interested in doing some work in Clojure when I get the time, which doesn't provide o-o facilities, although it can use Java objects if I understand correctly. I'm not ready to say anything like o-o is dead, but personally I'm not the fan I used to be.
In C#, using _notation for private members. I now think it's ugly.
I then changed to this.notation for private members, but found I was inconsistent in using it, so I dropped that too.
I stopped going by the university recommended method of design before implementation. Working in a chaotic and complex system has forced me to change attitude.
Of course I still do code research, especially when I'm about to touch code I've never touched before, but normally I try to focus on the smallest implementation possible to get something going first. That is the primary goal. Then I gradually refine the logic and let the design emerge by itself. Programming is an iterative process and works very well with an agile approach and with lots of refactoring.
The code will not look at all what you first thought it would look like. Happens every time :)
I used to be big into design-by-contract. This meant putting a lot of error checking at the beginning of all my functions. Contracts are still important, from the perspective of separation of concerns, but rather than try to enforce what my code shouldn't do, I try to use unit tests to verify what it does do.
I used to use statics in a lot of methods/classes, as it was more concise. When I started writing tests, that practice changed very quickly.
Checked Exceptions
An amazing idea on paper - the contract is defined clearly, with no room for mistakes or forgetting to check for some exception condition. I was sold when I first heard about it.
Of course, it turned out to be such a mess in practice. To the point of having libraries today like Spring JDBC, where wrapping legacy checked exceptions in unchecked ones is one of its main features.
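A sketch of what that mess looks like in plain JDBC (where SQLException is checked), along with the standard escape hatch that Spring-style libraries automate:
import java.sql.*;

class UserCounter {
    int countUsers(Connection connection) {
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM user")) {
            rs.next();
            return rs.getInt(1);
        } catch (SQLException e) {
            // The caller can rarely do anything useful with the checked
            // exception, so it gets wrapped in an unchecked one and rethrown -
            // essentially the service Spring JDBC performs for you.
            throw new RuntimeException("query failed", e);
        }
    }
}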
That anything worthwhile was only coded in one particular language. In my case I believed that C was the best language ever and I never had any reason to code anything in any other language... ever.
I have since come to appreciate many different languages and the benefits/functionality they offer. If I want to code something small - quickly - I would use Python. If I want to work on a large project I would code in C++ or C#. If I want to develop a brain tumour I would code in Perl.
When I needed to do some refactoring, I thought it was faster and cleaner to start straightaway and implement the new design, fixing up the connections until they work. Then I realized it's better to do a series of small refactorings to slowly but reliably progress towards the new design.
Perhaps the biggest thing that has changed in my coding practices, as well as in others, is the acceptance of outside classes and libraries downloaded from the internet as the basis for behaviors and functionality in applications. In school at the time I attended college we were encouraged to figure out how to make things better via our own code and rely upon the language to solve our problems. With the advances in all aspects of user interface and service/data consumption this is no longer a realistic notion.
There are certain things which will never change in a language, and having a library that wraps this code in a simpler transaction and in fewer lines of code that I have to write is a blessing. Connecting to a database will always be the same. Selecting an element within the DOM will not change. Sending an email via a server-side script will never change. Having to write this time and again wastes time that I could be using to improve my core logic in the application.
Initializing all class members.
I used to explicitly initialize every class member with something, usually NULL. I have come to realize that this (see the sketch after this list):
normally means that every member is initialized twice before it is ever read
is redundant, because most languages automatically initialize members to NULL (or an equivalent default)
actually incurs a slight performance hit in most languages
can bloat the code on larger projects
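A sketch of the first two points in Java, where fields are zeroed by the JVM before any constructor code runs (local variables are a different story - they get no default):
class Widget {
    private String label = null;   // redundant: writes null over the default null
    private int count = 0;         // redundant: writes 0 over the default 0

    // Equivalent, with no double initialization:
    // private String label;
    // private int count;
}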
Like you, I also have embraced IoC patterns in reducing coupling between various components of my apps. It makes maintenance and parts-swapping much simpler, as long as I can keep each component as independent as possible. I'm also utilizing more object-relational frameworks such as NHibernate to simplify database management chores.
In a nutshell, I'm using "mini" frameworks to aid in building software more quickly and efficiently. These mini-frameworks save lots of time, and if done right can make an application super simple to maintain down the road. Plug 'n Play for the win!

How to partition a problem into smaller understandable portions?

I'm not sure if it's possible to give general advice on this topic, but please try. It's hard to explain my case because it's too complex to explain. And that's exactly the problem.
I seem to constantly stumble on a situation where I try to design some part of my project, but it has so many things to take into consideration that I'm unable to get a grasp of it.
Are there any general tips or advice on how to look at my system in smaller pieces at a time? How to find smaller portions that could be designed separately on their own?
Create a glossary.
In other words, identify the terms that are meaningful to the project domain — not from the programmer's point of view, but from a user's, who is familiar with the subject matter.
Then define the terms as precisely and discretely as you can. A good definition in this form can serve as a kind of pseudocode.
Since you have not identified even the domain of your problem, I'll choose a random example. In a civilian personnel system, you might have terms like:
billet: a term of service (from start date to end date) at a particular grade and step
employee: a series of billets associated with a particular SSN
grade and step: row and column in the federal general schedule
And so on. This isn't to identify functional units, as it sounds like you are trying to do, but it's a good preparatory step before doing so, so that you can express your functional steps in well-defined terms.
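In code, such a glossary translates almost mechanically into types. A sketch in Java (records need Java 16+; the fields are illustrative, not a real personnel schema):
import java.time.LocalDate;
import java.util.List;

// row and column in the federal general schedule
record GradeStep(int grade, int step) {}

// a term of service (from start date to end date) at a particular grade and step
record Billet(LocalDate start, LocalDate end, GradeStep gradeStep) {}

// a series of billets associated with a particular SSN
record Employee(String ssn, List<Billet> billets) {}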
Your key goals are:
High cohesion: Code (methods, fields, classes) within one piece/module/partition should interact intensively; it should make sense for these elements to know about each other. If you find that some of them don't interact much with the rest, they probably belong somewhere else or should form their own partition. If you find code outside the partition interacting intensively with it and knowing too much about its inner workings, it probably belongs inside. The typical example is found in OO code written in procedural style, with "dumb" data objects and "manager" code that operates on them but should really be part of the data objects (see the sketch after these two goals).
Loose coupling: Interaction between pieces/modules/partitions should only happen through narrow, well-defined, well-documented APIs. Try to identify such APIs and see what code is needed to implement them and what code will use them.
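A sketch of that typical example (the account code is invented):
// Before: a "dumb" data object plus "manager" code that knows its internals.
class AccountData {
    long balanceCents;   // exposed so the manager can poke at it
}

class AccountManager {
    void withdraw(AccountData a, long cents) {
        if (a.balanceCents >= cents) {
            a.balanceCents -= cents;
        }
    }
}

// After: the behavior moves in with the data it uses; the partition is
// cohesive, and the field no longer leaks out through the API.
class Account {
    private long balanceCents;

    boolean withdraw(long cents) {
        if (balanceCents < cents) {
            return false;
        }
        balanceCents -= cents;
        return true;
    }
}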
It's useful to approach problem decomposition both top-down and bottom-up.
If you're having trouble splitting a big problem into two or more smaller problems, try to think of the smallest possible problems that will need to be solved. Once those are handled, you may start to see ways to combine them into larger problems as you approach your original large problem.
When I find myself copying and pasting chunks of code with minimal adjustments I realize that's a "partition" and then create a class, method, function, or whatever.
Actually, the whole object oriented approach is what it's all about. Try thinking of your application as tangible things that do stuff. Write pseudo code describing what the things are and what they do, I find lots of "partitions" this way.
Here's a try - kind of a wild guess.
People usually underestimate how long it will take them to do the work. If your project is large, then most likely you'll need several people to work on it, so you can try planning with that in mind. A person can be expected to hold just one area in their head, so you'll need to explain to each person exactly what kind of task they're supposed to do.
So I'd say you should try to write a job description that encompasses as much as one person can seriously concentrate on. Repeat until you have broken your project into the parts you wanted. As a benefit, you're ready to assemble your team. But if you find that the parts are small, maybe you'll still be able to do it yourself.

Class member organization

What is the best way to sort class members?
I'm in conflict with a team member about this. He suggests that we should sort the members alphabetically. I think it's better to organize in a semantic manner: important attributes first, related methods together, etc.
What do you think?
I like semantic. Alphabetical doesn't seem to make a lot of sense to me, because when you're looking for a member, you rarely know exactly what it's called. Also, if you're using any sort of naming convention (e.g. Hungarian), alphabetical ordering is going to lead to grouping by type, which may not be what you want.
Group related class members together. I think this will help other programmers understand your interface more easily when they see it for the first time.
Some also find it helpful to organize accessors and modifiers together in separate sections.
I've studied this exact issue as part of my master's thesis.
An alphabetical organization or organization based on public/private is better for being able to find specific things. However, in some IDEs you can set the outline tool to sort alphabetically and to use special indicators for public/private.
My approach was to group methods based on what members they use: there is often a conceptual connection between methods that use the same fields.
I actually created a visualization from that, which helped to quickly navigate and understand the structure of huge classes.
I never look for a member by going through the code. When I want to jump to a member definition, I either select it from the navigation bar / document outline / class view, or I right-click and select "Jump to definition". You don't need to sort the members if you have a decent IDE. This works very well in Visual Studio, and the other IDE I use when needed, KDevelop, supports at least the basics of this.
Anyway, I tend to group members by functionality, i.e. all fields / properties / methods that are part of some specific functionality go together. And since classes shouldn't be too long, this is enough.
This is just my opinion, which I am sure will be unpopular, but the problem with semantic sorting is that it's subjective. Each person will have a different opinion of which methods should be close together.
Alphabetical has the advantage that it is entirely objective. It also avoids the large diffs for small changes that are common when one coder chooses a different semantic ordering.
Most IDEs have outlines, or hyperlinks to make navigation easier.
EDIT: A clarification- I still sort by public first to private, but alphabetical within the same access level. In fact, I don't do any sorting - I let my IDE resort the file for me when saving.
Are you writing a phone book?
With a semantic approach you can easily show what are the most important methods.
I generally go with Constructor, Destructor first, then important methods followed by getters and setters and eventually misc. methods. Finally, I take a similar approach for internal parts (private methods, attributes...).
Alphabetical order does not convey any useful information about your class. If you really want to see methods sorted alphabetically, you should rely on a function of your IDE.
You can go from semantic to alphabetic by sorting the "methods display" in your IDE.
You can't go (automatically) from alphabetic to semantic.
Hence: semantic.
Assuming you are using a modern IDE, finding the method you want is rarely more than two mouse clicks away, so I am not sure what having a particular way of organizing your methods would get you. I do use stylecop (http://code.msdn.microsoft.com/sourceanalysis) which has me ordering by public / private / method / properties - I have found that to be anal enough.
The only time I ever truly thought this was important was when I wrote a very large JScript program and the editor at the time didn't offer any help in finding functions. Alphabetical organization was very helpful: when things are alphabetized, it isn't hard to figure out which way in the file you need to go to find a method. Semantic organization would have been completely unhelpful.
At a higher level, I would organize my class this way:
Constructor
Destructor
Private Fields
Properties
Methods/Functions
Then for methods/functions I would break it down again by functionality, e.g. I would put methods that implement an interface into one region, event handler methods into another region, etc...
RWendi
I guess I'm one of the oddball cases who favors alphabetical listings.
First and foremost, it's been my experience that grouping methods together "semantically" tends to be a time sink. Now, if we're talking about grouping them by scope/visibility, that's another thing. But then, if a member changes its scope, you have to sink time into moving it to keep the code current. I don't want to waste time shuffling code around to observe a guideline like that.
I am also not a big fan of regions. When properties and methods are grouped by scope, they tend to shout out for enclosure in a region. But enclosing code in collapsing regions tends to hide badly written code. As long as you don't have to look at it, you won't be bothered to think about refactoring it to make it maintainable.
So, I favor alphabetical organization. It's simple, direct, and to the point. I'm not tempted to enclose groups into regions. And since the IDE makes it easy to leap to a function or property definition anyway, the physical layout of the code is moot. It used to be that you wanted folks to focus on your public members first. Modern IDEs make that largely a pointless argument in favor of scope-based layouts.
But the biggest advantage of alphabetical layouts is this: printed code samples during code reviews. And I use them a lot. They make finding a function or a property a snap. If you've ever had to wade through a lot of code to find a function or a property when things weren't alphabetically listed, you'll know what I'm talking about.
But, as they say, those are my subjective views on the subject. Your mileage may vary.

Best practices: Many small functions/methods, or bigger functions with logical process components inline?

Is it better to write many small methods (or functions), or to simply write the logic/code of those small processes right into the place where you would have called the small method? What about breaking off code into a small function even if for the time being it is only called from one spot?
If one's choice depends on some criteria, what are they; how should a programmer make a good judgement call?
I'm hoping the answer can be applied generally across many languages, but if necessary, answers given can be specific to a language or languages. In particular, I'm thinking of SQL (functions, rules and stored procedures), Perl, PHP, Javascript and Ruby.
I always break long methods up into logical chunks and try to make smaller methods out of them. I don't normally turn a few lines into a separate method until I need it in two different places, but sometimes I do just to help readability, or if I want to test it in isolation.
Fowler's Refactoring is all about this topic, and I highly recommend it.
Here's a handy rule of thumb that I use from Refactoring: if a section of code has a comment that I could re-word into a method name, pull it out and make it a method (sketched below).
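A sketch of that rule (Order and the validation details are invented for illustration):
record Order(String address) {}

class OrderProcessor {
    // Before: a comment labels a block inside a longer method.
    void process(Order order) {
        // validate the shipping address
        if (order.address() == null || order.address().isBlank()) {
            throw new IllegalArgumentException("missing shipping address");
        }
        // ... more steps
    }

    // After: the comment's wording becomes a method name.
    void processRefactored(Order order) {
        validateShippingAddress(order);
        // ... more steps
    }

    private void validateShippingAddress(Order order) {
        if (order.address() == null || order.address().isBlank()) {
            throw new IllegalArgumentException("missing shipping address");
        }
    }
}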
The size of the method is directly linked to its cyclomatic complexity.
The main advantages to keep the size of the method small (which means dividing a big method into several small methods) are:
better unit testing (due to low cyclomatic complexity)
better debugging due to a more explicit stack trace (instead of one error within one giant method)
As always, you could say: it depends. It's more a question of naming and defining the task of a method. Every method should do one (and only one) well-defined task, and should do it completely. The name of the method should indicate the task. If your method is named DoAandB(), it may be better to have separate methods DoA() and DoB(). If you need methods like SetupTask, ExecuteTask, FinishTask, it may be useful to combine them.
Some points that indicate, that a merge of different methods may be useful:
A method cannot be used alone, without the use of other methods.
You have to be careful to call some dependent methods in the right order.
Some points that indicate, that a splitup of the method could be useful:
Some lines of the existing method have a clear, independent task.
Unit-testing of the big method gets problematic. If tests are easier to write for independent methods, then split the big method up.
As an illustration of the unit-test argument: I once wrote a method that did several things including IO. The IO part was very hard to test, so I thought about it and concluded that my method did 5 logical and independent steps, only one of which involved the IO. So I split the method up into 5 smaller ones; four of them were easy to test. A sketch of the shape of that refactoring follows.
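The shape of that refactoring, compressed to three steps instead of five (the step names are invented; needs Java 16+ for Stream.toList; Files.readAllLines does the only IO):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ReportLoader {
    List<String> loadReport(Path file) throws IOException {
        List<String> raw = readLines(file);     // the only step that touches IO
        List<String> kept = stripBlanks(raw);   // pure: easy to unit test
        return normalize(kept);                 // pure: easy to unit test
    }

    List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file);
    }

    List<String> stripBlanks(List<String> lines) {
        return lines.stream().filter(s -> !s.isBlank()).toList();
    }

    List<String> normalize(List<String> lines) {
        return lines.stream().map(String::trim).map(String::toLowerCase).toList();
    }
}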
Small methods every time.
They are self documenting (er, if well named)
They break down the problem into manageable parts - you are KeepingItSimple.
You can use OO techniques to more easily (and obviously) plug in behaviour. The large method is by definition more procedural and so less flexible.
They are unit testable. This is the killer: you simply can't unit test some huge method that performs a load of tasks
Something I learnt from the Code Complete book:
Write methods/functions so that each implements one chunk (or unit, or task) of logic. If that requires a breakdown into subtasks, then write a separate method/function for each of them and call those.
If I find that a method/function name is getting long, I try to examine the method to see if it can be broken down into two methods.
Hope this helps
Some rules of thumb:
Functions should not be longer than what can be displayed on screen
Break functions into smaller ones if it makes the code more readable.
I make each function do one thing, and one thing only, and I try not to nest too many levels of logic. Once you start breaking your code down into well named functions, it becomes a lot easier to read, and practically self-documenting.
I find that having many small methods makes code easier to read, maintain and debug.
When I'm reading through a unit that implements some business logic, I can better follow the flow if I see a series of method calls that describe the process. If I care about how the method is implemented, I can go look in the code.
It feels like more work but it ultimately saves time.
There is an art, I think, to knowing what to encapsulate. Everyone has some slight difference of opinion. If I could put it in words, I'd say that each method should do one thing that can be described as a complete task.
The bigger the method, the harder it is to test and maintain. I find it's much easier to understand how a large process works when it's broken down into atomic steps. Also, doing this is a great first step toward making your classes extensible: you can mark those individual steps as virtual (for inheritance), or move them into other objects (composition), making your application's behavior easier to customize.
I usually go for splitting functions into smaller functions that each perform a single, atomic task, but only if the function is complex enough to warrant it.
This way, I don't end up with multiple functions for simple tasks, and the functions I do extract can typically be used elsewhere as they don't try to achieve too much. This also aids unit testing as each function (as a logical, atomic action) can then be tested individually.
It depends a bit ... on mindset. Still, this is not an opinionated question.
The answer rather actually depends on the language context.
In a Java/C#/C++ world, where people are following the "Clean Code" school, as preached by Robert Martin, then: many small methods are the way to go.
A method has a clear name, and does one thing. One level of nesting, that's it. That limits its length to 3, 5, max 10 lines.
And honestly: I find this way of coding absolutely superior to any other "style".
The only downside of this approach is that you end up with many small methods, so ordering within a file/class can become an issue. But the answer to that is to use a decent IDE that lets you easily navigate back and forth.
So, the only "legit" reason to use the "all stuff goes into one method/function" is when your whole team works like that, and prefers that style. Or when you can't use decent tooling (but then navigating that big ugly function won't work either).
Personally, I lean significantly in the direction of preferring more, smaller methods, but not to the point of religiously aiming for a maximum line count. My primary criterion or goal is to keep my code DRY. The minute I have a code block which is duplicated (whether in spirit or actually by the text), even if it might be 2 or 4 lines long, I DRY up that code into a separate method. Sometimes I will do so in advance if I think there's a good chance it will be used again in the future.
On the flip side, I have also heard it argued that if your broken-off method is too small, then in the context of a team of developers, a teammate is likely not to know about it and will either write the logic inline or write their own small method that does the same thing. This is admittedly a bad situation.
Some also try to argue that it is more readable to keep things inline, so a reader can just read top-down, instead of having to jump around method definitions, possibly across multiple files. Personally, I think the existence of a stack trace makes this not much of an issue.

The best way to familiarize yourself with an inherited codebase

Stacker Nobody asked about the most shocking thing new programmers find as they enter the field.
Very high on the list is the impact of inheriting a codebase with which one must rapidly become acquainted. It can be quite a shock to suddenly find yourself charged with maintaining N lines of code that have been cobbled together for who knows how long, and to have a short time in which to start contributing to it.
How do you efficiently absorb all this new data? What eases this transition? Is the only real solution to have already contributed to enough open-source projects that the shock wears off?
This also applies to veteran programmers. What techniques do you use to ease the transition into a new codebase?
I added the Community-Building tag to this because I'd also like to hear some war-stories about these transitions. Feel free to share how you handled a particularly stressful learning curve.
Pencil & notebook (don't get distracted trying to create an unrequested solution).
Make notes as you go, and take an hour every Monday to read through and arrange the notes from previous weeks.
With large codebases, first impressions can be deceiving, and issues tend to rearrange themselves rapidly while you are familiarizing yourself.
Remember that the issues from your last work environment aren't necessarily valid or germane in your new environment. Beware of preconceived notions.
The notes/observations you make will help you learn quickly what questions to ask, and of whom.
Hopefully you've been gathering the names of all the official (and unofficial) stakeholders.
One of the best ways to familiarize yourself with inherited code is to get your hands dirty. Start with fixing a few simple bugs and work your way into more complex ones. That will warm you up to the code better than trying to systematically review the code.
If there's a requirements or functional specification document (which is hopefully up-to-date), you must read it.
If there's a high-level or detailed design document (which is hopefully up-to-date), you probably should read it.
Another good way is to arrange a "transfer of information" session with the people who are familiar with the code, where they provide a presentation of the high level design and also do a walk-through of important/tricky parts of the code.
Write unit tests. You'll find the warts quicker, and you'll be more confident when the time comes to change the code.
Try to understand the business logic behind the code. Once you know why the code was written in the first place and what it is supposed to do, you can start reading through it or, as someone said, probably fix a few bugs here and there.
My steps would be:
1.) Set up a Source Insight (or any good source code browser you use) workspace/project with all the source and header files in the code base. Browse at a high level, from the topmost function (main) down to the lowest-level functions. During this code browsing, keep making notes on paper or in a document, tracing the flow of the function calls. Do not get into the nitty-gritty of function implementations in this step; keep that for later iterations. For now, keep track of what arguments are passed to functions, the return values, how the passed arguments are initialized and modified, and how the return values are used.
2.) After one iteration of step 1.), when you have some picture of the code and the data structures used in the code base, set up an MSVC project (or one for any other compiler relevant to the programming language of the code base), compile the code, execute it with a valid test case, and single-step through the code again from main to the last level of function. Between the function calls, keep noting the values of variables passed and returned, the various code paths taken, the various code paths avoided, etc.
3.) Keep repeating 1.) and 2.) iteratively until you are comfortable, up to the point that you can change some code, add some code, find a bug in the existing code, or fix a bug!
-AD
I don't know about this being "the best way", but something I did at a recent job was to write a code spider/parser (in Ruby) that went through and built a call tree (and a reverse call tree) which I could later query (a toy sketch of the idea follows). This was slightly non-trivial because we had PHP which called Perl which called SQL functions/procedures. Any other code-crawling tools would help in a similar fashion (i.e. javadoc, rdoc, perldoc, Doxygen etc.).
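The original spider was Ruby; here is a toy sketch of the same idea in Java - nothing like the real thing. It is a line-oriented regex scan that records which names are called inside which function body; it assumes simple C-like source, treats keywords naively, and will miss plenty. It is only meant to show the shape of such a spider.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.regex.*;

public class CallSpider {
    // Naive "type name(args) {" definition at the start of a line.
    static final Pattern DEF =
            Pattern.compile("^\\s*\\w+[\\w\\s\\*]*?(\\w+)\\s*\\([^)]*\\)\\s*\\{");
    // Anything that looks like a call: an identifier followed by '('.
    static final Pattern CALL = Pattern.compile("(\\w+)\\s*\\(");
    // Control keywords would otherwise show up as "calls".
    static final Set<String> KEYWORDS =
            Set.of("if", "for", "while", "switch", "return", "catch");

    public static void main(String[] args) throws IOException {
        Map<String, Set<String>> callTree = new LinkedHashMap<>();
        String current = null;
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            Matcher def = DEF.matcher(line);
            if (def.find()) {
                current = def.group(1);   // entering a new function body
                callTree.put(current, new LinkedHashSet<>());
                continue;
            }
            if (current == null) continue;
            Matcher call = CALL.matcher(line);
            while (call.find()) {
                if (!KEYWORDS.contains(call.group(1))) {
                    callTree.get(current).add(call.group(1));
                }
            }
        }
        // The forward call tree; inverting the map gives the reverse tree.
        callTree.forEach((fn, callees) -> System.out.println(fn + " -> " + callees));
    }
}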
Reading any unit tests or specs can be quite enlightening.
Documenting things helps (either for yourself, or for other teammates, current and future). Read any existing documentation.
Of course, don't underestimate the power of simply asking a fellow teammate (or your boss!) questions. Early on, I asked as often as necessary "do we have a function/script/foo that does X?"
Go over the core libraries and read the function declarations. If it's C/C++, this means only the headers. Document whatever you don't understand.
The last time I did this, one of the comments I inserted was "This class is never used".
Do try to understand the code by fixing bugs in it. Do correct or maintain documentation. Don't modify comments in the code itself; that risks introducing new bugs.
In our line of work, generally speaking we do no changes to production code without good reason. This includes cosmetic changes; even these can introduce bugs.
No matter how disgusting a section of code seems, don't be tempted to rewrite it unless you have a bugfix or other change to do. If you spot a bug (or possible bug) when reading the code trying to learn it, record the bug for later triage, but don't attempt to fix it.
Another Procedure...
After reading Andy Hunt's "Pragmatic Thinking and Learning - Refactor Your Wetware" (which doesn't address this directly), I picked up a few tips that may be worth mentioning:
Observe Behavior:
If there's a UI, all the better. Use the app and get a mental map of relationships (e.g. links, modals, etc). Look at HTTP requests if it helps, but don't put too much emphasis on them -- you just want a light, friendly acquaintance with the app.
Acknowledge the Folder Structure:
Once again, this is light. Just see what belongs where, and hope that the structure is semantic enough -- you can always get some top-level information from here.
Analyze Call-Stacks, Top-Down:
Go through and list, on paper or some other medium (but try not to type it -- this gets different parts of your brain engaged; build it out of Legos if you have to), the function calls, objects, and variables closest to the top level first. Look at constants and modules; make sure you don't dive into fine-grained features if you can help it.
MindMap It!:
Maybe the most important step. Create a very rough draft mapping of your current understanding of the code. Make sure you run through the mindmap quickly. This allows an even spread of different parts of your brain (mostly R-mode) to have a say in the map.
Create clouds, boxes, etc., wherever you initially think they should go on the paper. Feel free to denote boxes with syntactic symbols (e.g. 'F'-Function, 'f'-closure, 'C'-Constant, 'V'-Global Var, 'v'-low-level var, etc). Use arrows: an incoming arrow for arguments, an outgoing arrow for returns, or whatever comes more naturally to you.
Start drawing connections to denote relationships. It's OK if it looks messy - this is a first draft.
Make a quick rough revision. If it's too hard to read, do another quick reorganization of it, but don't do more than one revision.
Open the Debugger:
Validate or invalidate any notions you had after the mapping. Track variables, arguments, returns, etc.
Track HTTP requests etc to get an idea of where the data is coming from. Look at the headers themselves but don't dive into the details of the request body.
MindMap Again!:
Now you should have a decent idea of most of the top-level functionality.
Create a new MindMap that has anything you missed in the first one. You can take more time with this one and even add some relatively small details -- but don't be afraid of what previous notions they may conflict with.
Compare this map with your last one and eliminate any question you had before, jot down new questions, and jot down conflicting perspectives.
Revise this map if it's too hazy. Revise as much as you want, but keep revisions to a minimum.
Pretend Its Not Code:
If you can put it into mechanical terms, do so. The most important part of this is to come up with a metaphor for the app's behavior and/or smaller parts of the code. Think of ridiculous things, seriously. If it were an animal, a monster, a star, a robot - what kind would it be? If it were in Star Trek, what would they use it for? Think of many things to weigh it against.
Synthesis over Analysis:
Now you want to see not 'what' but 'how'. Any low-level parts that threw you for a loop can be taken out and put into a sterile environment (where you control the inputs). What sort of outputs are you getting? Is the system more complex than you originally thought? Simpler? Does it need improvements?
Contribute Something, Dude!:
Write a test, fix a bug, comment it, abstract it. You should have enough ability to start making minor contributions, and FAILING IS OK :)! Make a note of any changes you make in commits, chat, or email. If you did something dastardly, your teammates can catch it before it goes to production -- and if something is wrong, it's a great way to get a teammate to clear things up for you. Usually listening to a teammate talk will clear up a lot of what made your MindMaps clash.
In a nutshell, the most important thing is to use a top-down approach to get as many different parts of your brain engaged as possible. It may even help to close your laptop and face your seat out the window if possible. Studies have shown that enforcing a deadline creates a "pressure hangover" for ~2.5 days after the deadline, which is why deadlines are often best placed on a Friday. So, BE RELAXED, THERE'S NO TIME CRUNCH, AND PROVIDE YOURSELF WITH AN ENVIRONMENT THAT'S SAFE TO FAIL IN. Most of this can be fairly rushed through until you get down to details. Just make sure you don't bypass understanding of high-level topics.
Hope this helps you as well :)
All really good answers here. Just wanted to add a few more things:
One can pair architectural understanding with flash cards, and revisiting those can solidify understanding. I find questions such as "Which part of the code implements X?" useful, where X is some functionality in your code base.
I also like to open a buffer in emacs and start re-writing some parts of the code base that I want to familiarize myself with and add my own comments etc.
One thing vi and emacs users can do is use tags. Tags are contained in a file (usually called TAGS). You generate one or more tags files with a command (etags for emacs, ctags for vi). Then when you edit source code and you see a confusing function or variable, you load the tags file and it will take you to where the function is declared (not perfect, but good enough). I've actually written some macros that let you navigate source using Alt-cursor, sort of like popd and pushd in many flavors of UNIX.
BubbaT
The first thing I do before going down into code is to use the application (as several different users, if necessary) to understand all the functionalities and see how they connect (how information flows inside the application).
After that I examine the framework in which the application was built, so that I can make a direct relationship between all the interfaces I have just seen with some View or UI code.
Then I look at the database and any database command handling layer (if applicable), to understand how that information (which users manipulate) is stored and how it goes to and comes from the application.
Finally, after learning where data comes from and how it is displayed I look at the business logic layer to see how data gets transformed.
I believe every application architecture can be divided like this, and knowing the overall function (a who-is-who in your application) might be beneficial before really debugging it or adding new stuff - that is, if you have enough time to do so.
And yes, it also helps a lot to talk with someone who developed the current version of the software. However, if he/she is going to leave the company soon, keep a note of his/her wish list (what they wanted to do for the project but were unable to because of budget constraints).
Create documentation for each thing you figure out from the codebase.
Find out how it works by experimentation - changing a few lines here and there and seeing what happens.
Use Geany, as it speeds up the searching of commonly used variables and functions in the program and adds them to autocomplete.
Find out if you can contact the original developers of the code base, through Facebook or by googling for them.
Find out the original purpose of the code, and see if the code still fits that purpose or should be rewritten from scratch to fulfill the intended purpose.
Find out what frameworks the code uses and what editors were used to produce it.
The easiest way to deduce how code works is to replicate how you would have done a certain part yourself, and then recheck the code to see if there is such a part.
It's reverse engineering - figuring something out by trying to re-engineer the solution.
Most computer programmers have experience in coding, and there are certain patterns that you can look for in the code.
There are two types of code: object-oriented and structurally oriented.
If you know how to do both, you're good to go, but if you aren't familiar with one or the other, you'll have to relearn how to program in that fashion to understand why it was coded that way.
In object-oriented code, you can easily create diagrams documenting the behaviors and methods of each object class.
If it's structurally oriented, meaning organized by function, create a function list documenting what each function does and where it appears in the code.
I haven't done either of the above myself; as a web developer it is relatively easy to figure out how something works by starting from index.php and working through the rest of the pages.
Good luck.