Recently our company has started measuring the cyclomatic complexity (CC) of the functions in our code on a weekly basis, and reporting which functions have improved or worsened. So we have started paying a lot more attention to the CC of functions.
I've read that CC can be informally calculated as 1 + the number of decision points in a function (e.g. an if statement, a for loop, a switch/select, etc.), or alternatively as the number of independent paths through a function...
I understand that the easiest way of reducing CC is to use the Extract Method refactoring repeatedly...
There are some things I am unsure about, e.g. what is the CC of the following code fragments?
1)
for (int i = 0; i < 3; i++)
Console.WriteLine("Hello");
And
Console.WriteLine("Hello");
Console.WriteLine("Hello");
Console.WriteLine("Hello");
They both do the same thing, but does the first version have a higher CC because of the for statement?
2)
if (condition1)
if (condition2)
if (condition3)
Console.WriteLine("wibble");
And
if (condition1 && condition2 && condition3)
Console.WriteLine("wibble");
Assuming the language uses short-circuit evaluation, as C# does, these two code fragments have the same effect... but is the CC of the first fragment higher because it has three decision points/if statements?
3)
if (condition1)
{
Console.WriteLine("one");
if (condition2)
Console.WriteLine("one and two");
}
And
if (condition3)
Console.WriteLine("fizz");
if (condition4)
Console.WriteLine("buzz");
These two code fragments do different things, but do they have the same CC? Or does the nested if statement in the first fragment have a higher CC? i.e. nested if statements are mentally more complex to understand, but is that reflected in the CC?
Yes. Your first example has a decision point and your second does not, so the first has a higher CC.
Yes-maybe, your first example has multiple decision points and thus a higher CC. (See below for explanation.)
Yes-maybe. Obviously they have the same number of decision points, but there are different ways to calculate CC, which means ...
... if your company is measuring CC in a specific way, then you need to become familiar with that method (hopefully they are using tools to do this). There are different ways to calculate CC for different situations (case statements, Boolean operators, etc.), but you should get the same kind of information from the metric no matter what convention you use.
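To illustrate the point about different conventions, here is a hypothetical C# method (illustration only, not from the question): some conventions add one decision point per case label and per Boolean operator, others do not, so the same method can be reported with noticeably different CC values depending on the tool.
// Hypothetical method, for illustration only: different conventions count
// these constructs differently, so tools can report different CC values
// for the same code.
static string Describe(int code, bool isAdmin, bool isOwner)
{
    // Some conventions add one decision point per case label;
    // others count the whole switch as a single decision.
    switch (code)
    {
        case 1: return "created";
        case 2: return "updated";
        case 3: return "deleted";
    }

    // Some conventions add a point for the && as well as for the if
    // ("extended" cyclomatic complexity); others count only the if.
    if (isAdmin && isOwner)
        return "privileged";

    return "unknown";
}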
The bigger problem is what others have mentioned, that your company seems to be focusing more on CC than on the code behind it. In general, sure, below 5 is great, below 10 is good, below 20 is okay, 21 to 50 should be a warning sign, and above 50 should be a big warning sign, but those are guides, not absolute rules. You should probably examine the code in a procedure that has a CC above 50 to ensure it isn't just a huge heap of code, but maybe there is a specific reason why the procedure is written that way, and it's not feasible (for any number of reasons) to refactor it.
If you use tools to refactor your code to reduce CC, make sure you understand what the tools are doing, and that they're not simply shifting one problem to another place. Ultimately, you want your code to have few defects, to work properly, and to be relatively easy to maintain. If that code also has a low CC, good for it. If your code meets these criteria and has a CC above 10, maybe it's time to sit down with whatever management you can and defend your code (and perhaps get them to examine their policy).
After browsing through the Wikipedia entry and Thomas J. McCabe's original paper, it seems that the items you mention are known problems with the metric.
However, most metrics do have pros and cons. I suppose in a large enough program the CC value could point to possibly complex parts of your code. But a higher CC does not necessarily mean the code is complex.
Like all software metrics, CC is not perfect. Used on a big enough code base, it can give you an idea of where the problematic zones might be.
There are two things to keep in mind here:
Big enough code base: in any non-trivial project you will have functions with a really high CC value, so high that it does not matter whether, in one of your examples, the CC is 2 or 3. A function with a CC of, let's say, over 300 is definitely something to analyse; it doesn't matter whether the CC is 301 or 302.
Don't forget to use your head. There are methods that need many decision points. Often they can be refactored somehow to have fewer, but sometimes they can't. Do not go with a rule like "Refactor all methods with a CC > xy". Have a look at them and use your brain to decide what to do.
I like the idea of a weekly analysis. In quality control, trend analysis is a very effective tool for identifying problems during their creation. This is so much better than having to wait until they get so big that they become obvious (see SPC for some details).
CC is not a panacea for measuring quality. Clearly a repeated statement is not "better" than a loop, even if a loop has a bigger CC. The reason the loop has a bigger CC is that sometimes it might get executed and sometimes it might not, which leads to two different "cases" which should both be tested. In your case the loop will always be executed three times because you use a constant, but CC is not clever enough to detect this.
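To make that concrete, here is a hedged C# sketch (hypothetical names and values) of the situation CC is really modelling: a loop whose body may or may not run, and the two calls that exercise both cases.
using System;
using System.Collections.Generic;

static class LoopCcExample   // hypothetical example, for illustration only
{
    // CC treats the loop as a decision point because the body may run
    // zero times when 'names' is empty.
    public static void Greet(IEnumerable<string> names)
    {
        foreach (var name in names)              // enter the body, or skip it
            Console.WriteLine("Hello " + name);
    }

    public static void Main()
    {
        Greet(new string[0]);                    // case 1: body never executes
        Greet(new[] { "Tom", "Dick", "Harry" }); // case 2: body executes
    }
}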
Same with the chained ifs in example 2: this structure allows you to have a statement which would be executed when only condition1 and condition2 are true. This is a special case which is not possible in the version using &&. So the if-chain has a bigger potential for special cases, even if you don't utilize this in your code.
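A small sketch of that "special case" (hypothetical condition values; the point is purely structural): the chained form leaves room for a statement that runs when condition1 and condition2 hold but condition3 does not, which the single && form cannot accommodate without being rewritten.
bool condition1 = true, condition2 = true, condition3 = false;   // hypothetical values

if (condition1)
    if (condition2)
    {
        // This line runs whenever condition1 and condition2 hold, even though
        // condition3 is false. There is no place to put it in the single
        // if (condition1 && condition2 && condition3) version without
        // restructuring the condition.
        Console.WriteLine("one and two, but maybe not three");

        if (condition3)
            Console.WriteLine("wibble");
    }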
This is the danger of applying any metric blindly. The CC metric certainly has a lot of merit, but as with any other technique for improving code it can't be evaluated divorced from context. Point your management at Capers Jones's discussion of the Lines of Code measurement (wish I could find a link for you). He points out that if Lines of Code is a good measure of productivity then assembler language developers are the most productive developers on earth. Of course they're no more productive than other developers; it just takes them a lot more code to accomplish what higher-level languages do with less source code. I mention this, as I say, so you can show your managers how dumb it is to blindly apply metrics without intelligent review of what the metric is telling you.
I would suggest that, if they're not already doing so, your management would be wise to use the CC measure as a way of spotting potential hot spots in the code that should be reviewed further. Blindly aiming for the goal of lower CC without any reference to code maintainability or other measures of good coding is just foolish.
Cyclomatic complexity is analogous to temperature. They are both measurements, and in most cases meaningless without context. If I said the temperature outside was 72 degrees that doesn’t mean much; but if I added the fact that I was at North Pole, the number 72 becomes significant. If someone told me a method has a cyclomatic complexity of 10, I can’t determine if that is good or bad without its context.
When I code review an existing application, I find cyclomatic complexity a useful “starting point” metric. The first thing I check for are methods with a CC > 10. These “>10” methods are not necessarily bad. They just provide me a starting point for reviewing the code.
General rules when considering a CC number:
The relationship between the CC number and the number of tests should be CC <= number of tests (see the sketch below).
Refactor for CC only if it increases maintainability.
A CC above 10 often indicates one or more code smells.
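A minimal sketch of the first rule (hypothetical method and values): a method with CC = 3, and the three calls that cover its basis paths.
// Hypothetical example: CC = 3 (one + two decision points), so the rule of
// thumb asks for at least three tests, one per basis path.
static string Classify(int n)
{
    if (n < 0)  return "negative";   // decision point 1
    if (n == 0) return "zero";       // decision point 2
    return "positive";
}

// Minimal set of calls exercising the three basis paths:
// Classify(-1) -> "negative"
// Classify(0)  -> "zero"
// Classify(7)  -> "positive"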
[Off topic] If you favor readability over a good score in the metrics (was it J. Spolsky who said "what's measured gets done"? - meaning that metrics are abused more often than not, I suppose), it is often better to use a well-named boolean to replace your complex conditional statement.
then
if (condition1 && condition2 && condition3)
Console.WriteLine("wibble");
becomes
bool theWeatherIsFine = condition1 && condition2 && condition3;
if (theWeatherIsFine)
Console.WriteLine("wibble");
I'm no expert at this subject, but I thought I would give my two cents. And maybe that's all this is worth.
Cyclomatic Complexity seems to be just a particular automated shortcut to finding potentially (but not definitely) problematic code snippets. But isn't the real problem to be solved one of testing? How many test cases does the code require? If CC is higher, but number of test cases is the same and code is cleaner, don't worry about CC.
1.) There is no real decision point there: there is one and only one path through the program, and only one possible result with either of the two versions. The first is more concise and better, Cyclomatic Complexity be damned.
1 test case for both
2.) In both cases, you either write "wibble" or you don't.
2 test cases for both
3.) The first could result in nothing, "one", or "one" followed by "one and two": 3 paths. The second could result in nothing, either of the two, or both of them: 4 paths.
3 test cases for the first
4 test cases for the second
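As a sketch of those counts, here are the question's two fragments wrapped in hypothetical methods, followed by the calls that cover every distinct outcome (3 for the first, 4 for the second).
using System;

static class PathCountSketch   // hypothetical wrapper, for illustration only
{
    static void First(bool condition1, bool condition2)
    {
        if (condition1)
        {
            Console.WriteLine("one");
            if (condition2)
                Console.WriteLine("one and two");
        }
    }

    static void Second(bool condition3, bool condition4)
    {
        if (condition3) Console.WriteLine("fizz");
        if (condition4) Console.WriteLine("buzz");
    }

    static void Main()
    {
        // First: condition2 is irrelevant when condition1 is false, so 3 cases suffice.
        First(false, false);
        First(true, false);
        First(true, true);

        // Second: the conditions are independent, so all 4 combinations differ.
        Second(false, false);
        Second(true, false);
        Second(false, true);
        Second(true, true);
    }
}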
Related
Why would a language NOT use Short-circuit evaluation? Are there any benefits of not using it?
I see that it could lead to some performance issues... is that true? Why?
Related question : Benefits of using short-circuit evaluation
Reasons NOT to use short-circuit evaluation:
Because it will behave differently and produce different results if your functions, property Gets or operator methods have side effects. And this may conflict with: A) language standards, B) previous versions of your language, or C) the default assumptions of your language's typical users. These are the reasons that VB has for not short-circuiting.
Because you may want the compiler to have the freedom to reorder and prune expressions, operators and sub-expressions as it sees fit, rather than in the order that the user typed them in. These are the reasons that SQL has for not short-circuiting (or at least not in the way that most developers coming to SQL think it would). Thus SQL (and some other languages) may short-circuit, but only if it decides to and not necessarily in the order that you implicitly specified.
I am assuming here that you are asking about "automatic, implicit order-specific short-circuiting", which is what most developers expect from C,C++,C#,Java, etc. Both VB and SQL have ways to explicitly force order-specific short-circuiting. However, usually when people ask this question it's a "Do What I Meant" question; that is, they mean "why doesn't it Do What I Want?", as in, automatically short-circuit in the order that I wrote it.
One benefit I can think of is that some operations might have side-effects that you might expect to happen.
Example:
if (true || someBooleanFunctionWithSideEffect()) {
...
}
But that's typically frowned upon.
Ada does not do it by default. In order to force short-circuit evaluation, you have to use "and then" or "or else" instead of "and" or "or".
The issue is that there are some circumstances where it actually slows things down. If the second condition is quick to calculate and the first condition is almost always true for "and" or false for "or", then the extra check-branch instruction is kind of a waste. However, I understand that with modern processors with branch predictors, this isn't so much the case. Another issue is that the compiler may happen to know that the second half is cheaper or likely to fail, and may want to reorder the check accordingly (which it couldn't do if short-circuit behavior is defined).
I've heard objections that it can lead to unexpected behavior of the code in the case where the second test has side effects. IMHO it is only "unexpected" if you don't know your language very well, but some will argue this.
In case you are interested in what actual language designers have to say about this issue, here's an excerpt from the Ada 83 (original language) Rationale:
The operands of a boolean expression such as A and B can be evaluated in any order. Depending on the complexity of the term B, it may be more efficient (on some but not all machines) to evaluate B only when the term A has the value TRUE. This however is an optimization decision taken by the compiler and it would be incorrect to assume that this optimization is always done. In other situations we may want to express a conjunction of conditions where each condition should be evaluated (has meaning) only if the previous condition is satisfied. Both of these things may be done with short-circuit control forms ...
In Algol 60 one can achieve the effect of short-circuit evaluation only by use of conditional expressions, since complete evaluation is performed otherwise. This often leads to constructs that are tedious to follow...
Several languages do not define how boolean conditions are to be evaluated. As a consequence programs based on short-circuit evaluation will not be portable. This clearly illustrates the need to separate boolean operators from short-circuit control forms.
Look at my example at On SQL Server boolean operator short-circuit, which shows why a certain access path in SQL is more efficient if boolean short-circuiting is not used. My blog example shows how relying on boolean short-circuiting can actually break your code if you assume short-circuiting in SQL; but if you read the reasoning for why SQL evaluates the right-hand side first, you'll see that it is correct and that it results in a much improved access path.
Bill has alluded to a valid reason not to use short-circuiting, but to spell it out in more detail: highly parallel architectures sometimes have problems with branching control paths.
Take NVIDIA’s CUDA architecture for example. The graphics chips use an SIMT architecture which means that the same code is executed on many parallel threads. However, this only works if all threads take the same conditional branch every time. If different threads take different code paths, evaluation is serialized – which means that the advantage of parallelization is lost, because some of the threads have to wait while others execute the alternative code branch.
Short-circuiting actually involves branching the code so short-circuit operations may be harmful on SIMT architectures like CUDA.
But like Bill said, that's a hardware consideration. As far as languages go, I'd answer your question with a resounding no: preventing short-circuiting does not make sense.
I'd say 99 times out of 100 I would prefer the short-circuiting operators for performance.
But there are two big reasons I've found where I won't use them.
(By the way, my examples are in C where && and || are short-circuiting and & and | are not.)
1.) When you want to call two or more functions in an if statement regardless of the value returned by the first.
if (isABC() || isXYZ()) // short-circuiting logical operator
//do stuff;
In that case isXYZ() is only called if isABC() returns false. But you may want isXYZ() to be called no matter what.
So instead you do this:
if (isABC() | isXYZ()) // non-short-circuiting bitwise operator
//do stuff;
2.) When you're performing boolean math with integers.
myNumber = i && 8; // short-circuiting logical operator
is not necessarily the same as:
myNumber = i & 8; // non-short-circuiting bitwise operator
In this situation you can actually get different results, because the short-circuiting operator won't necessarily evaluate the entire expression, and because it yields a logical true/false rather than a bit pattern. For example, with i equal to 2, i && 8 evaluates to 1 in C, while i & 8 evaluates to 0. That makes the short-circuiting form basically useless for boolean math on integers, so in this case I'd use the non-short-circuiting (bitwise) operators instead.
Like I was hinting at, these two scenarios really are rare for me. But you can see there are real programming reasons for both types of operators. And luckily most of the popular languages today have both. Even VB.NET has the AndAlso and OrElse short-circuiting operators. If a language today doesn't have both I'd say it's behind the times and really limits the programmer.
If you wanted the right hand side to be evaluated:
if( x < 13 | ++y > 10 )
printf("do something\n");
Perhaps you wanted y to be incremented whether or not x < 13. A good argument against doing this, however, is that creating conditions without side effects is usually better programming practice.
As a stretch:
If you wanted a language to be super secure (at the cost of awesomeness), you would remove short-circuit eval. When something 'secure' takes a variable amount of time to happen, a timing attack could be used to mess with it. Short-circuit eval results in things taking different times to execute, hence opening a hole for the attack. In this case, not even allowing short-circuit eval would hopefully help write more secure algorithms (with respect to timing attacks, anyway).
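As an illustration of that timing argument (my own sketch, not taken from the answer above, and certainly not production crypto code): an early-exit comparison leaks how many leading bytes matched, while accumulating the differences with a non-short-circuiting operator keeps the running time roughly independent of the data. The same reasoning applies to short-circuited && chains over secret values.
// Illustration only; real constant-time comparison belongs in a vetted library.
static bool LeakyEquals(byte[] a, byte[] b)
{
    if (a.Length != b.Length) return false;
    for (int i = 0; i < a.Length; i++)
        if (a[i] != b[i]) return false;   // early exit: running time reveals
                                          // how many leading bytes matched
    return true;
}

static bool SteadierEquals(byte[] a, byte[] b)
{
    if (a.Length != b.Length) return false;
    int diff = 0;
    for (int i = 0; i < a.Length; i++)
        diff |= a[i] ^ b[i];              // no early exit, no data-dependent branch
    return diff == 0;
}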
The Ada programming language supports both boolean operators that do not short-circuit (and, or), to allow a compiler to optimize and possibly parallelize the constructs, and operators that explicitly request short-circuiting (and then, or else) when that's what the programmer desires. The downside of such a dual-pronged approach is that it makes the language a bit more complex (1000 design decisions taken in the same "let's do both!" vein will make a programming language a LOT more complex overall ;-).
Not that I think this is what's going on in any language now, but it would be rather interesting to feed both sides of an operation to different threads. Most operands could be pre-determined not to interfere with each other, so they would be good candidates for handing off to different CPUs.
This kind of thing matters on highly parallel CPUs that tend to evaluate multiple branches and choose one.
Hey, it's a bit of a stretch but you asked "Why would a language"... not "Why does a language".
The language Lustre does not use short-circuit evaluation. In if-then-elses, both then and else branches are evaluated at each tick, and one is considered the result of the conditional depending on the evaluation of the condition.
The reason is that this language, and other synchronous dataflow languages, have a concise syntax to speak of the past. Each branch needs to be computed so that the past of each is available if it becomes necessary in future cycles. The language is supposed to be functional, so that wouldn't matter, but you may call C functions from it (and perhaps notice they are called more often than you thought).
In Lustre, writing the equivalent of
if (y <> 0) then 100/y else 100
is a typical beginner mistake. The division by zero is not avoided, because the expression 100/y is evaluated even on cycles when y=0.
Because short-circuiting can change the behavior of an application. For example:
if(!SomeMethodThatChangesState() || !SomeOtherMethodThatChangesState())
I'd say it's valid for readability issues; if someone takes advantage of short circuit evaluation in a not fully obvious way, it can be hard for a maintainer to look at the same code and understand the logic.
If memory serves, Erlang provides two sets of constructs: the standard and/or, and then andalso/orelse. This makes the intent clear ('yes, I know this is short-circuiting, and you should too'), whereas at other points the intent needs to be derived from the code.
As an example, say a maintainer comes across these lines:
if(user.inDatabase() || user.insertInDatabase())
user.DoCoolStuff();
It takes a few seconds to recognize that the intent is "if the user isn't in the Database, insert him/her/it; if that works do cool stuff".
As others have pointed out, this is really only relevant when doing things with side effects.
I don't know about any performance issues, but one possible argumentation to avoid it (or at least excessive use of it) is that it may confuse other developers.
There are already great responses about the side-effect issue, but I didn't see anything about the performance aspect of the question.
If you do not allow short-circuit evaluation, the performance issue is that both sides must be evaluated even though it will not change the outcome. This is usually a non-issue, but may become relevant under one of these two circumstances:
The code is in an inner loop that is called very frequently
There is a high cost associated with evaluating the expressions (perhaps IO or an expensive computation)
Short-circuit evaluation automatically provides conditional evaluation of part of the expression.
The main advantage is that it simplifies the expression.
Performance can be improved, but you can also observe a penalty for very simple expressions.
Another consequence is that side effects of evaluating the expression can be affected.
In general, relying on side effects is not good practice, but in some specific contexts it may be the preferred solution.
VB6 doesn't use short-circuit evaluation. I don't know whether newer versions do, but I doubt it. I believe this is just because older versions didn't, and because most of the people who used VB6 wouldn't expect that to happen, and it would lead to confusion.
This is just one of the things that made it extremely hard for me to get out of being a noob VB programmer who wrote spaghetti code, and get on with my journey to be a real programmer.
Many answers have talked about side-effects. Here's a Python example without side-effects in which (in my opinion) short-circuiting improves readability.
for i in range(len(myarray)):
if myarray[i]>5 or (i>0 and myarray[i-1]>5):
print "At index",i,"either arr[i] or arr[i-1] is big"
The short-circuit ensures we don't try to access myarray[-1], which would raise an exception since Python arrays start at 0. The code could of course be written without short-circuits, e.g.
for i in range(len(myarray)):
if myarray[i]<=5: continue
if i==0: continue
if myarray[i-1]<=5: continue
print "At index",i,...
but I think the short-circuit version is more readable.
I'm writing a compiler, and I'm looking for resources on optimization. I'm compiling to machine code, so anything at runtime is out of the question.
What I've been looking for lately is less code optimization and more semantic/high-level optimization. For example:
free(malloc(400)); // should be completely optimized away
Even if these functions were completely inlined, they could eventually call OS memory functions which can never be inlined. I'd love to be able to eliminate that statement completely without building special-case rules into the compiler (after all, malloc is just another function).
Another example:
string Parenthesize(string str) {
StringBuilder b; // similar to C#'s class of the same name
foreach(str : ["(", str, ")"])
b.Append(str);
return b.Render();
}
In this situation I'd love to be able to initialize b's capacity to str.Length + 2 (enough to exactly hold the result, without wasting memory).
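In other words, the hand-optimized version the questioner hopes the compiler could derive might look roughly like this (rendered in C#, since the question's language is pseudocode; Render() becomes ToString()):
// What the questioner hopes the optimizer could produce automatically:
// the final length is known up front, so the builder never has to grow.
static string Parenthesize(string str)
{
    var b = new System.Text.StringBuilder(str.Length + 2);  // exact capacity for "(" + str + ")"
    b.Append("(");
    b.Append(str);
    b.Append(")");
    return b.ToString();
}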
To be completely honest, I have no idea where to begin in tackling this problem, so I was hoping for somewhere to get started. Has there been any work done in similar areas? Are there any compilers that have implemented anything like this in a general sense?
To do an optimization across 2 or more operations, you have to understand the algebraic relationship of those two operations. If you view operations in their problem domain, they often have such relationships.
Your free(malloc(400)) is possible because free and malloc are inverses in the storage allocation domain.
Lots of operations have inverses, and teaching the compiler that they are inverses, and demonstrating that the results of one flow unconditionally into the other, is what is needed. You have to make sure that your inverses really are inverses and there isn't a surprise somewhere; a/x*x looks like just the value a, but if x is zero you get a trap. If you don't care about the trap, it is an inverse; if you do care about the trap then the optimization is more complex:
(if (x == 0) then trap() else a)
which is still a good optimization if you think divide is expensive.
Other "algebraic" relationships are possible. For instance, there are
may idempotent operations: zeroing a variable (setting anything to the same
value repeatedly), etc. There are operations where one operand acts
like an identity element; X+0 ==> X for any 0. If X and 0 are matrices,
this is still true and a big time savings.
Other optimizations can occur when you can reason abstractly about what the code is doing. "Abstract interpretation" is a set of techniques for reasoning about values by classifying results into various interesting bins (e.g., this integer is unknown, zero, negative, or positive). To do this you need to decide what bins are helpful, and then compute the abstract value at each point. This is useful when there are tests on categories (e.g., "if (x < 0) { ...") and you know abstractly that x is less than zero; you can then optimize away the conditional.
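A toy sketch of that idea (a made-up three-value sign domain, nothing like a real compiler pass), just to show the flavour of classifying values into bins and using the bins to kill a branch:
// Toy sign-analysis domain: illustration only.
enum Sign { Negative, Zero, Positive, Unknown }

static class SignAnalysis
{
    // Abstract addition: combine what we know about the operands.
    public static Sign Add(Sign a, Sign b)
    {
        if (a == Sign.Zero) return b;
        if (b == Sign.Zero) return a;
        if (a == b) return a;          // neg+neg stays neg, pos+pos stays pos
        return Sign.Unknown;           // mixed signs: can't tell
    }

    // If we already know x is Positive or Zero, the branch "if (x < 0)" can
    // never be taken and the optimizer may drop it.
    public static bool BranchCanBeTakenLessThanZero(Sign x)
        => x == Sign.Negative || x == Sign.Unknown;
}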
Another way is to define what a computation is doing symbolically, and simulate the computation to see the outcome. That is how you computed the effective size of the required buffer: you computed the buffer size symbolically before the loop started, and simulated the effect of executing the loop for all iterations. For this you need to be able to construct symbolic formulas representing program properties, compose such formulas, and often simplify such formulas when they get unusably complex (this kind of fades into the abstract interpretation scheme). You also want such symbolic computation to take into account the algebraic properties I described above. Tools that do this well are good at constructing formulas, and program transformation systems are often good foundations for this. One source-to-source program transformation system that can be used to do this is the DMS Software Reengineering Toolkit.
What's hard is to decide which optimizations are worth doing, because you can end up keeping track of vast amounts of stuff, which may not pay off. Computer cycles are getting cheaper, and so it makes sense to track more properties of the code in the compiler.
The Broadway framework might be in the vein of what you're looking for. Papers on "source-to-source transformation" will probably also be enlightening.
How many lines of code (LOC) does it take to be considered a large project? How about for just one person writing it?
I know this metric is questionable, but there is a significant difference, for a single developer, between 1k and 10k LOC. I typically use whitespace for readability, especially for SQL statements, and I try to reduce the LOC count for maintenance purposes, following as many best practices as I can.
For example, I created a unified diff of the code I modified today, and it was over 1k LOC (including comments and blank lines). Is "modified LOC" a better metric? I have ~2k LOC in total, so it's surprising I modified 1k. I guess rewriting counts as both a deletion and an addition, which doubles the stats.
A slightly less useless metric - time of compilation.
If your project takes more than... say, 30 minutes to compile, it's large :)
Using Steve Yegge as the benchmark at the upper range of the scale, let's say that 500k lines of code is (over?) the maximum a single developer can maintain.
More seriously, though: I think once you hit 100k LOC you are probably going to want to start looking for refactorings before extending the code.
Note however that one way around this limit is obviously to compartmentalise the code more. If the sum-total of all code consists of two or three large libraries and an application, then combined this may well be more than you could maintain as a single code-base, but as long as each library is nicely self-contained you aren't going to exceed the capacity to understand each part of the solution.
Maybe another measurement for this would be the COCOMO measure - even though it is probably as useless as LOC.
A single developer could only do organic projects - "small" teams with "good" experience working with "less than rigid" requirements.
In this case, the effort in man-months is calculated as
2.4 * (kLOC)^1.05
That said, 1 kLOC would need about 2.4 man-months. You can use several factors to refine that, based on product, hardware, personnel, and project attributes.
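The arithmetic for a few sizes, using the organic-mode coefficients quoted above (a quick sketch, not a COCOMO implementation):
using System;

static class Cocomo   // sketch only: basic COCOMO, organic mode
{
    static double EffortManMonths(double kloc) => 2.4 * Math.Pow(kloc, 1.05);

    static void Main()
    {
        foreach (var kloc in new[] { 1.0, 10.0, 100.0 })
            Console.WriteLine($"{kloc,5} kLOC -> {EffortManMonths(kloc):F1} man-months");
        // Roughly: 1 kLOC -> 2.4, 10 kLOC -> 26.9, 100 kLOC -> 302 man-months.
    }
}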
But all we have done now is project LOC onto a time measurement. Here you again have to decide whether a 2-month or a 20-month project is considered large.
But as you said, LOC probably is not the right measure to use. Keywords: software metrics, function points, evidence-based scheduling, the planning game.
In my opinion it also depends on the design of your code - I've worked on projects in the 1-10k LOC range that were so poorly designed that they felt like really large projects.
But is LOC really an interesting measure for code? ;-)
While filling in The Object Oriented Concepts Survey (To provide some academic researchers with real-life data on software design), I came upon this question:
What is the limit N of maximum methods you allow in your classes?
The survey then goes on asking if you refactor your classes once you reach this limit N.
I've honestly never thought about such a limit while designing my applications, and I wonder what the reasoning behind this is. Why would I want to impose an arbitrary number on myself, when it probably depends heavily on the class's functionality?
You don't have to set a maximum N, but you do have to follow the 'high cohesion' principle and avoid creating do-everything classes.
I suppose there is some N after which you should start worrying, but it really depends on the class itself and its primary goal.
The idea that there's a magic number that we can base a Rule on is the usual squeamishness from those whose desire to impose order on the universe outweighs their sense.
That said, if you have more than 20 or so methods in a class, there's a good chance it's doing too much and violating the SRP.
I wouldn't put an arbitrary limit on things either, but I would say that once a class has more than somewhere in the 10-20 public methods range I'd take a serious look at what that class is doing. Back in my J2EE days, we called them Enterprise Java Melons.
Same rule applies for the length of individual methods. I've seen classes that had only one or two methods, but each of those methods was hundreds of lines of code.
Since I started breaking classes down to a single responsibility, I don't usually approach a place where it gets questionable.
Also, a well-designed class may have 30 methods, and a poorly designed one may have 3 (umm, 30 is pushing it, but the point is that this isn't necessarily even a good metric, kind of like counting kLOC).
Your framework or language can necessitate a lot of methods without business logic, too.
Counting the number of non-trivial methods with business logic in them might be interesting; I'd say around 4 or 5 would be appropriate.
I was surprised how many methods the JDK classes actually have in them when I was looking at the source code, but they are so well broken up, so small, and so easy to read that it wasn't a problem at all to have 20.
As others have pointed out, there generally isn't some arbitrary number of methods at which I'll say "that's too many methods!" Sometimes the opposite can be just as bad, such as when an object has a monolithic do-everything method that spans hundreds of lines.
That being said, if I open up a source file I haven't looked at before and see more than 10-20 methods, I will probably scan through it to see whether it can't be refactored in some way.
Possible Duplicate:
How many parameters are too many?
I was just writing a function that took in several values and it got me thinking. When is the number of arguments to a function/method too many? When (if) does it signal a flawed design? Do you design/refactor the function to take in structs, arrays, pointers, etc. to decrease the number of arguments? Do you refactor the data coming in just to decrease the number of arguments? It seems that this could be a little less applicable in OOP designs, though. Just curious to see how others view the issue.
EDIT: For reference the function I just wrote took in 5 parameters. I use the definition of several that my AP Econ teacher gave me. More than 2; less than 7.
I don't know, but I know it when I see it.
According to Steve McConnell in Code Complete, you should
Limit the number of a routine's parameters to about seven
If you have to ask then that's probably too many.
I generally believe that if the parameters are functionally related (e.g., coordinates or color components), they should be encapsulated as a class, for good measure.
Not that I always follow this myself ;)
Robert C. Martin (Uncle Bob) recommends 3 as a maximum in Clean Code: A Handbook of Agile Software Craftsmanship
I don't have the book with me at the moment but his reasoning has to do with one, two and, to a lesser extent, three argument functions reading well and clearly showing the purpose of the function.
This of course goes hand in hand with his recommendation of very short, well-named functions that adhere to the Single Responsibility Principle.
Quick answer: When you have to stop and ask that question, you've got too many.
Personally I like to keep the number under six. If more is needed, then the solution depends on the problem. One approach is to use "setter" functions to give the values to an object that will eventually perform the function you desire. Another option is to use a struct, as you mentioned. Either way, you can't really go wrong.
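A minimal sketch of that "set the values, then perform the function" approach (hypothetical class and property names, purely illustrative):
using System.Collections.Generic;

// One way to avoid a long parameter list: set the values up front,
// then call a method with few or no arguments.
class ReportExporter
{
    public string OutputPath { get; set; } = "report.csv";
    public char Delimiter { get; set; } = ',';
    public bool IncludeHeader { get; set; } = true;

    public void Export(IEnumerable<string[]> rows)
    {
        // ... write rows using the settings above ...
    }
}

// Usage: the call site stays readable even as the options grow.
// var exporter = new ReportExporter { Delimiter = ';', IncludeHeader = false };
// exporter.Export(rows);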
Well, it would certainly depend on what your function is doing as far as how many would be considered "too many". Having said that, it is certainly possible to have a function with a lot of different parameters that are options for handling certain cases inside the function, and to have overloads of those functions with sane default values for those options.
With the pervasiveness of Intellisense (or equivalent in other IDEs) and tooltips showing the comments from the XML Documentation in Visual Studio, I don't really think that there's a firm answer to this question.
Too many parameters is a "code smell".
You can split the function into multiple methods, or use a class to group variables that have something in common.
Putting a number on "too many" is very subjective and depends on your organization and the language you use. A rule of thumb is that if you can't read the signature of your method and get an idea of what it is doing, then you might have too many. Personally, I try not to go over 5 parameters.
For me it is 5.
It is hard to manage (remember names, order, etc.) beyond that. Plus, if I get that far, I have versions with default values that call this one.
It depends on the function as well. If your function requires heavy user intervention or many variables, I wouldn't go past the 7-8 range. As for the average number of parameters to aim for, 5-6 is the sweet spot in my opinion. If you are using more than that, you might want to consider class objects as parameters, or other smaller functions.
It varies from person to person. Personally, when I have trouble immediately understanding what a function call is doing by reading the invocation in code, it is time to refactor to take the strain off of my gray cells.
I've heard that figure of 7 as well, but I somehow feel that it stems from a time when all you could pass were primitive values.
Nowadays you can pass a reference to an object that encapsulates some complex state (and behaviour). Using 7 of those would definitely be too much.
My personal goal is to avoid using more than 4.
It depends strongly on the types of the arguments. If they are all integers then 2 can be too many. (how do I remember which order?) If any argument accepts null, then the number drops drastically.
The real answer comes from asking yourself:
how easy is it to understand calls when I'm reading code?
how easy is it to remember the correct arguments and argument order when writing code?
It also depends on the programming language. In C, it's really not rare to see functions with 7 parameters. However, in C#, I have rarely seen more than 5 parameters, and I personally usually use fewer than 3.
// In C
draw_dot(x, y, size, red, green, blue, alpha);

// In C#
var point = new Point(x, y);
var color = new Color(red, green, blue, alpha);
Tool.DrawDot(point, color);
I would say a maximum of 4. Anything above that, I think, should be placed within a class.